key: cord-0966896-kraqses1
authors: Nathan, Anusha; Rossin, Elizabeth J.; Kaseke, Clarety; Park, Ryan J.; Khatri, Ashok; Koundakjian, Dylan; Urbach, Jonathan M.; Singh, Nishant K.; Bashirova, Arman; Tano-Menka, Rhoda; Senjobe, Fernando; Waring, Michael T.; Piechocka-Trocha, Alicja; Garcia-Beltran, Wilfredo F.; Iafrate, A. John; Naranbhai, Vivek; Carrington, Mary; Walker, Bruce D.; Gaiha, Gaurav D.
title: Structure-guided T cell vaccine design for SARS-CoV-2 variants and sarbecoviruses
date: 2021-06-30
journal: Cell
DOI: 10.1016/j.cell.2021.06.029
sha: 1e34c0bfbfd7a085b4e62d98974f3807e60f5dbc
doc_id: 966896
cord_uid: kraqses1

The emergence of SARS-CoV-2 variants that escape convalescent and vaccine-induced antibody responses has renewed focus on the development of broadly protective T cell-based vaccines. Here we apply structure-based network analysis and assessments of HLA class I-peptide stability to define mutationally constrained CD8+ T cell epitopes across the SARS-CoV-2 proteome. Highly networked residues are conserved temporally among circulating variants and across the Sarbecovirus subgenus, and disproportionately impair Spike pseudotyped lentivirus infectivity when mutated. Evaluation of HLA class I stabilizing activity for 18 globally prevalent alleles identifies CD8+ T cell epitopes within highly networked regions with limited mutational frequencies in circulating SARS-CoV-2 variants and deep-sequenced primary isolates. Moreover, these epitopes elicit demonstrable CD8+ T cell reactivity in convalescent individuals but reduced recognition in mRNA-based vaccine recipients. These data thereby elucidate key mutationally constrained regions and immunogenic epitopes in the SARS-CoV-2 proteome for a global T cell-based vaccine against emerging variants and sarbecoviruses.

An effective vaccine for severe acute respiratory syndrome (SARS)-coronavirus 2 (CoV-2) has been a major global health priority. While multiple vaccine candidates have been developed using a variety of platforms to deliver the viral Spike protein and induce neutralizing antibodies (Baden et al., 2020; Folegatti et al., 2020; Keech et al., 2020; Polack et al., 2020; Sadoff et al., 2021) , the emergence of the B.1.1.7 Alpha (Tang et al., 2020) , B.1.351 Beta (Tegally et al., 2021) , P.1 Gamma (Voloch et al., 2020) and B.1.617.2 Delta variants (Cherian et al., 2021) has raised substantial new concerns due to their increased transmissibility (Davies et al., 2021; Kumar et al., 2021) and ability to escape convalescent and vaccine-induced antibody responses Hoffmann et al., 2021; Madhi et al., 2021; Wang et al., 2021; Wibmer et al., 2021) . In addition, given that SARS-CoV-2 is the third coronavirus outbreak in the past twenty years, after SARS-CoV-1 and Middle Eastern respiratory syndrome (MERS)-CoV, significant additional concerns exist about a future pandemic due to the numerous SARS-like coronaviruses identified in the bat reservoir (Menachery et al., 2015 (Menachery et al., , 2016 . Thus, while existing vaccines are critical to curtail the ongoing pandemic, new vaccine candidates that can enhance protection against variants of concern (VOCs) and emerging coronaviruses are also urgently needed.

Neutralizing antibodies will certainly be a key component of a broadly protective vaccine, but the induction of virus-specific CD8 + T cells could greatly augment antibody-based protection.

Individuals with mild COVID-19 disease have increased clonal expansion of CD8 + T cells in bronchoalveloar lavage fluid (Liao et al., 2020) , robust CD8 + T cell reactivity to SARS-CoV-2 epitopes (Peng et al., 2020; Sekine et al., 2020) and rapid CD8 + T cell-mediated viral clearance (Tan et al., 2021) . In addition, patients with X-linked agammaglobulinemia who lack circulating B cells but have functional T lymphocytes developed only mild-to-moderate COVID-19 disease (Soresina et al., 2020) and antibody-mediated depletion of CD8 + T cells from convalescent J o u r n a l P r e -p r o o f macaques partially abrogated protective immunity (McMahan et al., 2020) . From the perspective of a broad sarbecovirus vaccine, CD8 + T cells developed in response to SARS-CoV-1 infection exhibit long-lasting immune memory (Ng et al., 2016) and vaccine-induced CD8 + T cells protect against lethal SARS-CoV-1 challenge in mice (Channappanavar et al., 2014) . Moreover, in contrast to Spike-restricted neutralizing antibody responses, CD8 + T cells can target regions across the SARS-CoV-2 proteome and could therefore be directed at sites that are constrained from mutation in circulating VOCs and sarbecoviruses. This is becoming increasingly important given emerging evidence that SARS-CoV-2 can evade cellular immunity through mutation of HLA class I-restricted epitopes (Agerer et al., 2021) .

Towards this goal, we leveraged an approach known as structure-based network analysis, which uses protein structure data and network theory to delineate topologically important residues based on their local connectivity, involvement in bridging interactions and proximity to known protein ligands (Gaiha et al., 2019) . We recently demonstrated that CD8 + T cell epitopes identified by this approach in the highly mutable human immunodeficiency virus (HIV) were mutationally constrained and preferentially targeted by individuals who successfully control HIV in the absence of therapy (Gaiha et al., 2019; McMichael and Carrington, 2019) . Importantly, structure-based network analysis outperformed traditional sequence conservation metrics in its detection of mutation intolerant residues across a spectrum of HIV and non-HIV proteins. While SARS-CoV-2 has a lower mutation rate than HIV, its rapid worldwide spread has provided ample opportunity for the emergence of viral escape variants, making the structure-based network approach particularly well-suited given that the full extent of sequence variation continues to be defined.

Thus, in this study, we applied structure-based network analysis to high-quality protein structures for SARS-CoV-2 to define mutationally constrained ('highly networked') residues. The J o u r n a l P r e -p r o o f mutational tolerance of highly networked residues was assessed through comparisons to viral sequence entropy data, site-directed mutagenesis and functional assessments of the SARS-CoV-2 Spike protein, and correlations with deep mutational scanning of the receptor binding domain (RBD). Using a recently established HLA class I-peptide stability assay (Kaseke et al., 2021) , we defined CD8 + T cell epitopes within highly networked regions for 18 globally prevalent HLA class I alleles. We then assessed the mutational resistance of these epitopes through analyses of SARS-CoV-2 primary isolates and their immunogenicity in individuals with convalescent disease and mRNA-based vaccine recipients. Collectively, these studies elucidate key regions within the SARS-CoV-2 proteome that are not only structurally constrained from mutation, but which also harbor a globally relevant set of CD8 + T cell epitopes to augment protection against SARS-CoV-2 variants and future SARS-like coronaviruses.

J o u r n a l P r e -p r o o f

To identify mutationally constrained regions in the SARS-CoV-2 proteome, we applied structurebased network analysis (Gaiha et al., 2019) to define areas of topological structural importance.

Based on available high-quality structural data, we were able to calculate amino acid network scores for open and closed Spike protein conformations ( Figure 1A ) and 14 additional proteins, which made up ~44% of the viral proteome (Figures S1A and S1B). Residue network scores were binned (<0, 0-2, 2-4, >4) and compared with sequence entropy values from SARS-CoV-2, sarbecoviruses (SARS-CoV-1/Bat CoV) and MERS-CoV, which revealed a strong inverse relationship between network measures of topological importance and mutational frequencies ( Figures 1B-1D) . Network scores calculated using structural data for SARS-CoV-1 and MERS-CoV were also highly correlated with scores for SARS-CoV-2 (R = 0.78, R = 0.67, respectively), indicating that highly networked residues are likely to be structurally conserved across lineage B and C betacoronaviruses (Figures S2A and S2B) .

Alignment of SARS-CoV-2 residue network scores with sequence entropy values for SARS-CoV-2 (at two distinct time points), sarbecoviruses (CoV-1/Bat CoV) and MERS-CoV also revealed numerous linear regions in which highly networked and conserved CD8 + T cell epitopes could be identified ( Figure 1E ). In addition, given that SARS-CoV-2 network scores were calculated at an early stage of the pandemic (May 2020), we aligned sequence entropy values at that time (45,603 sequences) to values obtained in February 2021 (661,816 sequences) and found that the vast majority of new sequence variation in SARS-CoV-2 has emerged in non-networked regions ( Figure 1E ; yellow boxes). Given the worldwide concern J o u r n a l P r e -p r o o f regarding new SARS-CoV-2 variants, we specifically evaluated residues mutated in the B.1.1.7, B.1.351, P.1 and B.1.617 VOCs and observed that they had low network scores, with ~82.1%

having negative values and ~96.4% having scores <1 (P=0.0003 for comparison of VOC network scores to non-VOC) ( Figure 1E , Table S1 ). This was similar to an analysis of network scores of Spike escape variants identified by in vitro mutational scanning (Greaney et al., 2021) ( Table S1 ). These data demonstrate that structure-based network analysis can predict regions of relative mutational constraint or freedom within SARS-CoV-2 and identify residues highly conserved across sarbecoviruses.

To experimentally evaluate the relationship between SARS-CoV-2 network scores and mutational tolerance, we utilized a SARS-CoV-2 Spike pseudotyped lentivirus assay (Figures S3A-S3C ) and engineered nonconservative point mutations for ten pairs of sequence conserved Spike residues which occupied either high (>2; blue) or low (<1; red) network score positions (Figures 2A-C, Figures S4A, Table S2 ). We also engineered conservative point mutations for highly networked Spike residues to further assess their mutational tolerance (Table S1) . Pseudotyped lentiviruses with no Spike protein (delta Spike), wild-type (WT) Spike protein or mutant Spike proteins were used to infect parental 293T cells or 293T cells expressing human ACE2 (293T-ACE2) and the level of infectivity was determined by ZsGreen expression following 3-day incubation (Figures S3A-S3C ).

Comparative assessment of pseudotyped lentiviruses harboring nonconservative mutations of Spike residues with high or low network scores revealed highly statistically significant differences in 293T-ACE2 cell infectivity (Figures 2D and 2E) . Moreover, conservative mutations of highly networked Spike residues (Table S2 ) also led to substantial impairment of J o u r n a l P r e -p r o o f pseudotyped lentiviral infectivity (Figures 2D and 2E) . Importantly, mutated Spike residues with high or low network scores had no significant difference in sequence entropy for SARS-CoV-2 or sarbecoviruses ( Figure 2C, Figure S4B ), indicating that network score provides a level of resolution of mutational constraint beyond entropy, consistent with previous observations (Gaiha et al., 2019) .

To further assess the mutational tolerance of highly networked residues, we utilized a published high-throughput mutagenesis dataset in which every residue within the SARS-CoV-2 Spike RBD was mutated to all possible amino acid substitutions and assessed for its impact on protein folding stability and ACE2 binding (Starr et al., 2020) . Correlation of full Spike residue network scores with the average effect of residue mutation on RBD folding revealed a significant inverse correlation (R=-0.46, P=9.5x10 -11 ) ( Figure 2F) . Interestingly, there were five residues with high network scores (V362, A363, C391, V524, C525) that did not have substantial impact on RBD protein folding when mutated ( Figure 2F ). We therefore evaluated the protein structure of the monomeric RBD (PDB ID: 6MOJ), which demonstrated that these residues were not within the RBD core or RBD-ACE2 binding surface (Figure 2G ), potentially explaining why they have little effect on either folding or ACE2 binding of the RBD monomer (Starr et al., 2020) . However, evaluation of these residues in the full Spike structure (PDB ID: 6VXX), which we utilized for our network score calculations, reveals that they are located at a critical hinge region between the RBD and distal S1 domain ( Figure 2G ) that mediates the conformational change between the open and closed states of the Spike trimer (Gur et al., 2020; Meirson et al., 2020) .

We therefore engineered conservative and non-conservative mutations for each of these five Spike residues (Table S2 ) and found that they all had significant effects on pseudotyped lentiviral infectivity, with mutations of C391, V524 and C525 having the greatest effect ( Figure   2H ). We also generated network scores for the RBD monomer structure alone and observed a J o u r n a l P r e -p r o o f more robust inverse correlation with the effects of mutation on protein folding stability (R=-0.67, P=7.9x10 -27 ) ( Figure 2I ) and ACE2 binding (R=-0.68, P=6.1x10 -25 ) ( Figure S4C) , indicating better agreement between the two methodologies when the same protein domain is used.

Comparison of sequence entropy values from the SARS-CoV-2 RBD and the average effect of mutation on protein folding revealed a markedly lower magnitude correlation (R=0.37, P=3.4x10 -7 ) ( Figure 2J) , which was also observed for comparisons with sarbecovirus entropy values (R=0.38, P=2.4x10 -7 ) ( Figure S4D ). These data demonstrate that structure-based network analysis outperforms sequence conservation in its ability to identify mutationally constrained residues in the Spike RBD and can also delineate residues that mediate ACE2 binding not detected by deep mutational scanning of the RBD monomer.

To define CD8 + T cell epitopes within highly networked regions of SARS-CoV-2, we utilized a prioritization pipeline that integrates computational epitope prediction with experimental HLA class I stabilization ( Figure 3A) . Epitope network scores were calculated (see Method Details) for all possible 8, 9, 10 and 11 amino acid peptides for which structural data were available (16,604 possible CD8 + T cell epitopes). We subsequently down-selected on those peptides with an epitope network score >3.00 (2,235 epitopes), which was a similar cutoff used for protective epitopes identified in HIV (Gaiha et al., 2019) . We then applied the NetMHCpan 4. (Sette and Sidney, 1999; Sidney et al., 2008) .

We recently demonstrated that HLA class I-peptide stability plays a key role in mediating CD8 + T cell immunodominance hierarchies across the HIV proteome and outperformed predicted binding affinity (Kaseke et al., 2021) . We therefore experimentally determined whether predicted J o u r n a l P r e -p r o o f highly networked SARS-CoV-2 epitopes (311 HLA-epitope pairs, Table S3 ) could bind and stabilize the 18 HLA alleles using an assay that leverages CRISPR/Cas9 edited transporter associated with antigen processing (TAP)-deficient mono-allelic HLA class I-expressing cell lines. Epitopes that achieved at least 50% relative HLA class I stabilization to an HLA-matched immunodominant HIV epitope were considered to be promising SARS-CoV-2 T cell immunogens given the observed in vivo CD8 + T cell targeting of HIV epitopes that reached this threshold (Streeck et al., 2009 ).

As a representative example, we incubated TAP-deficient HLA-A*0301 mono-allelic cells with the immunodominant A*0301-restricted HIV RLRPGGKKK epitope (RK9, Gag p17 20-28) and 15 highly networked SARS-CoV-2 peptides that were predicted to bind by NetMHCPan 4.1 and found five epitopes that successfully surface stabilized HLA-A*0301 at a level >50% of HIV RK9 ( Figure 3B) . Evaluation of all 311 predicted highly networked epitopes for 18 HLA alleles at increasing peptide concentrations (0.1-100M) revealed that 109 epitopes reached >50% relative HLA class I stabilization, of which 56 were derived from SARS-CoV-2 non-structural proteins and 53 were derived from structural proteins and ORF3a ( Figure 3C , Figure S5A , Table S3 ). Representative examples of HLA stabilizing epitopes for HLA-A*0301 include the RK11 epitope from NSP16 (ORF1a 6864-6874) and KR10 epitope from Spike (310-319), both of which occupy topologically important positions in their respective viral proteins ( Figure 3D ).

We also identified peptides that are frequently targeted during natural infection (e.g.

LLYDANYFL, ORF3a 139-147) (Schulien et al., 2021) or can stabilize a number of HLA class I alleles (e.g. MIAQTYSAL, Spike 869-877) ( Figure S5B and S5C, Table S3 ) and induce T cell reactivity in distinct cohorts of recovered individuals (Peng et al., 2020) .

Alignment of highly networked HLA stabilizing epitopes with sequences of SARS-CoV-2 variants, bat CoV RaTG13, SARS-CoV-1, MERS-CoV and the common cold coronaviruses J o u r n a l P r e -p r o o f (HKU1, OC43, 229E, NL63) revealed that 65% of epitopes have1 amino acid mutation, and >90% of epitopes have 2 amino acid mutations across sarbecoviruses (bat CoV, SARS-CoV-1), but substantially higher levels of sequence mismatch for non-lineage B betacoronaviruses ( Figure 3E and 3F) . Specific assessment of the mutations in the SARS-CoV-2 VOCs revealed that only three residues (Spike S982A in B.1.1.7, Nucleocapsid P80R in P.1, Spike H1101D in B.1.617.2) were found to be mutated in the highly networked epitopes from structural and accessory proteins (Table S4) , leading to exact sequence matching or <1 amino acid mutation for 100% of epitopes. We therefore assessed the impact of these mutations on HLA class Ipeptide stability and found no significant difference between ancestral sequence epitopes and five mutated epitopes in B.1.1.7 and P.1 (Figure 3G ), indicating that highly networked CD8 + T cell epitopes would provide broad coverage of VOCs with maintained HLA class I presentation.

To determine whether highly networked epitopes have inherent mutational constraints that would mitigate against the emergence of viral escape variants, we utilized deep sequencing data of 747 primary SARS-CoV-2 isolates that delineated the mutational frequencies of 26 HLA-A*02-restricted epitopes (Agerer et al., 2021) . Importantly, three of these epitopes were identified as being highly networked (ALNTLVKQL, Spike 958-966; KLNDLCFTNV, Spike 386-395; VLNDILSRL, Spike 976-984). Given that each viral isolate was sequenced to a similar depth and the prevalence of HLA-A*02 was ~30% in the affected population, we felt that this was a highly relevant dataset to compare the in vivo viral evolution of highly networked and nonnetworked epitopes. We therefore compared the frequencies of mutations at HLA anchor and TCR contact sites (position 2 through the terminal amino acid) that achieved an allelic frequency of >0.1 (i.e. tolerated mutations nearing fixation) and >0.9 (i.e. achieved mutational fixation) (Agerer et al., 2021) . This revealed a striking difference with 6.67% (2/30) of networked epitope variants having an allelic frequency >0.1 and 0% (0/30) having an allelic frequency >0.9, while J o u r n a l P r e -p r o o f 25.2% (66/262) of non-networked epitope variants had an allelic frequency >0.1 (P = 0.02) and

16.8% (44/262) achieved mutational fixation (>0.9; P = 0.01) ( Figure 3H ). Alternatively, while the networked epitopes represented 10.3% of the analyzed epitope sequences, they accounted for only 2.9% of all variant epitopes with allelic frequencies >0.1 and 0% of variants with allelic frequencies >0.9. These analyses suggest that highly networked epitopes have significant constraints on in vivo viral evolution in comparison to non-networked epitopes restricted by the same HLA allele.

To evaluate the immunogenicity of highly networked HLA stabilizing SARS-CoV-2 epitopes, we assessed the CD8 + T cell responses within a cohort of 20 healthy donors (HDs) and 30 convalescent COVID-19 patients ( Table 1) . CD4-depleted peripheral blood mononuclear cells (PBMCs) were tested for reactivity to peptide pools of highly networked epitopes derived from non-structural proteins (NSP, n=56), structural and accessory proteins (SP, n=53) or a combination of non-structural and structural proteins (NSP + SP, n=109) ( Figure 4A ) using ex vivo interferon-(IFN-) enzyme-linked immunospot (ELISpot) assays ( Figure 4B ). Anti-CD3/CD28 antibodies and a pool of CMV, EBV and Flu (CEF) peptides were used as positive controls, and DMSO was used as a negative control.

While CEF-specific CD8 + T cell responses were not significantly different between the two patient groups (Figure 4C ), we observed significant differences in IFN-+ CD8 + T cell responses to highly networked HLA stabilizing epitopes in the SP peptide pool (n=53 epitopes) (1/20 HDs vs 15/30 COVID-19; P=0.0003) and combined NSP + SP pool (n=109 epitopes) (3/20 HDs vs.

13/30 COVID-19; P=0.001), but not the NSP pool alone (n=56 epitopes) (2/20 HDs vs. 8/30 COVID-19; P=0.2627) ( Figure 4D ). This is consistent with prior reports that observed stronger J o u r n a l P r e -p r o o f SARS-CoV-2-specific CD8 + T cell responses to epitopes derived from higher abundance structural proteins than from non-structural proteins (Grifoni et al., 2020; Le Bert et al., 2020) . In addition, we observed a higher average magnitude of IFN-+ CD8 + T cell response to the SP pool in convalescent COVID-19 patients with moderate-to-severe disease (n=9) than in those with mild disease (n=21) (Figure 4E) , consistent with prior work (Peng et al., 2020) . In patients who responded to the highly networked SP peptide pool, we also observed a decrease in CD8 + T cell reactivity of individual participants when incubated with the combination SP+NSP peptide pool (13/15 individuals) ( Figure 4F ). Importantly, these data demonstrate that highly networked epitopes from structural and accessory proteins that stabilize HLA molecules, which can be encompassed within 15 viral regions ( Table 2) , are immunogenic in natural infection and therefore viable candidates for a broadly protective T cell-based vaccine.

Given that mRNA-based vaccines for SARS-CoV-2 have been widely distributed, we sought to determine whether vaccine recipients have CD8 + T cell responses to highly networked epitopes from the viral Spike protein. We therefore assessed the reactivity of CD8 + T cells from individuals who were at least 14 days post receipt of two doses of either the BNT162b2 (n=13) or mRNA-1273 (n=10) vaccines ( Table S5) . Stimulation of PBMCs with a full overlapping Spike peptide pool yielded robust IFN-+ responses for recipients of both vaccines (18/23 vaccine recipients), as has previously been described (Jackson et al., 2020; Sahin et al., 2020 ) ( Figure   5A ). However, Spike peptide stimulation of PBMCs following depletion of CD4 + T cells led to markedly lower IFN-+ T cell magnitude and reactivity (9/23 responders) (Figures 5A and 5B) , illustrating the preferential induction of Spike-specific CD4 + T cells. Importantly, the magnitude of CEF-specific CD8 + T cell responses was maintained following CD4 + depletion ( Figure 5C ), indicating no significant loss in assay sensitivity for Spike-specific CD8 + T cells. While ~39% of J o u r n a l P r e -p r o o f mRNA-based vaccine recipients had detectable CD8 + T cell responses to the full Spike peptide pool, an even smaller number had reactivity to a highly networked Spike epitope pool (6/23 responders) ( Figure 5D ). Interestingly, three of the six vaccine recipients with responses to highly networked Spike epitopes were individuals with prior SARS-CoV-2 infection (Table S5) , further illustrating the immunogenicity of these epitopes during natural infection, but the modest CD8 + T cell reactivity after mRNA-based vaccination.

To evaluate for the presence of functional memory CD8 + T cell responses to highly networked Spike epitopes, we also performed a 6-day carboxyfluorescein succinimidyl ester (CFSE)-based proliferation assay. Similar to the IFN-ELISpot (Figure 5E ), mRNA-based vaccine recipients had low and frequently undetectable proliferative CD8 + T cell responses to both full Spike antigen (8/23 responders) and highly networked Spike epitopes (4/23 responders) ( Figure 5F ).

This suggests that mRNA-based vaccines induce reduced levels of CD8+ T cell reactivity against mutationally constrained regions of the Spike protein, indicating an opportunity for additional vaccines that can elicit CD8+ T cell responses to highly networked epitopes. The ability of structure-based network analysis to define mutation intolerant residues in SARS-CoV-2 leverages prior work applying the approach to HIV (Gaiha et al., 2019) . We confirmed our network-based predictions using a Spike pseudotyped lentiviral infectivity assay and the strong agreement between our computational network scores and experimentally derived effects of mutation on Spike RBD folding and ACE2 binding (Starr et al., 2020) further illustrated its predictive ability. Network score also outperformed sequence entropy in its identification of functionally important RBD residues. Moreover, as a complement to timeand resource-intensive approaches such as deep mutational scanning, structure-based network J o u r n a l P r e -p r o o f analysis was able to be rapidly deployed, allowing for the prompt prediction of mutationally constrained residues for ~44% of the SARS-CoV-2 proteome.

Sequence analysis of SARS-CoV-2 demonstrated that most of the new sequence variation over the course of the pandemic has occurred in sites with low network scores. This suggests that viral regions identified by network analysis may have the potential to remain constrained from mutation even as the pandemic progresses. Moreover, while deep sequencing of primary SARS-CoV-2 isolates revealed subversion of CD8 + T cell surveillance through HLA class Irestricted epitope mutations (Agerer et al., 2021) , we found that highly networked HLA-A*02 epitopes exhibited significantly lower levels of mutational frequency than non-networked epitopes, indicating putative constraints on SARS-CoV-2 evasion from highly networked CD8 + T cell epitope responses.

The use of HLA class I-peptide stability assays to define CD8 + T cell epitopes within highly networked regions builds on previous work linking epitope immunogenicity and immunodominance hierarchies to HLA class I stabilizing capacity (Harndahl et al., 2012; Rasmussen et al., 2016; Kaseke et al., 2021) . The demonstrable CD8 + T cell reactivity to these epitopes in recovered individuals confirmed their immunogenicity and the agreement between the highly networked epitopes identified in this study and those found by orthogonal methodologies further illustrated the value of the HLA-peptide stability assays (Ferretti et al., 2020; Peng et al., 2020; Schulien et al., 2021) . The higher magnitude responses against highly networked epitopes from higher abundance structural and accessory proteins was also consistent with prior reports examining CD8 + T cell targeting in SARS-CoV-1 (Li et al., 2008) and SARS-CoV-2 (Grifoni et al., 2020; Le Bert et al., 2020) .

In addition to their immunogenicity, highly networked SARS-CoV-2 epitopes exhibited strong sequence homology with circulating variants and sarbecoviruses. The limited sequence homology between highly networked epitopes and common cold CoVs likely explains the absence of detectable CD8 + T cell responses in healthy donors, but also underscores the benefit that a networked T cell vaccine could provide for uninfected individuals. Moreover, the observation that mRNA-based vaccine recipients had modest CD8 + T cell responses to highly networked Spike epitopes suggests that there is also an opportunity to augment vaccineinduced immunity. Given that highly networked regions from structural and accessory proteins (which harbor 53 epitopes for 18 HLA class I alleles) can be encompassed within 315 amino acid residues (Table 2) indicates that the size of a focused immunogen would not preclude its incorporation into any number of vaccine delivery platforms. We therefore envision that this networked T cell immunogen would be delivered alongside a Spike-based vaccine as a tandem molecule or as a separate co-delivered physical entity. This would ensure ample CD4 + T cell help to facilitate the induction of de novo highly networked CD8 + T cell responses, which ideally would be highly proliferative with robust expression of cytotoxic effector molecules.

In summary, we integrated structure-based network analysis and HLA class I-peptide stability assessments to define a global set of mutationally constrained CD8 + T cell epitopes. While variant Spike vaccines may provide immunity for circulating VOCs, the induction of CD8 + T cell responses to highly networked epitopes could offer additional protection against newly emerging variants. A highly networked T cell vaccine would therefore be highly complementary to ongoing antibody-based vaccine efforts to provide the global population with broad immunity against continued SARS-CoV-2 evolution and future SARS-like coronaviruses.

The present study assessed the in vivo mutational frequencies of highly networked and nonnetworked epitopes restricted by HLA-A*02 using available sequence data (Agerer et al., 2021) .

Future studies evaluating epitopes restricted by additional HLA alleles will help further confirm the mutational constraints of highly networked epitopes. In addition, while the number of convalescent and healthy donors evaluated for CD8 + T cell responses to highly networked epitopes was limited, it was similar to previous studies characterizing cellular immunity to SARS-CoV-2 (Grifoni et al., 2020; Le Bert et al., 2020; Peng et al., 2020; Rodda et al., 2021) .

The present study only used IFN-ELISpot assays to evaluate CD8 + T cell reactivity within convalescent individuals, which may not fully detect CD8 + T cell responses or assess the full complement of CD8 + T cell functions. While the magnitude of IFN-responses that we observed in our study was comparable to what has been previously published for other convalescent COVID-19 cohorts (Le Bert et al., 2020; Peng et al., 2020) , additional assays, such as the CD8 + T cell activation induced marker assay (Grifoni et al., 2020) , CFSE-based proliferation assay or intracellular cytokine staining following peptide stimulation could also be utilized. The graphical abstract was prepared using Biorender. 

We worked to ensure sex balance in the selection of non-human subjects. One or more of the authors of this paper self-identifies as an underrepresented ethnic minority in science. One or more of the authors of this paper self-identifies as a member of the LGBTQ+ community. One or more of the authors of this paper received support from a program designed to increase minority representation in science. follows: *P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001. See also Figure S1 and Table   S1 . Shannon entropy values of SARS-CoV-2 RBD residues and average effect of mutation on monomeric RBD folding. Correlations were calculated by Spearman's rank correlation coefficient. For comparisons of more than two groups, Kruskal-Wallis test with Dunn's post hoc analyses were used. Calculated P values were as follows: *P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001. See also Figure S2 and Table S2 . were made using Fisher's exact test. Calculated P values were as follows: *P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001. See also Figure S3 and Table S3 . 

SP and NSP + SP peptide pools in COVID-19 SP peptide pool responders (n=15). Statistical comparison was made using Wilcoxon matched pairs test. All other statistical comparisons were made using Mann-Whitney U test. Calculated P values were as follows: *P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001. See also Table 1 . 

Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Gaurav D. Gaiha (ggaiha@mgh.harvard.edu)

All requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact author. All reagents will be made available on request after completion of a Materials Transfer Agreement.

All data supporting the findings of this study available within the paper and available from the corresponding author upon request. The code used during this study to generate network scores is archived at Zenodo (https://doi.org/10.5281/zenodo.2597484). Viral sequence data of highly networked and non-networked epitope variants from primary SARS-CoV-2 isolates is available in the supplementary materials of the primary manuscript (Agerer et al., 2021) . 

The female cell line HEK293T were used for lentivirus production and ACE2-expressing alleles (Shimizu and DeMars, 1989) . These cell lines were maintained in RPMI-1640 medium (Sigma-Aldrich) supplemented with 10% (v/v) FBS (Sigma-Aldrich) and 1X Penicillin-Streptomycin-L-Glutamine mixture (Gibco). TAP-deficient mono-allelic HLA class I-expressing 721.221 cells were generated as described previously (please see companion manuscript) and maintained in 5ug/mL blasticidin (Invivogen), 0.5 ug/ml puromycin (Invivogen) and 1.5 mg/ml G418 (Invivogen).

Peripheral blood mononuclear cells (PBMCs) were isolated from healthy human volunteers, SARS-CoV-2 infected patients and mRNA-based vaccine recipients by Ficoll gradient separation from ACD tubes. They were then cryopreserved and stored in liquid nitrogen prior to experimental use. The study was approved by the MGH Institutional Review Board. All subjects were between 18-70 years of age, provided informed consent and were confirmed to have a test positive for SARS-CoV-2 using PCR with reverse transcription from an upper respiratory tract (nose and throat) swab tested at an accredited laboratory and a subsequent negative PCR test following symptom resolution at least 7 days prior to sample acquisition. For infected individuals, the degree of disease severity was identified as mild, severe or critical infection, according to recommendations from the World Health Organization. Patients were classified as having mild symptoms if they did not require oxygen (that is, their oxygen saturation was 94% or greater on ambient air) and if their symptoms were managed at home. Moderate-to-severe infection was defined as one of the following conditions in a patient confirmed as having COVID-19:

respiratory distress with a respiratory rate of >30 breaths per minute; blood oxygen saturation of J o u r n a l P r e -p r o o f <94%; or arterial oxygen partial pressure/FiO2 < 300 mmHg. All mRNA-based vaccine recipients received two doses of either the Pfizer-BioNTech BNT162b2 or Moderna mRNA-1273 vaccines and were at least 14 days post-administration of the second vaccine dose.

For the analysis of the SARS-CoV-2 proteome, the following PDB files were utilized: NSP3 ADP (https://salilab.org/modeller/) was used to create homology models for the envelope protein using SARS-CoV-1 envelope (PDB: 5X29) as a template. Water molecules and solvents were removed from each PDB file prior to analysis.

For the analysis of the SARS-CoV-1 proteome, the following PDB files were utilized: NSP3 ADP 2GIB). Water molecules and solvents were removed from each PDB file prior to analysis.

For the analysis of the MERS proteome, the following PDB files were utilized: NSP3 ADP ribose 

Individuals network scores were calculated as described previously (Gaiha et al., 2019) . For multimeric proteins, degree-based network values (second-order degree, ligand binding) in the protein's highest oligomeric state were utilized prior to calculation of a normalized Z-score. For node edge betweenness metrics, the maximum normalized Z-score from monomer or multimeric conformation was incorporated into the final network score calculation. For analyses with multiple structures utilized to capture different conformational states for the same oligomeric structure (e.g. 6VXX and 6VYB, closed and open conformations), network Z-scores were averaged. All molecular assemblies were generated using the online server PDBePISA (http://www.ebi.ac.uk/pdbe/pisa/).

The positions of the alpha carbons of protein residues of interest were parsed from the corresponding PDB structure files using a script written in the Perl language version 5.18.2.

Subsequent data processing and visualization were performed in the R language, version 3.6.3.

Network scores for residues were mapped to alpha carbons in the corresponding structures by J o u r n a l P r e -p r o o f position. In cases where the PDB structures used to generate the network scores differed from the PDB structures used for visualization, and where individual residue substitutions had occurred, the alpha carbon position for the substituted residue in the structure PDB file was used. Protein network structures were created using R's igraph package version 1.2.5. Using the alpha carbon positions as X, Y, and Z coordinates, nodes representing residues, were plotted in three dimensions using the R rgl library version 0.100.50, which implements OpenGL.

Where indicated, residue nodes were colored and scaled by their corresponding network score, and inter-residue interaction strengths were represented as the thickness of edges between nodes. Two-dimensional network views of interest as defined by aspect, zoom and rotation, were selected manually, then extracted using the rgl par3d function and saved for later use.

Views of interest were exported as PNG files.

Epitope network scores were calculated as described previously (Gaiha et al., 2019) . Briefly, network scores from individual amino acid residues within and neighboring a CD8 + T cell epitope were combined and averaged based on their involvement as either HLA anchor, TCR contact or peptide processing residues. HLA anchor residues were defined based on previous delineations for each HLA allele (Marsh et al., 1999) . Putative TCR contact residues were considered to be all remaining non-HLA anchor residues, excluding position 1, based on previously reported frequencies of TCR-peptide contacts (Calis et al., 2012) . Flanking residues were defined as the five residues N-terminal and C-terminal to the epitope (ten in total). These three quantities were then summed to generate an overall composite network score for each CD8 + T cell epitope using the following formula:

where NS i is the i th network score for HLA anchor residues 1 through a, NS j is the j th network score for TCR contact residues 1 through b and NS k is the k th network score for peptide processing residues 1 through c. The normalized epitope network score (Table S3) was calculated by subtracting the lowest epitope network score from all epitope scores, such that all values were greater than or equal to zero. The normalized network score was utilized when comparing patient responses such that no CTL response would be assigned a negative value.

For the analysis of the highly stabilizing epitopes across human coronaviruses, the following 

SARS-CoV-2 Spike pseudotyped lentivirus was produced as previously described . 

J o u r n a l P r e -p r o o f HEK293T and ACE2-expressing HEK293T cells were seeded at a density of 1.25×10 4 cells/well into a 96-well plate one day prior to infection with 60 μL wild-type or mutant Spike pseudotyped lentivirus diluted two-fold in D10 with 5 μg/mL Polybrene Transfection Reagent (Millipore). 24h

following infection, an additional 140 μL of D10 was added and cells were cultured at 37°C and 5% CO2 for 48h. Cells were harvested, stained with viability dye, fixed in 2% paraformaldehyde and subsequently analyzed for ZsGreen expression via flow cytometry using a BD LSR II (BD Biosciences). Flow cytometric data were analyzed using FlowJo software (v10.1r5). trifluoroacetic acid (TFA) were purchased from Sigma-Aldrich.

Peptides were synthesized on an automated robotic peptide synthesizer (AAPPTEC, Model 396

Omega) by using Fmoc solid-phase chemistry (Behrendt et al., 2016) on 2-chlorotrityl chloride resin (Chatzi et al., 1991) . The C-terminal amino acids were loaded using the respective Fmoc-Amino Acids in the presence of DIEA. Unreacted sites on the resin were blocked using methanol, DIEA and DCM (15:5:80 v/v). Subsequent amino acids were coupled using optimized (to generate peptides containing more than 90% of the desired full-length peptides) cycles consisting of Fmoc removal (deprotection) with 25% Piperidine in NMP followed by coupling of Fmoc-AAs using HCTU/NMM activation. Each deprotection or coupling was followed by several J o u r n a l P r e -p r o o f washes of the resin with DMF to remove excess reagents. After the peptides were assembled and the final Fmoc group removed, peptide resin was then washed with dimethylformamide, dichloromethane, and methanol three times each and air dried. Peptides were cleaved from the solid support and deprotected using odor free cocktail (TFA/triisopropyl silane/water/DODT; 94/2.5/2.5/1.0 v/v) for 2.5h at room temperature (Teixeira et al., 2002) . Peptides were precipitated using cold methyl tertiary butyl ether (MTBE were subtracted from the positive wells, and the results were expressed as spot-forming units (SFU) per 10 6 PBMCs. Responses were considered positive if the results were >5 SFU/10 6 PBMCs following control subtraction. If negative DMSO control wells had >30 SFU/10 6 PBMCs or if positive control wells (anti-CD3/anti-CD28 stimulation) were negative, the results were excluded from further analysis.

PBMCs were suspended at 1 x 10 6 /mL in PBS and incubated at 37 o C for 20 min with 0.5 uM carboxyfluorescein succinimidyl ester (CFSE; Life Technologies). After the addition of serum and washes with PBS, cells were resuspended at 1 x 10 6 /mL and plated into 96-well U-bottom plates (Corning) at 200 uL volumes. Peptide pools were added at a final concentration of 1 ug/mL. On day 6, cells were harvested, washed with PBS + 2% Fetal Bovine Serum, and stained with anti-CD3-PE-Cy7 (clone SK1; BioLegend), anti-CD8 APC (clone SK7; BioLegend), and LIVE/DEAD violet viability dye (Life Technologies). Cells were washed and fixed in 2% paraformaldehyde, prior to flow cytometric analysis on a BD LSR II (BD Biosciences). A positive response was defined as one with a percentage of CD3 + CD8 + CFSE low cells at least 1.5x greater than the highest of three negative-control wells and greater than 0.2% CD8 + CFSE low cells in magnitude following background subtraction.

The generation of dot plots, nonparametric statistical analysis, correction for multiple comparisons and non-parametric correlations (Spearman) were performed using the statistical programs in Graphpad Prism version 8.0. Differences between groups were evaluated using the Epitope mutations within SARS-CoV-2 variants of concern are bolded.

J o u r n a l P r e -p r o o f

+ CFSE low cells at least 1.5x greater than background wells and greater than 0.2% CD8 + CFSE low cells in magnitude following background subtraction. Statistical comparisons were made using Wilcoxon matched pairs test

SARS-CoV-2 mutations in MHC-I-restricted epitopes evade CD8+ T cell responses

Efficacy and Safety of the mRNA-1273 SARS-CoV-2 Vaccine

Advances in Fmoc solid-phase peptide synthesis

Degenerate T-cell recognition of peptides on MHC molecules creates large holes in the T-cell repertoire

Virus-specific memory CD8 T cells provide substantial protection from lethal severe acute respiratory syndrome coronavirus infection

2-Chlorotrityl chloride resin: Studies on anchoring of Fmoc-amino acids and peptide cleavage

Convergent evolution of SARS-CoV-2 spike mutations, L452R, E484Q and P681R, in the second wave of COVID-19 in Maharashtra

Efficient large-scale production and concentration of HIV-1-based lentiviral vectors for use in vivo

Protocol and Reagents for Pseudotyping Lentiviral Particles with SARS-CoV-2 Spike Protein for Neutralization Assays

Unbiased Screens Show CD8+ T Cells of COVID-19 Patients Recognize Shared Epitopes in SARS-CoV-2 that Largely Reside outside the Spike Protein

Safety and immunogenicity of the ChAdOx1 nCoV-19 vaccine against SARS-CoV-2: a preliminary report of a phase 1/2, singleblind, randomised controlled trial

Structural topology defines protective CD8+ T cell epitopes in the HIV proteome. Science

Multiple SARS-CoV-2 variants escape neutralization by vaccine-induced humoral immunity

Complete mapping of mutations to the SARS-CoV-2 spike receptor-binding domain that escape antibody recognition

Targets of T Cell Responses to SARS-CoV-2 Coronavirus in Humans with COVID-19 Disease and Unexposed Individuals

Conformational transition of SARS-CoV-2 spike glycoprotein between its closed and open states

Peptide-MHC class I stability is a better predictor than peptide affinity of CTL immunogenicity

SARS-CoV-2 variant B.1.617 is resistant to Bamlanivimab and evades antibodies induced by infection and vaccination

An mRNA vaccine against SARS-CoV-2-preliminary report

HLA class I-peptide stability mediates CD8 + T cell immunodominance hierarchies and facilitates HLA-associated immune control of HIV

Phase 1-2 Trial of a SARS-CoV-2 Recombinant Spike Protein Nanoparticle Vaccine

Possible link between higher transmissibility of B.1.617 and B.1.1.7 variants of SARS-CoV-2 and increased structural stability of its spike protein and hACE2 affinity

SARS-CoV-2-specific T cell immunity in cases of COVID-19 and SARS, and uninfected controls

T cell responses to whole SARS coronavirus in humans

Single-cell landscape of bronchoalveolar immune cells in patients with COVID-19

Immunological Bioinformatics

Efficacy of the ChAdOx1 nCoV-19 Covid-19 Vaccine against the B.1.351 Variant

The HLA FactsBook

Correlates of protection against SARS-CoV-2 in rhesus macaques

Topological perspective on HIV escape

Structural basis of SARS-CoV-2 spike protein induced by ACE2

A SARS-like cluster of circulating bat coronaviruses shows potential for human emergence

SARS-like WIV1-CoV poised for human emergence

Memory T cell responses targeting the SARS coronavirus persist up to 11 years post-infection

Broad and strong memory CD4+ and CD8+ T cells induced by SARS-CoV-2 in UK convalescent individuals following COVID-19

Safety and Efficacy of the BNT162b2 mRNA Covid-19 Vaccine

Pan-Specific Prediction of Peptide-MHC Class I Complex Stability, a Correlate of T Cell Immunogenicity

Functional SARS-CoV-2-Specific Immune Memory Persists after Mild COVID-19

Safety and Efficacy of Single-Dose Ad26.COV2.S Vaccine against Covid-19

BNT162b2 induces SARS-CoV-2-neutralising antibodies and T cells in humans. MedRxiv

Characterization of pre-existing and induced SARS-CoV-2-specific CD8+ T cells

Robust T cell immunity in convalescent individuals with asymptomatic or mild COVID-19. Cell

Nine major HLA class I supertypes account for the vast preponderance of HLA-A and -B polymorphism

Production of human cells expressing individual transferred HLA-A,-B,-C genes using an HLA-A,-B,-C null human cell line

HLA class I supertypes: a revised and updated classification

Two X-linked agammaglobulinemia patients develop pneumonia as COVID-19 manifestation but recover

Deep mutational scanning of SARS-CoV-2 receptor binding domain reveals constraints on folding and ACE2 binding

Human immunodeficiency virus type 1-specific CD8+ T-cell responses during primary infection are major determinants of the viral set point and loss of CD4+ T cells

Early induction of functional SARS-CoV-2-specific T cells associates with rapid viral clearance and mild disease in COVID-19 patients

Emergence of a new SARS-CoV-2 variant in the UK

Detection of a SARS-CoV-2 variant of concern in South Africa

The use of DODT as a non-malodorous scavenger in Fmoc-based peptide synthesis

Genomic characterization of a novel SARS-CoV-2 lineage from Rio de Janeiro

mRNA vaccine-elicited antibodies to SARS-CoV-2 and circulating variants

SARS-CoV-2 501Y.V2 escapes neutralization by South African COVID-19 donor plasma

We thank Alejandro Balazs and Evan Lam for assistance with pseudotyped lentivirus infectivity assays, Maia Pavlovic, David Gregory and Mark Poznansky for access to COVID-19 vaccinee specimens, and Shiv Pillai, Vinay Mahajan and David Collins for their scientific advice. Access