key: cord-0721756-tqlzeh09 authors: Weingarten-Gabbay, Shira; Klaeger, Susan; Sarkizova, Siranush; Pearlman, Leah R.; Chen, Da-Yuan; Gallagher, Kathleen M.E.; Bauer, Matthew R.; Taylor, Hannah B.; Dunn, W. Augustine; Tarr, Christina; Sidney, John; Rachimi, Suzanna; Conway, Hasahn L.; Katsis, Katelin; Wang, Yuntong; Leistritz-Edwards, Del; Durkin, Melissa R.; Tomkins-Tinch, Christopher H.; Finkel, Yaara; Nachshon, Aharon; Gentili, Matteo; Rivera, Keith D.; Carulli, Isabel P.; Chea, Vipheaviny A.; Chandrashekar, Abishek; Bozkus, Cansu Cimen; Carrington, Mary; Bhardwaj, Nina; Barouch, Dan H.; Sette, Alessandro; Maus, Marcela V.; Rice, Charles M.; Clauser, Karl R.; Keskin, Derin B.; Pregibon, Daniel C.; Hacohen, Nir; Carr, Steven A.; Abelin, Jennifer G.; Saeed, Mohsan; Sabeti, Pardis C. title: Profiling SARS-CoV-2 HLA-I peptidome reveals T cell epitopes from out-of-frame ORFs date: 2021-06-03 journal: Cell DOI: 10.1016/j.cell.2021.05.046 sha: 28d6850b575893b627c9b3c62e174c39886c91b6 doc_id: 721756 cord_uid: tqlzeh09 T cell-mediated immunity plays an important role in controlling SARS-CoV-2 infection; yet the repertoire of naturally processed and presented viral epitopes on HLA class I remains uncharacterized. Here, we report the first HLA-I immunopeptidome of SARS-CoV-2 in two cell lines at different times post-infection using mass spectrometry. We found HLA-I peptides derived not only from canonical ORFs, but also from internal out-of-frame ORFs in Spike and Nucleocapsid not captured by current vaccines. Some peptides from out-of-frame ORFs elicited T cell responses in a humanized mouse model and COVID-19 patients that exceeded responses to canonical peptides including some of the strongest epitopes reported to date. Whole proteome analysis of infected cells revealed that early expressed viral proteins contribute more to HLA-I presentation and immunogenicity. 
These biological insights as well as the discovery of out-of-frame ORF epitopes will facilitate selection of peptides for immune monitoring and vaccine development. As efforts continue to develop effective vaccines and therapeutics against Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), the virus causing the ongoing Coronavirus Disease 2019 (COVID-19) pandemic (Lu et al., 2020), it is critical to decipher how infected host cells interact with the immune system. Previous insights from SARS-CoV and MERS-CoV, as well as emerging evidence from SARS-CoV-2, imply that T cell responses play an essential role in SARS-CoV-2 immunity and viral clearance (Altmann and Boyton, 2020; Grifoni et al., 2020a; Le Bert et al., 2020; Moderbacher et al., 2020; Sekine et al., 2020). Growing concerns about emerging viral variants and potential resistance to antibody defenses spurred renewed discussions about other immune responses and, in particular, cytotoxic T cells (Ledford, 2021). When viruses infect cells, their proteins are processed and presented on the host cell surface by class I human leukocyte antigen (HLA-I). Circulating cytotoxic T cells recognize the presented foreign antigens and initiate an immune response, resulting in the clearance of infected cells. Investigating the repertoire of SARS-CoV-2 derived HLA-I peptides will enable identification of viral epitopes responsible for activation of cytotoxic T cells. [...] to date utilized overlapping peptide tiling approaches and/or bioinformatic predictions of HLA-I [...] non-canonical ORFs in mammalian and viral genomes (Finkel et al., 2020a; Ingolia et al., 2009, 2011; Stern-Ginossar et al., 2012).
While the function of most of these non-canonical ORFs remains unknown, it is becoming clear that the translated polypeptides serve as fruitful substrates for the antigen presentation machinery in viral infection, uninfected cells and cancer (Chen et al., 2020b; Hickman et al., 2018; Ingolia et al., 2014; Maness et al., 2010; Ouspenskaia et al., 2020; Ruiz Cuevas et al., 2021; Starck and Shastri, 2016; Yang et al., 2016). Importantly, a recent study identified 23 unannotated ORFs in the genome of SARS-CoV-2, some of which have higher expression levels than the canonical viral ORFs (Finkel et al., 2020b). Whether these non-canonical ORFs give rise to HLA-I bound peptides remains unknown. Here, we present the first examination of the HLA-I immunopeptidome in two SARS-CoV-2-infected human cell lines, and complement this analysis with RNA-seq and global proteomics measurements. We identify viral HLA-I peptides that are derived from canonical and non-canonical ORFs and monitor the dynamics of viral protein expression and peptide presentation over multiple timepoints post infection. We show that peptides derived from out-of-frame ORFs elicit T cell responses in immunized mice and COVID-19 patients using ELISpot and multiplexed barcoded tetramer assays combined with single-cell sequencing. Whole proteome measurements suggest that the time of viral protein expression correlates with HLA-I presentation and immunogenicity and that SARS-CoV-2 interferes with the cellular proteasomal pathway, potentially resulting in lower presentation of viral peptides. Computational predictions and biochemical binding assays demonstrate that the detected HLA-I peptides can be presented by additional HLA-I alleles beyond the nine alleles tested in our study. Our findings can inform future immune monitoring assays in patients and aid in the design of efficacious vaccines.
To interrogate the repertoire of human and viral HLA-I peptides, we immunoprecipitated (IP) HLA-I proteins from SARS-CoV-2-infected human lung A549 cells and human kidney HEK293T cells that were transduced to stably express ACE2 and TMPRSS2, two known viral entry factors. We then analyzed their HLA-bound peptides by liquid chromatography tandem mass spectrometry (LC-MS/MS) (Fig. 1A). We also analyzed the whole proteome of the IP flowthrough by LC-MS/MS and performed RNA-seq to examine the effect of SARS-CoV-2 on human gene expression. To allow for the detection of peptides from the complete translatome of SARS-CoV-2, we combined the recently identified 23 ORFs (Finkel et al., 2020b) with the list of canonical ORFs and the human RefSeq database for LC-MS/MS data analysis. In choosing cell types for this study, we focused on achieving both biological relevance and high HLA-I allelic coverage. A549 are lung carcinoma cells representing the key biological target of SARS-CoV-2, and are thus commonly used in COVID-19 studies. HEK293T cells endogenously express HLA-A*02:01 and B*07:02, two high-frequency HLA-I alleles. Together, the nine HLA-I alleles expressed by HEK293T and A549 cells cover at least one allele in 63.8% of the human population (Fig. 1B, Methods). Using immunofluorescence staining of the nucleocapsid protein, we estimated that ~70% of the transduced cells were infected at the peak infection time (Fig. S1). We validated the technical performance of our assays by examining the overall characteristics of presented HLA peptides. We identified 5,837 and 6,372 HLA-bound 8-11mer peptides in uninfected and infected (24 hpi) A549 cells, and 4,281 and 1,336 unique peptides in HEK293T cells, respectively (Table S1). The reduction in the total number of peptides after infection in HEK293T cells is likely due to cell death (~50% of cells at 24 hpi).
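The population coverage quoted above (at least one of the nine alleles in 63.8% of individuals) comes from a calculation described in Methods, which is not reproduced here. As a rough illustration only, the sketch below assumes Hardy-Weinberg proportions within each locus and independence between the HLA-A, -B and -C loci. The function name and the allele frequencies in the usage note are hypothetical placeholders, not values from the study.

```python
from collections import defaultdict


def population_coverage(allele_freqs):
    """Estimate the fraction of individuals carrying at least one listed allele.

    allele_freqs maps allele names such as "A*02:01" to population allele
    frequencies (hypothetical inputs). Assumptions, not taken from the paper:
    Hardy-Weinberg proportions within a locus and independence across loci.
    """
    # Sum the frequencies of the listed alleles separately for each locus.
    per_locus = defaultdict(float)
    for allele, freq in allele_freqs.items():
        locus = allele.split("*")[0]
        per_locus[locus] += freq

    # A diploid individual lacks every listed allele at a locus with
    # probability (1 - summed frequency) squared.
    p_no_allele = 1.0
    for summed in per_locus.values():
        p_no_allele *= (1.0 - summed) ** 2

    return 1.0 - p_no_allele
```

For example, `population_coverage({"A*02:01": 0.25, "B*07:02": 0.10})` would give the coverage for two alleles with made-up frequencies; real estimates would need population-specific frequency tables and may have to account for linkage between loci.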
As expected, peptide length distribution was not influenced by infection, and the majority of HLA-I peptides were 9-mers (Fig. 1C). Next, we compared the binding motifs of all 9-mer peptides between uninfected and infected cells per cell line and per individual HLA allele (Fig. 1D; Fig. S2A,B). We did not find major differences following infection and the observed amino acids at the main anchor positions 2 and 9 were in line with the expected binding motifs of the alleles expressed in the two cell lines. To evaluate if the MS-detected peptides are indeed predicted to bind to the expressed HLA-I alleles, we inferred the most likely allele to which each peptide binds using HLAthena (Sarkizova et al., 2020). At a stringent cutoff of predicted percentile rank <=0.5, 87% of A549 and 73% of HEK293T identified peptides post infection were assigned to at least one of the alleles in the corresponding cell line (Fig. 1E; Fig. S2C). Differences in the relative representation of HLA alleles on the cell surface are influenced by both the expression level and the permissiveness of the binding motif of each allele (Fig. S2D,E). Next, we examined HLA-I peptides that are derived from the SARS-CoV-2 genome (Fig. 2A, Table S1). We identified 28 peptides from canonical proteins (nsp1, nsp2, nsp3, nsp5, nsp8, nsp10, nsp14, nsp15, S, M, ORF7a and N). Strikingly, 9 peptides were derived from out-of-frame ORFs in S and N. Four peptides matched to an in-silico six-frame translation database of the SARS-CoV-2 genome. However, manual inspection of ribosome profiling data (Finkel et al., 2020b) did not support translated ORFs in these regions. Most of the HLA-I peptides were detected in more than one experiment and predicted as good binders by HLAthena (%rank<2) to at least one of the expressed HLA alleles.
We confirmed binding for 19 of the 20 HLA-I peptides predicted to be presented by four HLA alleles expressed in A549 and HEK293T cells (A*02:01, B*07:02, B*18:01 and B*44:03) using biochemical binding assays (IC50<500nM, Fig. 2B, Table S2). One peptide, HADQLTPTW, was also detected in non-infected A549 cells and thus, we removed it from all subsequent MS analyses. Surprisingly, we detected only one HLA-I peptide from N, a SARS-CoV-2 protein expected to be highly abundant based on previous RNA-seq and Ribo-seq studies (Finkel et al., 2020b). To test if this low representation could be explained by lower expression of N in our experiment, we examined the whole proteome MS data. We found a strong correlation between the abundance of viral proteins in the proteome of the two cell lines (Pearson R=0.91, Fig. 2C) and with recently published translation measurements in infected Vero cells (Finkel et al., 2020b) (Pearson R=0.86 and R=0.78 for A549 and HEK293T, respectively; Fig. 2D,E; Table S3A). The N protein remained the most abundant viral protein in both cell lines. An alternative hypothesis for lower N representation could be that the protein harbours fewer peptides compatible with the HLA binding motifs. Therefore, for each SARS-CoV-2 ORF we computed the ratio between the number of peptides that are predicted to be presented by at least one of the HLA-I alleles in each cell line and the number of total 8-11mers. Notably, N had proportionally fewer presentable peptides than most SARS-CoV-2 proteins in both cell lines (Table S3B). We then expanded our analysis to 92 HLA-I alleles with high population coverage and with immunopeptidome-trained predictors (Sarkizova et al., 2020) (Fig. 2H, Table S3B). This analysis also categorized N among the least presentable canonical proteins of SARS-CoV-2.
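The per-ORF ratio described above (predicted presentable peptides divided by all 8-11mers) can be sketched as follows. This is a minimal illustration, not the study's pipeline: `toy_binder` is a hypothetical anchor-residue rule standing in for a trained predictor such as HLAthena, and all function names are assumptions introduced here.

```python
def tile_kmers(protein, k_min=8, k_max=11):
    """Yield every peptide of length k_min..k_max tiling a protein sequence."""
    for k in range(k_min, k_max + 1):
        for start in range(len(protein) - k + 1):
            yield protein[start:start + k]


def toy_binder(peptide):
    """Hypothetical stand-in for a real HLA-I presentation predictor.

    Calls a peptide presentable if position 2 and the last position are
    hydrophobic. A real analysis would use predicted percentile ranks.
    """
    return peptide[1] in "LMIV" and peptide[-1] in "LMIV"


def presentable_fraction(protein, is_presentable=toy_binder):
    """Return (number of presentable 8-11mers) / (number of all 8-11mers)."""
    peptides = list(tile_kmers(protein))
    if not peptides:
        # Proteins shorter than 8 residues yield no candidate peptides.
        return 0.0
    hits = sum(1 for peptide in peptides if is_presentable(peptide))
    return hits / len(peptides)
```

Computing this fraction per ORF, with the toy rule replaced by a predictor evaluated for each expressed allele, allows proteins of different lengths to be compared on the same scale.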
Together, our results hint that N might be less presented than expected given its high expression level in infected cells (~10-fold greater than the next most abundant viral protein, Fig. 2C). Our deep coverage of the viral proteins in whole proteome analysis (24 proteins) allowed us to observe several interesting findings. While the translation of ORF1a and 1ab, the source polyproteins of nsps 1-16, is 10- to 1,000-fold lower than the structural ORFs (Finkel et al., 2020b), we found that the abundance of some non-structural proteins was comparable to that of structural proteins (e.g., nsp1 and nsp8, Fig. 2C). Interestingly, although nsps 1-11 are post-translationally cleaved from the same polyproteins, their expression levels were variable. This finding is consistent with two additional proteomics studies of SARS-CoV-2 infected cells utilizing different detergents in their lysis buffers (Schmidt et al., 2020; Stukalov et al., 2020), suggesting that the observed differences in expression are not due to detergent solubility. Moreover, nsps 12-15, which originate from polyprotein 1ab downstream of the frameshift signal, are, as expected, expressed at lower levels. Another observation is that the S protein appeared as an outlier in both cell lines with higher expression in the proteome data compared to Ribo-seq measurements, suggesting it may undergo positive post-translational regulation (Fig. 2D,E; computed Pearson R when omitting S increased from 0.86 to 0.99 and 0.78 to 0.92 in A549 and HEK293T cells, respectively). To investigate the dynamics of HLA-I presentation during infection, we compared the relative abundance of HLA-I peptides in A549 and HEK293T cells at 3, 6, 12, 18 and 24 hpi. For technical reasons, we split the infection time course analysis into two batches (3, 6, 24 hpi; and 12, 18, 24 hpi) and normalized to the 24 hpi time point.
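The two-batch design above relies on the shared 24 hpi sample to place both batches on a common scale. A minimal sketch of that idea, assuming each batch is a mapping from timepoint (in hpi) to a peptide's reporter intensity; the function names and any intensities used with them are hypothetical, and the study's actual TMT processing may differ.

```python
def normalize_to_reference(batch, reference_timepoint=24):
    """Express each intensity in a batch relative to its reference timepoint."""
    reference = batch[reference_timepoint]
    if reference <= 0:
        raise ValueError("reference intensity must be positive")
    return {timepoint: value / reference for timepoint, value in batch.items()}


def merge_batches(batch_a, batch_b, reference_timepoint=24):
    """Combine two batches after scaling each to its own reference sample.

    After scaling, the reference timepoint equals 1.0 in both batches,
    so the remaining timepoints can be placed on one relative axis.
    """
    merged = {}
    for batch in (batch_a, batch_b):
        merged.update(normalize_to_reference(batch, reference_timepoint))
    return dict(sorted(merged.items()))
```

With made-up inputs `{3: 10, 6: 40, 24: 20}` and `{12: 3, 18: 4.5, 24: 3}`, the merged profile is expressed as fold change relative to 24 hpi across all five timepoints. A peptide missing from one batch simply has no values for that batch's timepoints.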
Labeling with TMT enabled the detection of 10 viral HLA-I peptides in A549 cells; four of these peptides were quantified across all timepoints, two were only detected in the 12|18|24h plex, and four were only detected in the 3|6|24h plex (Fig. 3A; Table S1). It is likely that peptides that were detected only in the 3|6|24h plex were also presented on HLA-I at 12 and 18 hpi; however, due to separate cell culture experiments and data acquisition, they were not detected in the 12|18|24h plex. HLA-I presentation of most detected viral peptides peaked at 6 hpi, similarly to previous reports in vaccinia virus (Croft et al., 2013) and influenza virus. While some human-derived HLA-I peptides changed over time, the majority were fairly stable. In HEK293T cells, we detected 13 peptides from SARS-CoV-2, with the caveat of observing some peptides only in the 3|6|24h plex as described above (Fig. 3B, Table S1). Examining the dynamics of HLA-I peptides observed across all time points, we found that the abundance of some viral peptides peaked at 6 hpi; however, we also observed maximal presentation at 12, 18 and 24 hpi for others. To assess the relationship between HLA peptide presentation and the time of viral protein expression, we performed fractionated whole proteome MS analysis across 3, 6, and 24 hpi timepoints from the same cell lysates. While the majority of viral proteins were expressed in cells at 6 hpi, only eight and nine proteins were detected at 3 hpi in A549 and HEK293T cells, respectively (Fig. 3C). We found that viral proteins detected as early as 3 hpi contributed to HLA-I presentation more than viral proteins expressed at 6 hpi or later (Hypergeometric p<0.0375, Fig. 3D) and elicited stronger CD8+ T cell responses in COVID-19 convalescent patients (Tarke et al., 2020) (Wilcoxon rank-sum p<0.0181, Fig. 3E).
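The hypergeometric enrichment test mentioned above asks whether early-expressed proteins are over-represented among the proteins that give rise to HLA-I peptides. A minimal sketch of the upper-tail probability using only the standard library; the function and argument names are assumptions, and the counts behind the reported p-value are not given in this text, so none are reproduced.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """Return P(X >= k) for a hypergeometric random variable X.

    population: total number of viral proteins considered
    successes:  proteins detected early in infection
    draws:      proteins that are a source of HLA-I peptides
    k:          early-detected proteins among the sources
    """
    if not (0 <= successes <= population and 0 <= draws <= population):
        raise ValueError("successes and draws must lie within the population")
    total = comb(population, draws)
    upper = min(successes, draws)
    # math.comb returns 0 when the second argument exceeds the first, so
    # impossible splits contribute nothing to the sum.
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(max(k, 0), upper + 1)
    )
    return favourable / total
```

A call such as `hypergeom_upper_tail(6, 24, 8, 10)` uses purely hypothetical counts (24 proteins, 8 early, 10 peptide sources, 6 of them early) to show the intended usage; a small result would indicate enrichment of early proteins among peptide sources.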
This observation may explain a recent surprising finding that nsp3 is among the four most immunogenic proteins of SARS-CoV-2 (Tarke et al., 2020). While nsp3 is not expressed at high levels, its early expression in infected cells may contribute to presentation of nsp3-derived HLA-I peptides. To investigate how the levels of viral source proteins impact their ability to be processed and presented, we ranked the individual SARS-CoV-2 proteins and HLA-I peptides according to their abundance in comparison to human proteins. Although the overall abundance of viral proteins in the infected-cell proteome at 24 hpi was relatively low (HEK293T: 2.6%, A549: 3%; Fig. S3A), individual viral proteins were highly expressed and exceeded most of the host proteins (Wilcoxon rank-sum test, A549: p<10^-4, HEK293T: p<10^-6; Fig. 4A, Fig. S3B; Table S4). In contrast to the high expression of their source proteins, the intensities of viral HLA-I peptides are similar to peptides from the host proteome, indicating that viral peptides are not preferentially presented (Wilcoxon rank-sum test, A549: p>0.8, HEK293T: p>0.4; Fig. 4B, Fig. S3C; Table S1). Moreover, as recently shown for influenza virus, we found that the intensities of the viral HLA-I peptides do not directly correspond to their source protein abundances (Fig. 4A,B). To assess if there are global changes in HLA-I antigen presentation upon infection, we compared the overlap between HLA-I peptidomes of uninfected and infected (24 hpi) A549 cells. The overlap among peptides detected in both experiments (62%, Fig. 4C) was similar to what was observed in biological replicates of the same sample (Abelin et al., 2017; Demmers et al., 2019). This high overlap and the relatively low HLA-I peptide representation from viral proteins that are expressed at 6 hpi or later (Fig.
3D ) led us to interrogate the whole proteome data for evidence of viral interference with the antigen presentation pathway. Because we analyzed the whole proteome from the cell lysate post HLA immunopurification, the levels of HLA-A, -B and -C could not be evaluated. However, all other host proteins should remain intact and enable proteomic analyses of host responses to infection. First, we compared the expression of central HLA-I presentation pathway proteins (e.g. B2M, ERAP1/2, TAP1/2, and proteasome subunits) between uninfected and infected cells using our fractionated proteome data (~7,000 quantified proteins; Fig. 4D , Fig. S3D , Table S4 ). Although some antigen presentation proteins had cell type-specific expression patterns, we observed no significant differences in these proteins upon infection. Of note, HLA-F, which interacts with KIR3DS1 on NK cells during viral infection (Lunemann et al., 2018) , had increased expression in infected cells. Next, we compared all proteins detected in uninfected and infected cells to determine if proteins involved in ubiquitination, proteasomal function, antigen processing, and IFN signaling were altered (Fig. 4E , Table S4 ). We observed a general decrease in ubiquitination pathway proteins, with several of them significantly depleted in response to SARS-CoV-2 infection, including RNF181, UBE2B, and TRIM11. POMP, a chaperone critical for the assembly of 20S proteasomes and immunoproteasomes, was the most significantly depleted proteasomal protein in both infected cell lines (p<0.0095). POMP has recently been reported to impact ORF9c stability, which has been implicated in suppressing the antiviral response (Dominguez Andres et al., 2020) . As reported across multiple cell lines infected with SARS-CoV-2 (Chen et al., 2020a) , the tyrosine kinase, JAK1, critical for IFN signaling, was depleted in both A549 and HEK293T cells upon infection (Fig. 4E) . 
We confirmed the observed depletion of POMP and ubiquitination pathway proteins in an independent proteome study (Stukalov et al., 2020) that profiled uninfected and infected A549/ACE2 cells at 6 hpi (Fig. S3E) and 24 hpi (Fig. 4F). Taken together, these data suggest that SARS-CoV-2 may interfere with IFN signaling proteins and the HLA-I pathway through POMP depletion and by altering ubiquitination pathway proteins, which, in turn, may prevent abundant SARS-CoV-2 proteins expressed later in infection from being effectively processed and presented. Remarkably, we detected nine HLA-I peptides processed from internal out-of-frame ORFs in the coding region of S and N, termed S.iORF1 (also known as ORF2b (Jungreis et al., 2021)) and ORF9b. From S.iORF1/2, we detected three HLA-I peptides: GPMVLRGLIT, GLITLSYHL and MLLGSMLYM in HEK293T cells (Fig. 5A). In addition, we detected six HLA-I peptides from ORF9b in A549 (LEDKAFQL and DEFVVVTV) and HEK293T cells (SLEDKAFQL, KAFQLTPIAV, ELPDEFVVV, and ELPDEFVVVTV) (Fig. 5B). These HLA-I peptides cover overlapping protein sequences and contain binding motifs compatible with the expressed HLA-I alleles. To validate the amino acid sequences of these non-canonical peptides, we compared the tandem mass spectra of synthetic peptides to the experimental spectra and observed high correlation between fragment ions and retention times (+/-2 minutes, Fig. 5C). Six of the peptides from out-of-frame ORFs were predicted to bind HLA-A*02:01 in HEK293T cells, suggesting the potential for widespread presentation of these non-canonical HLA-I peptides in the population. We confirmed binding for all six peptides using biochemical measurements in the presence of a high affinity radiolabeled A*02:01 ligand (IC50<500nM, Fig. 5D, Table S2).
Interestingly, the three peptides with highest affinity among all tested HLA-I peptides originated from out-of-frame ORFs, two from S.iORF1/2 (MLLGSMLYM and GLITLSYHL, IC50<0.5nM) and one from ORF9b (ELPDEFVVVTV, IC50=1.6nM). In the context of T cell immunity and vaccine development, it is crucial to understand the effect of optimizing RNA sequences on the endogenously processed and presented HLA-I peptides derived from internal out-of-frame ORFs. Exogenous expression of viral proteins in vaccines often involves manipulating the native nucleotide sequences, e.g. via codon optimization, to enhance expression. These techniques maintain the amino acid sequence of the canonical ORF, yet may alter the sequence of proteins encoded in alternative reading frames. In addition to the two current mRNA vaccines targeting the S glycoprotein (Callaway, 2020; Jackson et al., 2020; Mulligan et al., 2020), the nucleocapsid is also considered for vaccine development (Dutta et al., 2020; Zhu et al., 2004). To investigate the effect of codon optimization on HLA-I peptides derived from S.iORF1/2 and ORF9b, we compared the native viral sequence to synthetic S and N from a SARS-CoV-2 human optimized ORFs library. As expected, there was no change in the main ORFs; however, the amino acid sequences in the +1 frame encoding S.iORF1/2 and ORF9b were significantly different (Fig. 5E,F). In the case of S.iORF1, it is possible that this ORF is expressed in the human optimized construct, since the methionine driving its translation is preserved; however, the sequence of potential HLA-I peptides would be different (Fig. 5E). In the case of ORF9b, the start codon was mutated, a few stop codons were introduced along the ORF, and the sequence of the detected HLA-I peptides was altered (Fig. 5F). These data suggest that human codon optimization of the main ORF may preclude the HLA-I presentation of peptides encoded from alternative ORFs.
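The effect described above, where synonymous recoding preserves the main ORF but rewrites an overlapping +1 frame, can be reproduced on a toy sequence. The sketch below uses the standard genetic code; the 12-nucleotide sequences are invented for illustration and are not taken from SARS-CoV-2 or from the optimized ORF library.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate a DNA string in the given frame offset (0, 1 or 2).

    Incomplete trailing codons are ignored; stop codons appear as '*'.
    """
    dna = dna.upper()
    return "".join(
        CODON_TABLE[dna[i:i + 3]] for i in range(frame, len(dna) - 2, 3)
    )


# Invented example: both sequences encode M-A-L-E in the main frame.
native = "ATGGCTCTTGAA"
recoded = "ATGGCCCTGGAG"  # synonymous codon swaps only (GCT>GCC, CTT>CTG, GAA>GAG)

main_frame_preserved = translate(native, 0) == translate(recoded, 0)
plus_one_changed = translate(native, 1) != translate(recoded, 1)
```

Here both sequences translate to `MALE` in frame 0, while the +1 frame changes from `WLL` to `WPW`. The same comparison applied to a native and a codon-optimized construct would flag any overlapping ORF whose peptides are lost or altered.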
To evaluate the immunogenicity of the HLA-I peptides detected by MS, we conducted three assays probing T cell responses in a humanized mouse model, COVID-19 patients, and unexposed humans. First, we immunized five transgenic HLA-A2 mice with a pool of 9 A*02:01 peptides for 10 days and tested the T cell responses to individual peptides using an IFNγ ELISpot assay. We found positive responses to three non-canonical peptides from out-of-frame ORFs, two from S.iORF1/2 (GLITLSYHL and MLLGSMLYM) and one from ORF9b (ELPDEFVVVTV), as well as a canonical peptide from nsp3 (YLNSTNVTI) (Fig. 6A,B). Next, we investigated the immunogenicity of the HLA-I peptides in the context of COVID-19 disease. We performed ELISpot assays with PBMCs from six convalescent patients expressing HLA-A*02:01 and monitored IFNγ secretion in response to a pool of 15 HLA-I peptides from canonical ORFs and 7 peptides from the out-of-frame ORFs. As a positive control, we compared the T cell responses to a pool of 102 peptides tiling the nucleocapsid protein (N) measured in the same patients as part of another study (Gallagher et al., 2021). We observed positive responses to the non-canonical pool in two of the six samples (Fig. 6C,D). Notably, in one patient the T cell responses to the non-canonical pool exceeded the responses to the N pool, although the number of tested peptides was 14-fold lower (7 vs. 102 peptides in the non-canonical and the N pools, respectively). To delineate the T cell responses against individual HLA-I peptides in humans, we utilized a multiplexed technology combining barcoded tetramer assay and single-cell sequencing of epitope-reactive CD8+ T cells (Fig. 6E; Francis et al., 2021).
Using this method, we obtained information on: (i) the ex-vivo frequency of CD8+ T cells reactive to each peptide in each sample; (ii) the sequences of the T cell receptors (TCRs, paired α/β chains) recognizing each peptide; and (iii) gene expression profiles of individual reactive CD8+ T cells. Testing nine HLA-A*02:01 samples (seven COVID-19 convalescent and two unexposed), we found reactivity to positive control peptides from influenza and SARS-CoV-2 (Fig. 6F, Table S5A). As expected, HLA-I peptides that bind A*02:01 according to our affinity measurements (Table S2) elicited stronger CD8+ responses than peptides that were detected on other HLA alleles (Wilcoxon rank-sum p<10^-6, Fig. S4A). Two non-canonical peptides from ORF9b, ELPDEFVVVTV and SLEDKAFQL, were in the top five reactive peptides (Table S5A). Strikingly, ELPDEFVVVTV invoked the strongest CD8+ response among all tested HLA-I peptides, with the frequency of detected T cells similar to that observed for the influenza epitope and above those for three commonly recognized SARS-CoV-2 epitopes: YLQPRTFLL, KLWAQCVQL, and LLYDANYFL (Ferretti et al., 2020). Of note, YLQPRTFLL has been considered the most reactive SARS-CoV-2 epitope in a few independent studies (Ferretti et al., 2020; Habel et al., 2020; Shomuradova et al., 2020). Examining the gene expression profile and the TCR sequence of the reacting T cells provided additional supporting evidence for the functional relevance of the ELPDEFVVVTV epitope during the course of COVID-19. Most cells reactive to ELPDEFVVVTV showed high expression of effector markers and moderate to high expression of memory markers based on gene sets described in a recent COVID-19 CD8+ subpopulation profiling study (Fig. 6G) (Su et al., 2020). In addition, the TCR sequences of CD8+ T cells reactive to ELPDEFVVVTV revealed significant CDR3 homology across patients (Fig. S4B-D).
While our T cell data provide evidence for CD8+ responses to peptides from ORF9b in COVID-19 patients, we did not detect significant responses to HLA-I peptides from S.iORF1/2, GLITLSYHL and MLLGSMLYM, in the seven tested COVID-19 samples. To evaluate the immunogenicity of the third HLA-I peptide from S.iORF1/2, GPMVLRGLIT, we performed an additional barcoded tetramer assay with PBMCs from COVID-19 patients expressing HLA-B*07:02. We observed expected positive reactivity to control peptides from EBV (RPPIFIRRL) and SARS-CoV-2 (SPRWYFYYL) as well as overall greater CD8+ responses to HLA-I peptides that bind B*07:02 (Wilcoxon rank-sum p<10^-10, Fig. S4E,F, Table S5B). However, we found no significant responses to GPMVLRGLIT in patients, although we detected this peptide multiple times in our MS experiments (Table S1). It is possible that our assay was not sensitive enough to capture T cell responses to the three non-canonical peptides from S.iORF1/2, since we also observed weak responses to KLWAQCVQL, a commonly recognized A*02:01 epitope in COVID-19 patients (Ferretti et al., 2020; Takagi and Matsui, 2020), exhibiting similar reactivity as GLITLSYHL from S.iORF1/2. Increasingly accurate HLA-I presentation prediction tools are routinely applied to the full transcriptome or proteome of an organism to computationally nominate presentable epitopes. However, these tools are trained on data that is agnostic to virus-specific processes that may interfere with the presentation pathway. Thus, the sensitivity and specificity of in silico predictions for any particular virus are insufficiently characterized. To assess how well computational tools would recover the MS-identified HLA-I peptides, we used HLAthena (Abelin et al., 2017; Sarkizova et al., 2020) to retrospectively predict all 8-11mer peptides tiling SARS-CoV-2 proteins against the complement of HLA-I alleles expressed by A549 and HEK293T (Fig. 7A, Table S6A).
Of the 36 MS-identified peptides, 23 had a predicted percentile rank (%rank) <0.5 and 31 had %rank <2. Among the 39,875 possible SARS-CoV-2 8-11mers, 14 of 18 A549 HLA-I peptides and 11 of 18 HEK293T peptides had %rank scores within the top 1000 viral peptides (top 1.5% and 1.7% for A549 and HEK293T, respectively). To account for variability in viral protein expression levels, we repeated this analysis within the source protein of each peptide. We found that 16 of 36 peptides scored within the top 10 amongst all 8-11mers of the source protein, and 21 scored within the top 20. These observations suggest that while an in-silico epitope prediction scheme that nominates the top 10-20 peptides of each viral protein would recover ~50% (16-21 of 36) of observed epitopes with very high priority, this list would still only encompass ~5-10% true LC-MS/MS positives (16-21 of 10 × the number of proteins). Next, we estimated the HLA allele coverage achieved by the observed endogenously processed and presented viral epitopes amongst AFA, API, EUR, HIS, USA, and World populations at different %rank cutoffs based on HLAthena predictions across 92 HLA-I alleles (Fig. 7B, Fig. S5, Table S6B,C). At the second most stringent cutoff, %rank<=0.5, 31 of the 36 individual peptides were predicted to bind at least one allele (range: 1-21, median: 4.9, mean: 4.5). Combined together, the MS-identified peptide pool was estimated to cover at least one HLA-A, -B, or -C allele for 99% of the population with at least one peptide. To validate the predicted binding of the HLA-I peptides, we performed biochemical binding measurements with 30 synthetic peptides and 5 HLA alleles not present in the two profiled cell lines. We confirmed binding for 5 of 9 (56%) HLA-I peptides predicted at a 0.5 %rank threshold and 12 of 29 (41%) peptides predicted at a %rank threshold of 2 (Fig. 7C), with significantly higher measured affinities for predicted binders vs non-binders (Fig.
7D, Table S2). Moreover, two peptides with predicted presentation on HLA alleles not profiled in our cell lines were recently found to elicit T cell responses in convalescent COVID-19 patients expressing the predicted alleles (EILDITPCSF and QLTPTWRVY, detected on A*25:01 and C*16:01, were predicted to bind A*26:01 and A*30:02 at %rank<=0.5, respectively; Table S7; Tarke et al., 2020). These results indicate that HLA-I immunopeptidomics on only two cell lines combined with epitope prediction tools can help prioritize CD8+ T cell epitopes with high population coverage. We provide the first view of SARS-CoV-2 HLA-I peptides that are endogenously processed and presented by infected cells. Although our study profiled two cell lines, it uncovers insights into SARS-CoV-2 antigen presentation that extend beyond the nine HLA alleles tested here: (i) A substantial fraction, 9 of 36 (25%), of viral peptides detected are derived from internal out-of-frame ORFs in S (S.iORF1/2) and N (ORF9b). Remarkably, HLA-I peptides from non-canonical ORFs were strongly immunogenic in immunized mice and convalescent COVID-19 patients, as shown by both pooled ELISpot and multiplexed tetramer assays. These observations imply that current interrogations of T cell responses in COVID-19 patients, which focus on the canonical viral ORFs (Grifoni et al., 2020a; Weiskopf et al., 2020), exclude an important source of virus-derived HLA-I epitopes. (ii) A large fraction of detected HLA-I peptides were from non-structural proteins (nsps). While earlier studies focused mostly on T cell responses to structural proteins, this finding, together with recent studies that expanded their epitope pools to include non-structural proteins (Kared et al., 2021; Tarke et al., 2020), portrays nsps as an integral part of the T cell response to SARS-CoV-2. (iii) The timing of SARS-CoV-2 protein expression appears to be a key determinant for antigen presentation and immunogenicity.
Proteins expressed earlier in infection (3 hpi) were more likely to be presented on the HLA-I complex and elicit a T cell response in COVID-19 patients. Recent findings highlight the need to look beyond antibodies for strategies to achieve long-lasting protection against COVID-19 (Ledford, 2021). Several newly emerged SARS-CoV-2 variants are poorly neutralized by antibodies raised against the parental isolates used in the current vaccines (Chen et al., 2021; Wu et al., 2021). Importantly, recent studies have shown that CD8+ T cell responses are not substantially affected by mutations found in prominent SARS-CoV-2 variants (Tarke et al., 2021). Thus, integrating T cell epitopes into the design of next-generation vaccines has the potential to provide prolonged protection in the face of emerging variants. Our work reveals that ORF9b is an important source of T cell epitopes that remains largely unexplored in the context of T cell immunity. Although relatively short (97 aa), ORF9b yielded six HLA-I peptides (16% of total detected peptides) in both A549 and HEK293T cells that bind at least four different alleles (A*02:01, B*18:01, B*44:03 and A*26:01). We identified two A*02:01 peptides, ELPDEFVVVTV and SLEDKAFQL, which elicit CD8+ T cell responses in convalescent patients, demonstrating that ORF9b is translated and presented on HLA-I in vivo during the course of COVID-19. Moreover, ORF9b is highly expressed and among the few viral proteins that are detected early in infection, two traits that correlate with HLA-I presentation and immunogenicity. Specifically, our study highlights ELPDEFVVVTV as a promising T cell epitope. It binds both A*02:01 and A*26:01, elicits strong T cell responses in immunized mice and COVID-19 patients, and is recognized by TCRs from different patients sharing a mutual CDR3 motif.
Importantly, ELPDEFVVVTV elicits stronger T cell responses (in five of seven patients studied here) than the three most commonly recognized A*02:01 SARS-CoV-2 epitopes (Ferretti et al., 2020), including YLQPRTFLL, which was recorded as the most potent SARS-CoV-2 epitope in three independent studies (Ferretti et al., 2020; Habel et al., 2020; Shomuradova et al., 2020) and is the target of commercial monomer and tetramer assays.

In contrast to ORF9b, S.iORF1/2-derived peptides did not elicit significant T cell responses in convalescent COVID-19 patients. This finding is surprising given that GLITLSYHL and MLLGSMLYM had the highest affinity to HLA-A*02:01 among all HLA-I peptides tested and were immunogenic in a humanized mouse model, demonstrating that they can elicit T cell responses in vivo. Moreover, GLITLSYHL immunogenicity in mice was 10-fold higher than that of ELPDEFVVVTV, the most potent SARS-CoV-2 epitope detected in COVID-19 patients, with comparable responses only to an influenza epitope. The discrepancy between the immunogenicity of S.iORF1/2-derived peptides in mice and COVID-19 patients could suggest an immune evasion mechanism that attenuates the translation and/or antigen processing of these non-canonical ORFs in patients. Testing T cell responses in convalescent samples, as done in our study, is biased toward symptomatic patients, and T cell reactivity to these peptides may be associated with asymptomatic infection. Interestingly, while the sequences encoding the canonical and ORF9b-derived HLA-I peptides remained unchanged in the recently emerged SARS-CoV-2 variants B.1.1.7, P.1, and B.1.351 (originally detected in the UK, Brazil and South Africa, respectively) (Rambaut et al., 2020), the three HLA-I peptides derived from S.iORF1/2 were mutated (Fig.
S6, Table S8).

Our analyses demonstrate that synthetic approaches aiming at enhancing the expression of canonical ORFs, some of which are utilized in current vaccine strategies, can inadvertently eliminate or alter the production of HLA-I peptides derived from overlapping reading frames. Researchers may need to carefully examine the effect of sequence manipulation and codon optimization on internal overlapping ORFs, especially those encoding HLA-I peptides. In broader terms, many viral genomes have evolved to increase their coding capacity by utilizing overlapping ORFs and programmed frameshifting (Ketteler, 2012). Thus, our findings suggest a more general principle in vaccine design: optimizing expression of desired antigens using codon optimization can come at the expense of the CD8+ response if the same region encodes a source protein for T cell epitopes in an alternative frame. Combining insights from ribosome profiling and HLA-I immunopeptidomics can uncover the presence of non-canonical peptides that will enable more informed decisions in vaccine design.

Proteomics analyses of infected cells show that SARS-CoV-2 may interfere with the presentation of HLA-I peptides and the expression of ubiquitination and immune signaling pathway proteins. We found that SARS-CoV-2 infection leads to a significant decrease in the expression of POMP and ubiquitination pathway proteins. By impacting ubiquitin-mediated proteasomal degradation and immune signaling proteins, SARS-CoV-2 may reduce the precursors for downstream processing and HLA-I presentation and alter the immune response. The effects of SARS-CoV-2 on HLA-I presentation may be influenced by additional factors such as translation inhibition by nsp1 (Schubert et al., 2020) and degradation of host transcripts (Finkel et al., 2020c) that can diminish antigen presentation by attenuating the expression of HLA-I molecules.
Moreover, a recent study reports that ORF8 protein disrupts HLA-I antigen presentation and reduces the recognition and the elimination of virus-infected cells by CTLs (Park, 2020; Zhang et al., 2020) . Further research is needed to directly probe the various effects of SARS-CoV-2 on HLA-I antigen presentation. In summary, our work uncovers previously uncharacterized SARS-CoV-2 HLA-I peptides from out-of-frame ORFs in the SARS-CoV-2 genome and highlights the contribution of these viral epitopes to the immune response in a mouse model and convalescent COVID-19 patients. These new CD8+ T cell targets and the insights into HLA-I presentation in infected cells will enable a more precise selection of peptides for COVID-19 immune monitoring and vaccine development. The results of this study should be interpreted within the context of its technical limitations. First, immunopeptidome profiling was performed in infected cell lines and may not capture the in-vivo conditions in a faithful manner. Nevertheless, T cell responses in patients to HLA-I peptides, including non-canonical epitopes, support the in-vivo presentation of at least some of the peptides reported in this study. Second, our study spans nine HLA alleles endogenously expressed in two cell lines. Further studies of SARS-CoV-2 infected cell lines from diverse lineages and primary tissues expressing different HLA alleles will likely facilitate identification of additional epitopes. Third, LC-MS/MS based assays can suffer from false negatives if peptide abundance is below the limit of detection or the sequence does not ionize well. One or more of the authors of this paper self-identifies as an underrepresented ethnic minority in science. One or more of the authors of this paper self-identifies as a member of the LGBTQ+ community. One or more of the authors of this paper self-identifies as living with a disability. (Table S3 in (Tarke et al., 2020) ). 
proteome data from A549/ACE2 cells 24hpi (Stukalov et al., 2020) . See also Fig. S3, Table S4 . Shown are the fractions of peptides that were confirmed to bind the predicted alleles (IC50<500nM , Table S2 ). (D) IC50 nM affinity measurements of HLA-I peptides for nine alleles separated by predicted binders (%rank<2) and predicted non-binders (%rank>=2) (Welch Two Sample t-test, data presented as median, whiskers reach to lowest and highest values no further than 1.5x IQR). See also Fig. S5 , Table S6 . Further information and requests for resources and reagents should be directed to the lead contact, Shira Weingarten-Gabbay (shirawg@broadinstitute.org). Cell lines transduced with ACE2 and TMPRSS2 are available upon request. The raw RNA sequencing data generated in this study have been submitted to the Gene Expression Omnibus (GEO; https://www.ncbi.nlm.nih.gov/geo/) under accession number GSE159191. The original mass spectra, peptide spectrum matches, and the protein sequence databases used for searches have been deposited in the public proteomics repository MassIVE (https://massive.ucsd.edu) and are accessible at ftp://MSV000087225@massive.ucsd.edu. Supplemental tables were uploaded to Mendeley http://dx.doi.org/10.17632/9dmts2652x.1. Peripheral blood samples for pooled ELISpot assays were collected from COVID-19 convalescent patients (2 male Human embryonic kidney HEK293T cells (female), human lung A549 cells (male), and African green monkey kidney Vero E6 cells (female) were maintained at 37ºC and 5% CO2 in DMEM containing 10% FBS. We generated stable HEK293T and A549 cells expressing human ACE2 and TMPRSS2 by transducing them with lentivirus particles carrying these two cDNAs. A549 The 2019-nCoV/USA-WA1/2020 isolate (NCBI accession number: MN985325) of SARS-CoV-2 was obtained from the Centers for Disease Control and Prevention and BEI Resources. 
To generate the virus P1 stock, we infected Vero E6 cells with this isolate for 1h at 37ºC, removed the virus inoculum, rinsed the cell monolayer with 1X PBS, and added DMEM supplemented with 2% FBS. Three days later, when the cytopathic effect of the virus became visible, we harvested the culture medium, passed it through a 0.2 µm filter, and stored it at -80ºC. To generate the virus P2 stock, we infected Vero E6 cells with the P1 stock at a multiplicity of infection (MOI) of 0.1 plaque forming units (PFU)/cell and harvested the culture medium three days later using the same protocol as for the P1 stock. All experiments in this study were performed using the P2 stock.

A549 and 293T cells expressing ACE2 and TMPRSS2 were infected with SARS-CoV-2 (Washington isolate) at an MOI of 3 for the indicated times (3, 6, 12, 18, and 24 hpi). After infection, supernatants were removed, and cells were fixed with 4% paraformaldehyde for 30 minutes at room temperature. Cells were then permeabilized with 0.1% Triton X-100 in PBS for 10 minutes and incubated with Anti-SARS-CoV Nucleocapsid (N) Protein (RABBIT) polyclonal antibody (1:2000, Rockland, #200-401-A50) at 4ºC overnight. Alexa Fluor 568 goat anti-rabbit antibody (Invitrogen, A11011) was used as the secondary antibody for labelling virus-infected cells. Finally, DAPI was added to label the nuclei. Immunofluorescent images were taken using an EVOS microscope with a 10x lens and infection rates were calculated with ImageJ.

Vero E6 cells were used to determine the titer of our virus stock and to evaluate SARS-CoV-2 inactivation following lysis of infected cells in our HLA-IP buffer. Briefly, we seeded Vero E6 cells into a 12-well plate at a density of 2.5 x 10^5 cells per well, and the next day, infected them with serial 10-fold dilutions of the virus stock (for titration) or the A549 lysates (for the inactivation assay) for 1h at 37ºC.
We then added 1 ml per well of the overlay medium containing 2X DMEM (Gibco: #12800017) supplemented with 4% FBS and mixed at a 1:1 ratio with 1.2% Avicel (DuPont; RC-581) to obtain the final concentrations of 2% and 0.6% for FBS and Avicel, respectively. Three days later, we removed the overlay medium, rinsed the cell monolayer with 1X PBS and fixed the cells with 4% paraformaldehyde for 30 minutes at room temperature. 0.1% crystal violet was used to visualize the plaques.

Cells engineered to express SARS-CoV-2 entry factors were seeded into nine 15 cm dishes (three dishes per time point) at a density of 15 million cells per dish for A549 cells and 20 million cells per dish for HEK293T cells. The next day, the cells were infected with SARS-CoV-2 at a multiplicity of infection (MOI) of 3. To synchronize infection, the virus was bound to target cells in a small volume of Opti-MEM on ice for one hour, followed by addition of DMEM/2% FBS and switching to 37ºC. At 3, 6, 12, 18, and 24h post-infection, the cells from three dishes were scraped into 2.5ml/dish of cold lysis buffer (20mM Tris, pH 8.0, 100mM NaCl, 6mM MgCl2, 1mM EDTA, 60mM Octyl β-d-glucopyranoside, 0.2mM Iodoacetamide, 1.5% Triton X-100, 50x cOmplete Protease Inhibitor Tablet-EDTA free and PMSF), obtaining a total of 9 ml lysate. This lysate was split into 6 eppendorf tubes, with each tube receiving 1.5 ml volume, and incubated on ice for 15 min with 1 µl of Benzonase (Thomas Scientific, E1014-25KU) to degrade nucleic acid. The lysates were then centrifuged at 4,000 rpm for 22min at 4ºC and the supernatants were transferred to another set of six eppendorf tubes containing a mixture of prewashed beads (Millipore Sigma, GE17-0886-01) and 50 µl of an MHC class I antibody (W6/32) (Santa Cruz Biotechnology, sc-32235). The immune complexes were captured on the beads by incubating on a rotator at 4ºC for 3hr in the BSL3 lab.
Virus inactivation was confirmed by plaque assay before subsequent sample processing outside the BSL3 (Fig. S1C). The unbound lysates were kept for whole proteomics analysis while the beads were washed to remove nonspecifically bound material. In total, nine washing steps were performed: one wash with 1mL of cold lysis wash buffer (20mM Tris, pH 8.0, 100mM NaCl, 6mM MgCl2, 1mM EDTA, 60mM Octyl β-d-glucopyranoside, 0.2mM Iodoacetamide, 1.5% Triton X-100), four washes with 1mL of cold complete wash buffer (20mM Tris, pH 8.0, 100mM NaCl, 1mM EDTA, 60mM Octyl β-d-glucopyranoside, 0.2mM Iodoacetamide), and four washes with 20mM Tris pH 8.0 buffer. Dry beads were stored at -80ºC until mass-spectrometry analysis was performed.

HLA peptides were eluted and desalted from beads as described previously (Sarkizova et al., 2020). After the primary elution step, HLA peptides were reconstituted in 3% ACN/5% FA and subjected to microscaled basic reverse phase separation. Briefly, peptides were loaded on Stage-tips with 2 punches of SDB-XC material (Empore 3M) and eluted in three fractions with increasing concentrations of ACN (5%, 10% and 30% in 0.1% NH4OH, pH 10). For the time course experiment, one third of a pool of 6 IPs (for 12|18|24h) or a pool of 2 IPs (for 3|6|24hpi) was also labeled with TMT6 (Thermo Fisher Scientific, lot # UC280588; A549: 12h:126, 3h:127, 18h:128, 6h:129, 24h:130; HEK293T: 3h:126, 12h:127, 6h:128, 18h:129, 24h:131) (Thompson et al., 2003), combined and desalted on a C18 Stage-tip, and then eluted into three fractions using basic reversed phase fractionation with increasing concentrations of ACN (10%, 15% and 50%) in 5 mM ammonium formate (pH 10). Peptides were reconstituted in 3% ACN/5% FA prior to loading onto an analytical column (25-30cm, 1.9µm C18 (Dr. Maisch HPLC GmbH), packed in-house PicoFrit 75 µm inner diameter, 10 µm emitter (New Objective)).
Peptides were eluted with a linear gradient (EasyNanoLC 1200, Thermo Fisher Scientific) ranging from 6-30% Solvent B (0.1% FA in 90% ACN) over 84 min, 30-90% B over 9 min and held at 90% B for 5 min at 200 nl/min. MS/MS were acquired on a Thermo Orbitrap Exploris 480 equipped with FAIMS (Thermo Fisher Scientific) in data dependent acquisition. FAIMS CVs were set to -50 and -70 with a cycle time of 1.5s per FAIMS experiment. MS2 fill time was set to 100ms, and collision energy was set to 29 for label-free samples or 32 for TMT-labeled samples.

A 200 µL aliquot of each HLA IP supernatant was reduced for 30 minutes with 5mM DTT (Pierce DTT: A39255) and alkylated with 10mM IAA (Sigma IAA: A3221-10VL) for 45 minutes, both at 25°C on a shaker (1000 rpm). Protein precipitation using methanol/chloroform was then performed. Briefly, methanol was added at a volume of 4x that of the HLA IP supernatant aliquot. This was followed by a 1x volume of chloroform and a 3x volume of water. The sample was mixed by vortexing and incubated at -20°C for 1.5 hours. Samples were then centrifuged at 14,000 rpm for 10 minutes and the upper liquid layer was removed, leaving a protein pellet. The pellet was rinsed with a 3x volume of methanol, vortexed lightly, and centrifuged at 14,000 rpm for 10 minutes. Supernatant was removed and discarded without disturbing the pellet. Pellets were resuspended in 100 mM triethylammonium bicarbonate (TEAB, pH 8.5). Samples were digested with LysC (1:50) for 2h on a shaker (1000 rpm) at 25°C, followed by trypsin (1:50) overnight. Samples were acidified to a final concentration of 1% formic acid and dried. Samples were reconstituted in 4.5 mM ammonium formate (pH 10) in 2% (vol/vol) acetonitrile and separated into four fractions using basic reversed phase fractionation on a C18 Stage-tip. Fractions were eluted at 5%, 12.5%, 15%, and 50% ACN/4.5 mM ammonium formate (pH 10) and dried. Fractions were reconstituted in 3% ACN/5% FA, and 1 µg was used for LC-MS/MS analysis.
MS/MS were acquired on a Thermo Orbitrap Exploris 480 (Thermo Fisher Scientific) in data dependent acquisition (MS2 isolation width 0.7 m/z, top20 scans, collision energy 30%) (Fig. 2, 3, 4, S3B-D). Uninfected 1 µg single shot samples were analyzed similarly. For the time course experiment, the samples (12h, 18h, 24h) were not fractionated and 1 µg was used for LC-MS/MS analysis, as described above except that FAIMS with -50, -65, and -85 CV was applied and cycle time was 0.8s for each CV (Fig. S3A).

Peptide sequences were interpreted from MS/MS spectra using Spectrum Mill (v 7.1 prerelease) to search against a RefSeq-based sequence database containing 41,457 proteins mapped to the human reference genome (hg38) obtained via the UCSC Table Browser (https://genome.ucsc.edu/cgi-bin/hgTables) on June 29, 2018, with the addition of 13 proteins encoded in the human mitochondrial genome, 264 common laboratory contaminant proteins, 553 human non-canonical small open reading frames, 28 SARS-CoV-2 proteins obtained from RefSeq derived from the original Wuhan-Hu-1 China isolate NC_045512.2 (https://www.ncbi.nlm.nih.gov/nuccore/1798174254) (Wu et al., 2020), and 23 novel unannotated virus ORFs whose translation is supported by Ribo-seq (Finkel et al., 2020b) for a total of 42,337 proteins. Amongst the 28 annotated SARS-CoV-2 proteins we opted to omit the full-length polyproteins ORF1a and ORF1ab, to simplify peptide-to-protein assignment, and instead represented ORF1ab as the 16 individual mature non-structural proteins that result from proteolytic processing of the 1a and 1ab polyproteins. We added the D614G variant of the SARS-CoV-2 Spike protein that is commonly observed in European and American virus isolates. For additional searches, we also added 2036 entries from 6-frame translation of the SARS-CoV-2 genome for all possible ORFs longer than 6 amino acids (Table S1).
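The 6-frame translation step mentioned above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function names are hypothetical, and it assumes that an ORF is any stop-to-stop stretch in one of the six reading frames and that "longer than 6 amino acids" means a minimum length of 7 residues. The source does not state how ORF boundaries were defined.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate a DNA sequence codon by codon; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_orfs(genome, min_len=7):
    """List (strand, frame, peptide) for stop-to-stop segments of at least min_len residues.

    Assumption (not from the source): ORFs are delimited by stop codons only,
    with no requirement for a start codon.
    """
    genome = genome.upper().replace("U", "T")
    orfs = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            for segment in translate(seq[frame:]).split("*"):
                if len(segment) >= min_len:
                    orfs.append((strand, frame, segment))
    return orfs
```

Calling `six_frame_orfs` on a genome string yields candidate entries that could be appended to a search database; requiring a start codon instead would be an equally plausible reading of the text and would return fewer entries.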
For immunopeptidome data MS/MS spectra were excluded from searching if they did not have a precursor MH+ in the range of 600-4000, had a precursor charge >5, or had a minimum of <5 detected peaks. Merging of similar spectra with the same precursor m/z acquired in the same chromatographic peak was disabled. Prior to searches, all MS/MS spectra had to pass the spectral quality filter with a sequence tag length >1 (i.e., minimum of 3 masses separated by the Da; precursor mass tolerance of ±10 ppm; product mass tolerance of ± 10 ppm, and a minimum matched peak intensity of 30%. Peptide spectrum matches (PSMs) for individual spectra were automatically designated as confidently assigned using the Spectrum Mill auto-validation module to apply target-decoy based FDR estimation at the PSM level of <1.5% FDR. For the TMT-labeled time course experiments, two parameters were revised: the MH+ range filter was 800-6000, and TMT labeling was required at lysine, but peptide N-termini were allowed to be either labeled or unlabeled. Relative abundances of peptides in the time-course experiments were determined in Spectrum Mill using TMT reporter ion intensity ratios from each PSM. TMT reporter ion intensities for the 3 time points split across two plexes were not corrected for isotopic impurities because the respective adjacent intervening labels were not included. Each peptide-level TMT ratio was calculated as the median of all PSMs contributing to that peptide. PSMs were excluded from the calculation that lacked a TMT label, or had a negative delta forward-reverse identification score (half of all false-positive identifications). Intensity values for each time point were normalized to the 24h time point to compare between the 12|18|24h and 3|6|24h plex. 
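The peptide-level TMT summarization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields (`peptide`, `has_tmt`, `delta_fr_score`, `reporter`) and the function name are hypothetical, and the sketch divides each PSM's reporter intensities by its own 24h channel before taking the per-peptide median, which is one possible ordering of the two steps; Spectrum Mill's actual implementation may differ.

```python
from collections import defaultdict
from statistics import median


def peptide_tmt_ratios(psms, timepoints, reference="24h"):
    """Summarize PSM-level reporter intensities into peptide-level ratios.

    psms: iterable of dicts with hypothetical keys
        'peptide'        - peptide sequence
        'has_tmt'        - True if the PSM carries a TMT label
        'delta_fr_score' - delta forward-reverse identification score
        'reporter'       - dict mapping time point to reporter ion intensity
    """
    per_peptide = defaultdict(list)
    for psm in psms:
        # Exclude PSMs that lack a TMT label or have a negative delta
        # forward-reverse score, as described in the text.
        if not psm["has_tmt"] or psm["delta_fr_score"] < 0:
            continue
        ref = psm["reporter"][reference]
        if ref <= 0:
            continue  # cannot normalize without a reference signal
        per_peptide[psm["peptide"]].append(
            {t: psm["reporter"][t] / ref for t in timepoints}
        )
    # Peptide-level value: median over all contributing PSMs.
    return {
        pep: {t: median(r[t] for r in ratios) for t in timepoints}
        for pep, ratios in per_peptide.items()
    }
```

Because both plexes contain the 24h channel, expressing every time point relative to 24h is what allows values from the 12|18|24h and 3|6|24h plexes to be placed on one scale.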
For whole proteome data, MS/MS spectra were excluded from searching if they did not have a precursor MH+ in the range of 600-6000, had a precursor charge >5, had a minimum of <5 detected peaks, or failed the spectral quality filter with a sequence tag length >0 (i.e., minimum of 2 masses separated by the in-chain masses of 1 amino acid) based on ESI-QEXACTIVE-HCD-v4-30-20 peak detection. Similar spectra with the same precursor m/z acquired in the same chromatographic peak were merged. MS/MS search parameters included: ESI-QEXACTIVE-HCD-v4-30-20 scoring parameters; Trypsin allow P specificity with a maximum of 4 missed cleavages; fixed modification: carbamidomethylation of cysteine and seleno-cysteine; variable modifications: oxidation of methionine, deamidation of asparagine, acetylation of protein N-termini, pyroglutamic acid at peptide N-terminal glutamine, and pyrocarbamidomethylation at peptide N-terminal cysteine; precursor mass shift range of -18 to 64 Da; precursor mass tolerance of ±20 ppm; product mass tolerance of ±20 ppm; and a minimum matched peak intensity of 30%. Peptide spectrum matches (PSMs) for individual spectra were automatically designated as confidently assigned using the Spectrum Mill auto-validation module to apply target-decoy based FDR estimation at the PSM level of <1.0% FDR. Protein level data was summarized by top uses shared (SGT) peptide grouping and non-human contaminants were removed. SARS-CoV-2 derived proteins were manually filtered to include identifications with >6% sequence coverage and at least 2 unique peptides.

Peptide identifications were validated using synthetic peptides. Synthetic peptides were obtained from Genscript at >90% purity and dissolved to 10 mM in DMSO. For LC-MS/MS measurements, peptides were pooled and further diluted with 0.1% FA/3% ACN to load 120 fmol/µl on column. One aliquot of synthetic peptides was also TMT labeled as described above.
LC-MS/MS measurements were performed as described above. For plots, peak intensities in the experimental and the synthetic spectrum were normalized to the highest peak.

A549 and HEK293T cells were seeded into 6-well plates at a density of 5 x 10^5 cells per well (one well per condition). After 11-24h, the cells were infected with SARS-CoV-2 at an MOI of 3. At 12, 18 and 24h post-infection, the cells were lysed in Trizol (Thermo, 15596026), and the total RNA was isolated using standard phenol chloroform extraction. Standard Illumina TruSeq Stranded mRNA (LT) library preparation was performed using 500 ng of total RNA (Illumina, FC-122-2101). Oligo-dT beads were used to capture polyA-tailed RNA, followed by fragmentation and priming of the captured RNA (8 minutes at 94ºC). First strand cDNA synthesis was performed immediately afterwards. Second strand cDNA synthesis was performed using second strand marking master mix, DNA polymerase I and RNase H. cDNA was adenylated at the 3' ends, followed immediately by ligation of RNA single-index adapters (AR001-AR012). Library amplification was performed for 12-15 cycles under standard Illumina library PCR conditions. Library quantitation was performed using Agilent 2200 TapeStation D1000 ScreenTape (Agilent, 5067-5582). RNA sequencing was performed on the NextSeq 550 System using a NextSeq V2.5 High Output 75 cycle kit (Illumina, 20024906) or 150 cycle kit (Illumina, 20024907) for paired-end sequencing (70nt of each end).

Classical competition assays, based on the inhibition of binding of a high affinity radiolabeled ligand to purified MHC molecules, were utilized to quantitatively measure peptide binding to HLA-A and -B class I MHC molecules. The assays were performed, and MHC purified, as detailed previously (Sidney et al., 2013). Briefly, 0.1-1 nM of radiolabeled peptide was co-incubated at room temperature with 1 µM to 1 nM of purified MHC in the presence of a cocktail of protease inhibitors and 1 µM β2-microglobulin.
MHC bound radioactivity was determined (Cheng and Prusoff, 1973; Gulukota et al., 1997). Each competitor peptide was tested at six different concentrations covering a 100,000-fold dose range, and in three or more independent experiments. As a positive control, the unlabeled version of the radiolabeled probe was also tested in each experiment.

Five mice were immunized subcutaneously in the flank with a vaccine. The vaccine contained nine A*02:01 peptides (50 µg of each peptide per mouse) emulsified in Complete Freund's Adjuvant (CFA, BD Bioscience/Difco) supplemented with 20 µg PolyIC/LC (Hiltonol/Oncovir). Ten days post-vaccination, animals were euthanized using CO2, and spleens were removed for ELISpot assays. ELISpot was performed using red blood cell-depleted mouse splenocytes (200,000 cells/well) co-incubated with the individual peptides (10 µg/ml) in triplicate in ELISpot plates (Millipore, Billerica, MA) for 18h. Interferon-γ (IFNγ) secretion was detected using capture and detection antibodies as described (Mabtech AB, Nacka Strand, Sweden) and imaged using an ImmunoSpot Series Analyzer (Cellular Technology, Ltd, Cleveland, OH). HLA-A*02:01 restricted HIV-GAG peptide and non-stimulated wells were used as negative controls. Spot numbers were normalized by subtracting the average background spot numbers calculated from negative control wells. Anti-CD3 (2C11, BD Bioscience) and PHA were used as positive controls. A value of 55 spot-forming units/10^6 cells together with a ≥3-fold increase over baseline was used as the threshold for positive responses. Methods were described in detail previously (Keskin et al., 2015).

Cells were incubated for 16-20 hours at 37ºC before developing according to the manufacturer's instructions. Spots were counted using an ImmunoSpot CoreS6 ELISpot counter (ImmunoSpot). The negative control background was subtracted from the antigen wells and the results are shown as spot forming units (SFU) per 2.5e5 PBMC.
A spot cut off of 8 after background subtraction is used here to denote a positive response. HLA-A*02:01, and HLA-B*07:02 extracellular domains were expressed in E. coli and refolded along with beta-2-microglobulin and UV-labile place-holder peptides KILGFVFJV, and AARGJTLAM, respectively (Altman and Davis, 2016) . The MHC monomer was then purified by size exclusion chromatography (SEC). MHC tetramers were produced by mixing alkylated MHC monomers and azidylated streptavidin in 0.5 mM copper sulfate, 2.5 mM BTTAA and 5 mM ascorbic acid for up to 4 h on ice, followed by purification of highly multimeric fractions by SEC. Individual peptide exchange reactions containing 500 nM MHC tetramer and 60 uM peptide were exposed to long-wave UV (366 nm) at a distance of 2-5 cm for 30 min at 4˚C, followed by 30 min incubation at 30˚C. A biotinylated oligonucleotide barcode (Integrated DNA Technologies) was added to each individual reaction followed by 30 minute incubation at 4˚C. Individual tetramer reactions were then pooled and concentrated using 30 kDa molecular weight cut-off centrifugal filter units (Amicon). Biolegend) for 15 minutes followed by washing. Tetramer bound cells were then labeled with PE conjugated anti-DKDDDDK-Flag antibody (BioLegend) followed by dead cell discrimination using 7-amino-actinomycin D (7-AAD). The live, tetramer positive cells were sorted using a Sony MA900 Sorter (Sony). Tetramer positive cells were counted by Nexcelom Cellometer (Lawrence, MA, USA) using AOPI stain following manufacturer's recommended conditions. Single-cell encapsulations were generated utilizing 5' v1 Gem beads from 10x Genomics (Pleasanton, CA, USA) on a 10x Chromium controller and downstream TCR, and Surface marker libraries were made following manufacturer recommended conditions. All libraries were quantified on a BioRad CFX 384 (Hercules, CA, USA) using Kapa Biosystems (Wilmington, MA, USA) library quantified kits and pooled at an equimolar ratio. 
TCRs, surface markers, and tetramer generated libraries were sequenced on Illumina (San Diego, CA, USA) NextSeq550 instruments.

HLAthena, a prediction tool trained on endogenous LC-MS/MS-identified epitope data, was used to predict HLA class I presentation for all unique 8-11mer SARS-CoV-2 peptides across 31 HLA-A, 40 HLA-B and 21 HLA-C alleles (Sarkizova et al., 2020).

World frequencies of HLA-A, -B, and -C alleles in Table S6B are based on a meta-analysis of high-resolution HLA allele frequency data describing 497 population samples representing approximately 66,800 individuals from throughout the world (Solberg et al., 2008; Poran et al., 2020). The cumulative phenotypic frequency (CPF) of peptides was calculated using $\mathrm{CPF} = 1 - \left(1 - \sum_{i \in C} p_i\right)^2$, assuming Hardy-Weinberg proportions for the HLA genotypes (Dawson et al., 2001), where $p_i$ is the population frequency of the $i$-th allele within a subset of HLA-A, -B, or -C alleles, denoted $C$. Coverage across HLA-A, -B, and -C alleles was calculated similarly: $1 - \left(1 - \sum_{i \in A} p_i\right)^2 \left(1 - \sum_{j \in B} p_j\right)^2 \left(1 - \sum_{k \in C} p_k\right)^2$, where $A$, $B$, and $C$ denote a subset of HLA-A, -B, and/or -C alleles for which the coverage is computed, as recently done in (Poran et al., 2020).

Post-filtering, intensity-based absolute quantification (iBAQ) was performed on the whole proteome LC-MS/MS data as described in (Schwanhäusser et al., 2011). Briefly, iBAQ values were calculated as log10(totalIntensity/numObservableTrypticPeptides). The total precursor ion intensity for each protein was calculated in Spectrum Mill as the sum of the precursor ion chromatographic peak areas (in MS1 spectra) for each precursor ion with a peptide spectrum match (MS/MS spectrum) to the protein, and the numObservableTrypticPeptides for each protein was calculated using the Spectrum Mill Protein Database utility as the number of tryptic peptides with length 8-40 amino acids, with no missed cleavages allowed.
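The population coverage and iBAQ calculations described above reduce to a few arithmetic steps, sketched here in Python. The function names are hypothetical. The single-locus form follows the Hardy-Weinberg expression 1 - (1 - sum of p_i)^2; the multi-locus form additionally assumes independence between the HLA-A, -B, and -C loci, which is an assumption of this sketch rather than something stated explicitly in the text.

```python
import math


def cumulative_phenotypic_frequency(allele_freqs):
    """CPF for one locus under Hardy-Weinberg proportions.

    An individual carries none of the alleles in the set with probability
    (1 - sum(p_i))**2, one factor per chromosome copy.
    """
    return 1.0 - (1.0 - sum(allele_freqs)) ** 2


def coverage_across_loci(freqs_a, freqs_b, freqs_c):
    """Coverage across HLA-A, -B and -C subsets.

    Assumption of this sketch: the three loci are treated as independent.
    """
    not_covered = 1.0
    for freqs in (freqs_a, freqs_b, freqs_c):
        not_covered *= (1.0 - sum(freqs)) ** 2
    return 1.0 - not_covered


def ibaq(total_intensity, num_observable_tryptic_peptides):
    """log10 of summed precursor intensity per observable tryptic peptide."""
    return math.log10(total_intensity / num_observable_tryptic_peptides)
```

For example, a peptide predicted to bind two alleles of one locus with frequencies 0.1 and 0.2 would have a CPF of 1 - 0.7^2 = 0.51 under these assumptions.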
Of note, S coverage was 55% in the HEK293T and 44% in the A549 fractionated data at 24 hours post-infection, which may be due to the high levels of glycosylation. Lower peptide coverage may lead to underestimation of S protein in our data. Both log10 transformed total intensity and iBAQ values were median normalized by subtracting sample specific medians and adding global medians for each abundance metric and reported in Table S4.

Sequencing reads were mapped to the SARS-CoV-2 genome (RefSeq NC_045512.2) and human transcriptome (Gencode v32). Alignment was performed using Bowtie version 1.2.2 (Langmead et al., 2009) with a maximum of two mismatches per read. The fraction of human and viral reads was determined based on the total number of reads aligned to either SARS-CoV-2 or human transcripts.

Tetramer data analysis was performed using Python 3.7.3. For each single-cell encapsulation, tetramer UMI counts (columns) were matrixed by cell (rows) and log-transformed. The matrix was then Z-score transformed row-wise and subsequently median-centered by column. Means were calculated by clonotype, and those with a value greater than 4 were characterized as positive interactions.

Hydrogel-based RNA-seq data were analyzed using the Cell Ranger package from 10X Genomics (v3.1.0) with the GRCh38 human expression reference (v3.0.0). Except where noted, Scanpy (v1.6.0, (Wolf et al., 2018)) was used to perform the subsequent single-cell analyses. Any exogenous control cells identified by TCR clonotype were removed before further gene expression processing. Hydrogels that contained UMIs for fewer than 300 genes were excluded. Genes that were detected in fewer than 3 cells were also excluded from further analysis. The following additional quality control thresholds were also enforced. To remove data generated from cells likely to be damaged, upper thresholds were set for percent UMIs arising from mitochondrial genes (13%).
To exclude data likely arising from multiple cells captured in a single drop, upper thresholds were set for total UMI counts based on individual distributions from each encapsulation (from 1500 to 3000 UMIs). A lower threshold of 10% was set for UMIs arising from ribosomal protein genes. Finally, an upper threshold of 5% of UMIs was set for the MALAT1 gene. Any hydrogel outside of any of the thresholds was omitted from further analysis. A total of 15,683 hydrogels were carried forward. Gene expression data were normalized to counts per 10,000 UMIs per cell (CP10K) followed by log1p transformation: ln(CP10K + 1). Highly variable genes were identified (1,567) and scaled to have a mean of zero and unit variance. They were then provided to scanorama (v1.7, (Hie et al., 2019) ) to perform batch integration and dimension reduction. These data were used to generate the nearest neighbor graph which was in turn used to generate a UMAP representation that was used for Leiden clustering. The hydrogel data (not scaled to mean zero, unit variance, and before extraction of highly variable genes) were labeled with cluster membership and provided to SingleR ( MonacoImmuneData, DatabaseImmuneCellExpressionData, and BlueprintEncodeData. SingleR was used to annotate the clusters with their best-fit match from the cell types in the references. Clusters that yielded cell types other than types of the T Cell lineage were removed from consideration and the process was repeated starting from the batch integration step. The best-fit annotations from SingleR after the second round of clustering and annotation were assigned as putative labels for each Leiden cluster. 
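The per-hydrogel quality control thresholds and the CP10K normalization described above can be summarized in a short Python sketch. It is an illustration only and does not use Scanpy: the field names and function names are hypothetical, the maximum total UMI count is passed in because it was set per encapsulation (1500 to 3000 UMIs), and whether each threshold is inclusive is an assumption of this sketch.

```python
import math


def passes_qc(cell, max_total_umis):
    """Apply the per-hydrogel thresholds to one summary record.

    cell: dict with hypothetical keys 'n_genes', 'total_umis',
          'mito_umis', 'ribo_umis', 'malat1_umis'.
    max_total_umis: encapsulation-specific upper bound (1500 to 3000).
    """
    total = cell["total_umis"]
    if total == 0:
        return False
    return (
        cell["n_genes"] >= 300                    # at least 300 genes detected
        and total <= max_total_umis               # likely multiplets removed
        and cell["mito_umis"] / total <= 0.13     # mitochondrial fraction cap
        and cell["ribo_umis"] / total >= 0.10     # ribosomal protein gene floor
        and cell["malat1_umis"] / total <= 0.05   # MALAT1 fraction cap
    )


def log_cp10k(counts):
    """Normalize one cell's gene counts to ln(CP10K + 1)."""
    total = sum(counts.values())
    return {gene: math.log1p(n / total * 1e4) for gene, n in counts.items()}
```

A hydrogel with 2,000 UMIs of which 5% are mitochondrial, 20% ribosomal and 2% MALAT1 would pass under a 3,000 UMI cap, whereas the same hydrogel with 20% mitochondrial UMIs would be dropped.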
To corroborate the SingleR best-fit annotations and provide further evidence as to the phenotype of the clusters, gene panels representing functional categories (Naïve, Effector, Memory, Exhaustion, Proliferation) were used to score each hydrogel's expression profile using Scanpy's "score_genes" function (Wolf et al., 2018).

Table S3. SARS-CoV-2 protein abundance and presentability (Related to Fig. 2)
(A) SARS-CoV-2 protein expression values as determined by whole proteome measurements in A549 and HEK293T cells and Ribo-seq translation measurements that were previously published for Vero cells (Finkel et al., 2020b). iBAQ: intensity-based absolute quantification.
(B) HLA-I presentability estimates of SARS-CoV-2 ORFs based on HLAthena predictions.

Data associated with whole proteome analyses of A549 and HEK293T cells +/- SARS-CoV-2 at 0, 3, 6, and 24 hpi. The "gene_anno" sheet contains the protein groups used to plot the two-sample t-test results (Figure 4E, F). These annotations are a combination of manual curation, KEGG, and GO annotations of genes from the proteasomal pathway, antigen processing, ubiquitination pathway, IFN signaling, and SARS-CoV-2 genes. The "all proteins" sheet contains t-test results from uninfected vs. 3, 6, and 24 hpi. The "sig01_sortFC" tab contains all proteins that are statistically enriched or depleted in response to SARS-CoV-2 infection in HEK293T and A549 cells. The sheet "Input_WholeProteome_data" contains the whole proteome data from the Spectrum Mill database search from uninfected, 3, 6, and 24 hpi fractionated whole proteome samples. The sheet "PXD020019_data" represents the MaxQuant LFQ results reported by this study used for comparison in Fig. 4F and Fig. S3E.

HLA-I presented peptides identified in our study were compared against epitopes tested for CD8+ responses in Tarke et al. (Tarke et al., 2020) in patient samples with known HLA restriction.
We looked for exact peptide matches (left table) and partial peptide matches (defined as our peptide with up to two N- or C-terminal residues missing being a subsequence within a Tarke Flowthrough

Mass Spectrometry Profiling of HLA-Associated Peptidomes in Mono-allelic Cells Enables More Accurate Epitope Prediction
MHC-Peptide Tetramers to Visualize Antigen-Specific T Cells
SARS-CoV-2 T cell immunity: Specificity, function, durability, and role in protection
Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage
Unsupervised HLA Peptidome Deconvolution Improves Ligand Prediction Accuracy and Predicts Cooperative Effects in Peptide-HLA Interactions
The race for coronavirus vaccines: a graphical guide
Prediction of SARS-CoV-2 epitopes across 9360 HLA class I alleles
SARS-CoV-2 desensitizes host cells to interferon through inhibition of the JAK-STAT pathway. bioRxiv
Pervasive functional translation of noncanonical human open reading frames
Resistance of SARS-CoV-2 variants to neutralization by monoclonal and serum-derived polyclonal antibodies
Relationship between the inhibition constant (KI) and the concentration of inhibitor which causes 50 per cent inhibition (I50) of an enzymatic reaction
High-throughput and Sensitive Immunopeptidomics Platform Reveals Profound Interferonγ-Mediated Remodeling of the Human Leukocyte Antigen (HLA) Ligandome
Kinetics of antigen expression and epitope presentation during virus infection
Immunological memory to SARS-CoV-2 assessed for up to eight months after infection
Ramifications of HLA class I polymorphism and population genetics for vaccine development
Pre-fractionation Extends but also Creates a Bias in the Detectable HLA Class Ι Ligandome
Recovering Gene Interactions from Single-Cell Data Using Data Diffusion
SARS-CoV-2 ORF9c Is a Membrane-Associated Protein that
The Nucleocapsid Protein of SARS-CoV-2: a Target for Vaccine Development
Improved Ribo-seq enables identification of cryptic translation events
Unbiased Screens Show CD8+ T Cells of COVID-19 Patients Recognize Shared Epitopes in SARS-CoV-2 that Largely Reside outside the Spike Protein
Comprehensive annotations of human herpesvirus 6A and 6B genomes reveal novel and conserved genomic features
The coding capacity of SARS-CoV-2. bioRxiv
SARS-CoV-2 utilizes a multipronged strategy to suppress host protein synthesis
Allelic variation in Class I HLA determines preexisting memory responses to SARS-CoV-2
MGH COVID-19 Collection & Processing Team, and Maus, M.V. (2021). SARS-CoV-2 T-cell immunity to variants of concern following vaccination
A SARS-CoV-2 protein interaction map reveals targets for drug repurposing
Targets of T Cell Responses to SARS-CoV-2 Coronavirus in Humans with COVID-19 Disease and Unexposed Individuals
A Sequence Homology and Bioinformatic Approach Can Predict Candidate Targets for Immune Responses to SARS-CoV-2
Two complementary methods for predicting peptides binding major histocompatibility complex molecules
Suboptimal SARS-CoV-2-specific CD8+ T cell response associated with the prominent HLA-A*02:01 phenotype
MHC class I antigen presentation: learning from viral evasion strategies
Influenza A Virus Negative Strand RNA Is Translated for CD8+ T Cell Immunosurveillance
Efficient integration of heterogeneous single-cell transcriptomes using Scanorama
Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling
Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes
Ribosome profiling reveals pervasive translation outside of annotated protein-coding genes
An mRNA vaccine against SARS-CoV-2 - preliminary report
Conflicting and ambiguous names of overlapping ORFs in the SARS-CoV-2 genome: A homology-based resolution
SARS-CoV-2-specific CD8+ T cell responses in convalescent COVID-19 individuals
Physical detection of influenza A epitopes identifies a stealth subset on human lung epithelium evading natural CD8 immunity
On programmed ribosomal frameshifting: the alternative proteomes
The Architecture of SARS-CoV-2 Transcriptome
Ultrafast and memory-efficient alignment of short DNA sequences to the human genome
SARS-CoV-2-specific T cell immunity in cases of COVID-19 and SARS, and uninfected controls
How "killer" T cells could boost COVID immunity in face of new variants
Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding
Interactions Between KIR3DS1 and HLA-F Activate Natural Killer Cells to Control HCV Replication in Cell Culture
CD8+ T cell recognition of cryptic epitopes is a ubiquitous feature of AIDS virus infection
Epitope discovery in West Nile virus infection: Identification and immune recognition of viral epitopes
Antigen-specific adaptive immunity to SARS-CoV-2 in acute COVID-19 and associations with age and disease severity
Phase 1/2 study of COVID-19 RNA vaccine BNT162b1 in adults
2011 MHC class I and MHC class II antigen presentation
Human leukocyte antigen susceptibility map for SARS-CoV-2
Thousands of novel unannotated proteins expand the MHC I immunopeptidome in cancer
Immune evasion via SARS-CoV-2 ORF8 protein?
Sequence-based prediction of SARS-CoV-2 vaccine targets using a mass spectrometry-based bioinformatics predictor identifies immunogenic T cell epitopes
A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology
CD8+ T cell responses in COVID-19 convalescent individuals target conserved epitopes from multiple prominent SARS-CoV-2 circulating variants
Analysis of Major Histocompatibility Complex-Bound HIV Peptides Identified from Various Cell Types Reveals Common Nested Peptides and Novel T Cell Responses
Most non-canonical proteins uniquely populate the proteome or immunopeptidome
SARS-CoV-2 genome-wide mapping of CD8 T cell recognition reveals strong immunodominance and substantial CD8 T cell activation in COVID-19 patients
A large peptidome dataset improves HLA class I epitope prediction across most of the human population
Measles Virus Epitope Presentation by HLA: Novel Insights into Epitope Selection, Dominance, and Microvariation
The SARS-CoV-2 RNA-protein interactome in infected human cells
SARS-CoV-2 Nsp1 binds the ribosomal mRNA channel to inhibit translation
Global quantification of mammalian gene expression control
Robust T cell immunity in convalescent individuals with asymptomatic or mild COVID-19
Grigory A. Efimov SARS-CoV-2 Epitopes Are Recognized by a Public and Diverse Repertoire of Human T Cell Receptors
Measurement of MHC/peptide interactions by gel filtration or monoclonal antibody capture
Balancing selection and heterogeneity across the classical human leukocyte antigen loci: a meta-analytic review of 497 population studies
Regulation of translation initiation in eukaryotes: mechanisms and biological targets
Nowhere to hide: unconventional translation yields cryptic peptides for immune surveillance
Decoding human cytomegalovirus
Multi-level proteomics reveals host-perturbation strategies of SARS-CoV-2 and SARS-CoV
Multi-Omics Resolves a Sharp Disease-State Shift between Mild and Moderate COVID-19
Identification of HLA-A*02:01-Restricted Candidate Epitopes Derived from the Nonstructural Polyprotein 1a of SARS-CoV-2 That May Be Natural Targets of CD8 T Cell Recognition In Vivo
Comprehensive analysis of T cell immunodominance and immunoprevalence of SARS-CoV-2 epitopes in COVID-19 cases
Negligible impact of SARS-CoV-2 variants on CD4+ and CD8+ T cell reactivity in COVID-19 exposed donors and vaccinees
Defining the HLA class I-associated viral antigen repertoire from HIV-1-infected human cells
Tandem mass tags: a novel quantification strategy for comparative analysis of complex protein mixtures by MS/MS
The Perseus computational platform for comprehensive analysis of (prote)omics data
Phenotype and kinetics of SARS-CoV-2-specific T cells in COVID-19 patients with acute respiratory distress syndrome
SCANPY: large-scale single-cell gene expression data analysis
A new coronavirus associated with human respiratory disease in China
Serum Neutralizing Activity Elicited by mRNA-1273 Vaccine - Preliminary Report
Quantification of epitope abundance reveals the effect of direct and cross-presentation on influenza CTL responses
Defining Viral Defective Ribosomal Products: Standard and Alternative Translation Initiation Events Generate a Common Peptide from Influenza A Virus M2 and M1 mRNAs
The ORF8 Protein of SARS-CoV-2 Mediates Immune Evasion through Potently Downregulating MHC-I

• Time course analysis of HLA-I immunopeptidome in SARS-CoV-2-infected cells
• 25% of detected HLA-I peptides originated from out-of-frame ORFs in S and N
• Some out-of-frame peptides elicited stronger T cell responses than canonical peptides
• Early expressed viral proteins dominated HLA-I presentation and immunogenicity

Analysis of the HLA-I peptidome of SARS-CoV-2 infection identifies peptides derived from canonical and out-of-frame ORFs in viral S and N protein that are not captured by current vaccines and yield potent T cell