key: cord-103837-iuvigqdx authors: Knierman, Michael D.; Lannan, Megan B.; Spindler, Laura J.; McMillian, Carl L.; Konrad, Robert J.; Siegel, Robert W. title: The Human Leukocyte Antigen Class II Immunopeptidome of SARS-CoV-2 Spike Glycoprotein date: 2020-11-13 journal: nan DOI: 10.1016/j.celrep.2020.108454 sha: doc_id: 103837 cord_uid: iuvigqdx The precise elucidation of the antigen sequences for T-cell immunosurveillance greatly enhances our ability to both understand and modulate humoral responses to viral infection or active immunization. Mass spectrometry is used to identify 526 unique sequences from SARS-CoV-2 spike glycoprotein extracellular domain in a complex with human leukocyte antigen class II molecules on antigen presenting cells from a panel of healthy donors selected to represent a majority of allele usage from this highly polymorphic molecule. The identified sequences span the entire spike protein and several sequences are isolated from a majority of the donors sampled indicating promiscuous binding. Importantly, many peptides derived from the receptor binding domain used for cell entry are identified. This work represents a precise and comprehensive immunopeptidomic investigation with SARS-CoV-2 spike glycoprotein and allows detailed analysis of features which may aid vaccine development to end the current COVID-19 pandemic. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a positive-sense single-strand RNA virus that is a novel member of the genus Betacoronavirus family Coronaviridae responsible for the coronavirus disease 19 that emerged in China late 2019 and became a global pandemic by March 2020. As of date of manuscript preparation, over 34,000,000 cases with more than 1,000,000 fatalities have been reported worldwide (https://coronavirus.jhu.edu/). 
There are four additional members of the Betacoronavirus genus: two (HCoV-HKU1 and HCoV-OC43) that cause mild respiratory symptoms associated with the common cold and two (SARS-CoV and Middle East respiratory syndrome-CoV) that can cause fatal respiratory tract infections. The SARS-CoV-2 genomic sequence shares 79.6% identity with SARS-CoV and 96.2% identity with SARSr-CoV RaTG13 isolated from bats, supporting a zoonotic origin (Zhou et al., 2020a). At present, no therapeutic interventions to treat or prevent COVID-19 have been approved for use (https://milkeninstitute.org/covid-19-tracker). Active immunization is an area of intense research, with over 100 programs under development, and successful implementation would greatly aid in ending the current pandemic (https://www.who.int/who-documents-detail/draft-landscape-of-covid-19-candidate-vaccines). Immunization strategies include the use of live-attenuated virus, inactivated virus, non-replicating viral vector, protein subunit, and nucleic acid-based approaches. At the time of manuscript preparation, at least 43 candidate vaccines have progressed into clinical evaluation. An effective immunization would prevent viral entry into cells. The surface spike glycoprotein is the defining feature of all coronaviruses and is critical for internalization by engaging the host receptor and mediating virus-host membrane fusion (Cavanagh, 1995). The SARS-CoV-2 spike protein, as with SARS-CoV, interacts with human angiotensin-converting enzyme 2 (ACE2) (Zhou et al., 2020b), (Walls et al., 2020), (Li et al., 2003). The transmembrane glycoprotein comprises two functional subunits and forms homotrimers on the viral surface. A defined receptor binding domain (RBD) within the S1 subunit is responsible for binding to ACE2 (Li et al., 2005).
The S2 subunit in all coronaviruses contains 2 heptad repeat segments that form a coiled-coil structure, and membrane fusion occurs after proteolytic processing and conformational rearrangement (Liu et al., 2004), (Millet and Whittaker, 2015). A 4 amino acid polybasic furin cleavage site insertion between the S1 and S2 subunits is a unique feature within the SARS-CoV-2 spike protein (Coutard et al., 2020), (Walls et al., 2020). Efficacious SARS-CoV-2 vaccine development is dependent on a robust humoral immune response targeting the spike glycoprotein, and perhaps more specifically the RBD, assuming similar results as were observed for SARS-CoV (Buchholz et al., 2004). A central feature of the adaptive immune response is the presentation of immunogenic peptides for immunosurveillance by CD4+ helper T-cells. Computational prediction of T-cell epitope candidates has been applied for vaccine discovery and removal of unwanted immune responses against protein therapeutics (Griswold and Bailey-Kellogg, 2016). Current knowledge of peptide binding motifs is based primarily on data generated using biochemical binding assays (Justesen et al., 2009), (Sidney et al., 2013), which are compiled in the Immune Epitope Database (IEDB) (Vita et al., 2019). This information is used to train prediction algorithms, such as the Tepitool resource in IEDB (Paul et al., 2016) and NetMHCIIpan4 (Reynisson et al., 2020). Recently this approach was applied to identify known epitopes from multiple coronaviruses and predict likely B- and T-cell epitopes from several SARS-CoV-2 proteins. Reactive CD4+ T-cells obtained from COVID-19 patients were also observed when stimulated using a "megapool" comprised of overlapping synthetic 15-mer peptides spanning the entire spike protein sequence (Grifoni et al., 2020b).
However, knowledge of the precise sequences from the SARS-CoV-2 spike glycoprotein that drive CD4+ T-cell activation is lacking, and a clear understanding will benefit vaccine development. Mass spectrometry-based approaches, often termed immunopeptidomics, have been developed to examine the repertoires of peptides presented by human leukocyte antigen (HLA) molecules of the class I or class II major histocompatibility complex (MHC) used for immunosurveillance by CD8+ or CD4+ T-cells, respectively (Hunt et al., 1992a), (Hunt et al., 1992b). HLA class II molecules are restricted to antigen presenting cells (APCs) and play an essential role in the development of a humoral adaptive immune response via activation of CD4+ T-helper cells. The HLA-II peptide repertoire is a product of extracellular proteins that are proteolytically processed in the lysosomal compartment after internalization. HLA-II immunopeptidomics has been applied to understand the CD4+ T-cell epitopes for potential vaccine design from pathogens such as Mycobacterium tuberculosis (Bettencourt et al., 2020), vaccinia virus (Strug et al., 2008), (Lorente et al., 2019), measles virus (Ovsyannikova et al., 2003) and human herpes virus 6B (Becerra-Artiles et al., 2019). These approaches have typically been limited in scope, restricted to one or two cell lines, and thus sample only a very limited subset of HLA-DR alleles. MHC-associated peptide proteomics (MAPPs) is a specific extension of HLA-II immunopeptidomics that incorporates the intentional pulsing of dendritic cells (DCs) with an antigen or protein of interest (Rohn et al., 2005).
More recently, HLA-II MAPPs has been implemented to investigate and understand the mechanisms of treatment-emergent immunogenicity for biotherapeutic proteins (Cassotta et al., 2019), (Hamze et al., 2017), (Jankowski et al., 2019), (Sekiguchi et al., 2018), as this approach allows for the facile interrogation of the immunogenic potential from multiple HLA class II alleles. In the present study we sought to identify the naturally processed and presented immunopeptidome of the SARS-CoV-2 spike glycoprotein from human antigen presenting cells. DCs from a panel of healthy human subjects representing a large percentage of the HLA-DRB1 allele usage within the United States were treated with recombinant spike glycoprotein extracellular domain (ECD). A subset of donors was also selected to represent common alleles from the Asia-Pacific geography. HLA-II associated peptides were identified by liquid chromatography and nanoelectrospray ionization tandem mass spectrometry after immunoprecipitation. We observed several clusters, or nested sets of peptides, derived from every domain of the SARS-CoV-2 spike glycoprotein. We determined the prevalence of these clusters among multiple donors. Finally, we sought to compare our observed HLA-II epitopes with those from a recent in silico prediction (Grifoni et al., 2020a) and determine regions that are conserved with the SARS-CoV spike protein sequence. The MAPPs method intentionally pulses human DCs from a panel of donors with a protein of interest. The 4 male and 5 female healthy donors used in this study, ranging in age from 21 to 57 years old, were selected to sample approximately 53% and 46% of the HLA-II DRB1 allele frequency from the United States and Asian-Pacific geographic regions, respectively (Table 1). Full HLA typing of the donors is also available (Supplemental Table S1).
The donors' PBMCs were collected and stored frozen before the outbreak of the SARS-CoV-2 pandemic and are expected to be unexposed to potential infection. The sequence used for this work was derived from the SARS-CoV-2 Wuhan-Hu-1 strain spike glycoprotein ECD spanning residues 1-1213 taken from GenBank accession YP_009724390. An R685A mutation was used to prevent cleavage during recombinant protein production, with affinity tags added for purification. The recombinant protein was produced from mammalian cells and is expected to have N-linked glycosylation modifications. The method used to profile the HLA-II peptides is outlined in Figure 1A. Monocyte-derived DCs were generated in culture with a cytokine cocktail. The immature DCs were treated with SARS-CoV-2 spike glycoprotein ECD and after 24 hours, lipopolysaccharide (LPS) was added to mature the DCs. The treated cells were lysed and the HLA class II complex was isolated by immunoprecipitation with a pan-HLA class II antibody. The bound HLA-II peptides were eluted after acidification, filtered to remove high molecular weight co-precipitants, and analyzed by capillary HPLC on an Orbitrap mass spectrometer. Peptide ions were fragmented using multiple fragmentation techniques. Peptides were identified using multiple proteomic search engines with a forward and reverse database search. False discovery rate estimates (q-values) were calculated using a null distribution from the reverse database search. The peptides identified from the spike glycoprotein had q-values below 0.07 and all fragmentation spectra matching the spike protein were manually reviewed (see Methods for details). A total of 876 HLA-II peptides from the SARS-CoV-2 spike glycoprotein were identified from the donor panel (Supplemental Table S2). Several peptides with identical sequences were identified from multiple donors. Removal of these duplicate peptides resulted in 526 unique sequences.
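The q-value estimation mentioned above can be illustrated with a minimal sketch. The text states only that q-values came from a null distribution built from the reverse database search; the specific formula below (decoy count divided by target count at each score threshold, then made monotone) is the commonly used target-decoy estimate and is an assumption here, as are the function name `target_decoy_qvalues` and the toy scores.

```python
def target_decoy_qvalues(psms):
    """Estimate q-values from target/decoy peptide-spectrum matches.

    psms: list of (score, is_decoy) tuples; a higher score is a better match.
    Returns (score, is_decoy, q_value) tuples sorted by descending score.
    Assumption: FDR at a threshold is (#decoys / #targets) at or above it.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # Running FDR estimate when accepting everything down to each rank.
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: smallest FDR at this rank or at any more permissive rank.
    qvals = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvals.append(running_min)
    qvals.reverse()

    return [(s, d, q) for (s, d), q in zip(ranked, qvals)]


# Hypothetical toy scores: four forward (target) hits and one reverse (decoy) hit.
toy = [(10, False), (9, False), (8, True), (7, False), (6, False)]
results = target_decoy_qvalues(toy)
```

In a workflow like the one described, forward-database matches would then be kept only if their q-value fell below the chosen cutoff (0.07 for the spike peptides reported here), with manual spectrum review as a separate step.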
In order to minimize both false positive and false negative peptide identifications, the database used for peptide identification was intentionally constructed to contain the spike protein along with approximately 2000 background bovine and human proteins previously observed from multiple samples analyzed with this assay system. We also analyzed the data using multiple search engines against a database containing the entire human proteome (downloaded 05Apr19) that also included the SARS-CoV-2 spike glycoprotein and did not find any human protein identifications for the 526 unique SARS-CoV-2 spike peptides. The primary mass spectrometry data are publicly available for individual analysis (doi:10.25345/C51M7P). The unique peptide sequences had a distribution of lengths consistent with HLA-II peptides, with a mean length of 15 residues (Figure 1B) (Kampstra et al., 2019). There were 169 unique peptides that had modified residues. The modifications observed were consistent with HLA-II peptide processing. The spike protein is heavily glycosylated with 22 putative N-glycosylation sites, presumably to help evade immune detection, and the majority of the regions from the spike ECD not observed were centered around these sites. Current limitations of our method prevent detection of glycosylated peptides if, in fact, they were processed and loaded onto HLA class II molecules. Interestingly, HLA-II peptides from 3 putative N-linked glycosylation sites were observed, which indicates these regions were not modified. We did identify deamidation of the asparagine at residue 1098 with our search engines (Supplemental Table S2). This modification is consistent with the conversion of asparagine to aspartic acid after removal of N-linked glycosylation and is indirect evidence of glycosylation of this residue in the intact protein (Khoshnoodi et al., 2007).
Every donor examined produced HLA-II peptides derived from the SARS-CoV-2 spike glycoprotein (Figure 1C). The number of spike peptides observed per donor ranged from 9 to 203 with a median value of 91 peptides. One donor (40146) did not efficiently produce mature DCs, which resulted in a low overall number of peptides relative to all other donors analyzed. Repeated analysis with this donor also yielded poor results (data not shown).

HLA-II peptides are distributed across the entire SARS-CoV-2 spike protein extracellular domain and have consensus HLA-II clusters

HLA-II peptides derived from the SARS-CoV-2 spike glycoprotein were aligned to the ECD sequence and a peptide density map was generated to help visualize both the breadth and depth of sequence coverage. A schematic listing of the various subunits of the spike glycoprotein ECD and the results obtained with each donor are shown on individual rows (Figure 2). The shades of red correspond to the number of overlapping peptides encompassing a given amino acid position. From this presentation of the data, it is evident that HLA-II peptides spanning all segments of the spike glycoprotein ECD were obtained from all donors sampled. The heatmap also reveals that HLA-II peptides from several regions of the spike glycoprotein were observed from multiple donors, likely reflecting the more promiscuous HLA class II binding epitopes. An expanded view of the RBD encompassing residues 319-541 is shown in the bottom panel of Figure 2 and clearly demonstrates that promiscuous epitopes are also contained in this region of the molecule, which is critical for interaction with the ACE2 cell receptor. Groups of nested peptides sharing a common core but with ragged N- and C-termini are generated from the multiple proteases and different temporal patterns of processing that occur in the lysosomal compartment (Lippolis et al., 2002).
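The peptide density map described above amounts to counting, for each residue of the ECD, how many observed peptides cover it. A minimal sketch follows, assuming each peptide is located by exact substring match against the protein sequence; the function name `peptide_density` and the toy sequence are hypothetical, and the authors' actual alignment and plotting tools are not specified in the text.

```python
def peptide_density(protein, peptides):
    """Per-residue coverage depth for one donor.

    protein:  amino acid sequence as a string.
    peptides: iterable of peptide sequences observed for that donor.
    Returns a list with one integer per residue: the number of peptides
    overlapping that position. Peptides that do not occur in the protein
    (for example, modified sequences) are skipped in this sketch.
    """
    depth = [0] * len(protein)
    for pep in peptides:
        start = protein.find(pep)  # first exact occurrence only
        if start == -1:
            continue
        for i in range(start, start + len(pep)):
            depth[i] += 1
    return depth


# Hypothetical toy example: a nested set of three peptides sharing a core.
toy_protein = "ABCDEFGHIJ"
toy_peptides = ["BCDE", "CDEF", "DE"]
toy_depth = peptide_density(toy_protein, toy_peptides)
```

Stacking one such depth vector per donor gives the rows of a heatmap like the one described for Figure 2, where darker shading corresponds to larger counts.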
In order to organize the observed HLA-II peptides from the spike protein into discrete segments for subsequent analysis, we used the IEDB Epitope Cluster Analysis Tool 1.0 with some manual adjustments as noted in STAR methods to group these into 73 distinct clusters (Supplemental Table S3 ). Clusters were characterized from both full span and minimal overlap perspectives. The full cluster sequence represents the first start position of the peptides in the cluster to the last position of the peptides in the cluster. The minimum cluster sequence is the smallest common sequence among the peptides in the cluster. Clusters with minimum cluster sequences less than 9 residues likely contain 2 overlapping binding cores, which due to their proximity, were unable to be easily separated. As expected, we observed a distribution of the clusters among the donors ( Figure 3 ). Most of the clusters were observed from 4 or fewer donors and were designated as restricted; whereas, 11 clusters were observed from 5 -7 donors and were designated as consensus ( Figure 3A and Table 2 ). This arbitrary definition of a consensus cluster was set to reflect those sequences observed in at least 50% of the donors sampled in the study. Consensus clusters represent those SARS-CoV-2 spike glycoprotein sequences with the most promiscuous, but not necessarily highest affinity, binding to the broadest range of HLA class II alleles. No specific cluster was observed in all donors sampled in this study. The median number of clusters per donor was 18 and all donors displayed at least one of the consensus clusters ( Figure 3B ). We next examined the distribution in the number of peptides within both restricted and consensus clusters ( Figure 3C ). The vast majority of clusters contained less than 10 peptides with most, but not all, of the consensus clusters having 11 or greater members. 
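The cluster definitions above (full cluster sequence, minimum cluster sequence, and the consensus versus restricted designation) can be sketched in a few lines. This is an illustration only: it assumes peptides have already been grouped into a cluster (the study used the IEDB Epitope Cluster Analysis Tool with manual adjustments), and the function name `summarize_cluster`, the coordinates, and the donor labels are hypothetical.

```python
def summarize_cluster(protein, members, consensus_min_donors=5):
    """Summarize one cluster of nested peptides.

    members: list of (start, end, donor_id) with 0-based start and
             exclusive end positions on the protein sequence.
    The full sequence runs from the first start to the last end; the
    minimum sequence is the region shared by every member peptide.
    """
    full_start = min(s for s, _, _ in members)
    full_end = max(e for _, e, _ in members)
    core_start = max(s for s, _, _ in members)
    core_end = min(e for _, e, _ in members)
    donors = {d for _, _, d in members}
    return {
        "full": protein[full_start:full_end],
        # Empty string when the members share no common residues.
        "minimum": protein[core_start:core_end] if core_start < core_end else "",
        "n_donors": len(donors),
        # Threshold of 5 of 9 donors follows the definition in the text.
        "label": "consensus" if len(donors) >= consensus_min_donors else "restricted",
    }


# Hypothetical toy cluster seen in five donors.
toy_protein = "ACDEFGHIKLMNPQRSTVWY"
toy_members = [(2, 14, "d1"), (4, 16, "d2"), (3, 15, "d3"), (4, 14, "d4"), (2, 16, "d5")]
summary = summarize_cluster(toy_protein, toy_members)
```

A minimum sequence shorter than 9 residues would, as noted above, suggest that the cluster contains two overlapping binding cores rather than one.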
Interestingly, consensus cluster 1, observed in 5 donors, was comprised of a single peptide sequence in the S1 subunit (Table 2). Finally, we examined the location of the clusters within discrete segments of the SARS-CoV-2 spike glycoprotein (Figure 3D). The clusters were distributed throughout the protein. All regions contained at least one HLA-II peptide cluster, with consensus clusters occurring in several different regions. Of note, the RBD region responsible for binding ACE2 and a target for vaccination strategies contained a total of 16 clusters, of which 3 were consensus (Table 3).

Observed SARS-CoV-2 spike protein HLA-II peptides have limited overlap with predicted CD4+ T-cell epitopes

CD4+ T-cell epitopes from the spike glycoprotein derived from an algorithm designed to predict the dominant HLA class II peptides were recently published (Grifoni et al., 2020a). We sought to compare those predictions with the results from our study. We used the minimum observed cluster sequence for our comparison and considered an overlap of at least 9 residues within the predicted 15 residue peptide sequence as a match, with one exception as denoted below. We chose this minimum overlap length based on the number of residues contained within the HLA class II peptide binding cleft. We chose to use the minimum cluster sequence in an attempt to reduce matches on the extreme peripheries of the observed peptides that are unlikely to reflect the binding core contained within the cluster. A perfect congruence between prediction and observed results from this donor set would result in 19 matches, as one of the predicted peptides resides in the transmembrane portion, which was absent from the protein used in this study. In contrast, we observed HLA-II peptide clusters from our donor set that matched a total of 9 predicted epitopes (Figure 4A and Supplemental Table S4).
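The matching rule used for this comparison (at least 9 residues shared between a minimum cluster sequence and a predicted 15-residue peptide) reduces to an interval-overlap check. The sketch below assumes both sequences are represented by half-open residue ranges on the spike sequence; the function names and all coordinates in the example are hypothetical and are not taken from the supplemental tables.

```python
def overlap_length(a_start, a_end, b_start, b_end):
    """Number of residues shared by two half-open ranges [start, end)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def matches_prediction(cluster_span, predicted_span, min_overlap=9):
    """True when the minimum cluster sequence and the predicted peptide
    share at least min_overlap residues (9 in the comparison above)."""
    return overlap_length(*cluster_span, *predicted_span) >= min_overlap


# Hypothetical spans: a predicted 15-mer and two candidate cluster cores.
predicted = (450, 465)       # 15 residues
core_a = (455, 470)          # shares 10 residues with the prediction
core_b = (460, 472)          # shares only 5 residues
is_match_a = matches_prediction(core_a, predicted)
is_match_b = matches_prediction(core_b, predicted)
```

A strict rule like this would reject a 5-residue overlap, which is why any exception to it has to be justified separately from the peptide-level data.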
Cluster 27 was deemed to match a predicted epitope located at residues 451-465 (YLYRLFRKSNLKPFE) in the RBD even though only a 5-residue overlap was observed using the criteria defined above. This particular cluster was composed of 37 unique sequences, the most out of all the clusters, and likely contains two closely overlapping allele binding sites. Since 25 of the observed peptides from this cluster matched the first 9 residues of the predicted peptide, we included this region as overlapping with the prediction (see Supplemental Table S2 for cluster details). We also note two instances in which multiple clusters were deemed to sufficiently overlap with a predicted peptide, which skews the Venn diagram in Figure 4A to show 71 instead of the 73 total observed clusters. Comparison of the observed consensus clusters with the predicted peptides reveals that only 2 of the 11 observed promiscuous HLA-II binding regions were predicted (Table 2). This finding is intriguing given the prevalence of display of these particular sequences and may reflect limitations in the alleles used to generate the predictions and/or the prediction algorithm. The RBD of the SARS-CoV-2 spike glycoprotein contained 16 observed HLA-II clusters and 4 predicted peptides. We observed 3 of the 4 predicted epitopes (Table 3 and Supplemental Table S3). The only predicted peptide that was not observed contained a putative N-linked glycosylation site. As denoted above, our MAPPs method is unlikely to identify a peptide with this modification. In fact, 6 of the 20 predicted peptides contain a putative N-linked glycosylation site (Supplemental Table S4), possibly reflecting the fact that current predictive algorithms do not consider post-translational modifications.
In addition, the glycosylated asparagine residue in many of the predicted peptides is centrally located rather than on the periphery, and large complex glycan structures have been shown to interfere with HLA or T-cell receptor binding (Speir et al., 1999), (Kario et al., 2008), (Malaker et al., 2017). Also striking is the overall number of observed clusters (62), many obtained from multiple donors, which were not predicted. It is worth noting that the algorithm used to predict SARS-CoV-2 spike protein epitopes is largely trained on HLA binding affinity data, which do not reflect the authentic processing captured by the MAPPs assay. Further expansion of MAPPs-derived data into these training sets will likely be beneficial. The list of unique peptides for each HLA class II cluster was compared with the spike glycoprotein sequence from SARS-CoV (UniProt accession P59594). The non-identical residues between the two sequences are denoted in red under the spike glycoprotein schematic in Figure 4B. The S2 and S2' regions had larger areas of identity between the sequences. The 526 unique HLA-II peptides identified in this study were analyzed for sequence identity with the SARS-CoV spike glycoprotein. No identical matches were identified in the S1 subunit. However, 14 clusters match the SARS-CoV sequence in the S2 subunit. Interestingly, one of the matches, cluster 49, was a consensus cluster (denoted in blue in Figure 4B). These clusters represent sequence regions from both viruses that are potentially presented by HLA class II molecules for T-cell surveillance. The spike proteins from the other coronaviruses known to infect humans (NL63, 229E, HKU1, OC43, and MERS) were also evaluated and no matches were found. To look for potential cross-reactivity to human proteins, each unique identified peptide from the SARS-CoV-2 spike glycoprotein was searched against the UniProt human database for an exact match to any human protein.
None of the observed peptides had a sequence match (data not shown). Finding no matches, we expanded our search to include up to 2 mismatched amino acids without insertions or deletions. A single peptide could be associated with 13 human proteins if 2 residues of non-identity are allowed. These results indicate that the risk from direct sequence cross-reactivity is minimal and that any portion of the SARS-CoV-2 spike glycoprotein associated with an HLA class II molecule is unlikely to be subject to previous tolerization in a vaccinated subject or infected patient. However, we do acknowledge that our analysis does not consider cross-reactivity when strictly limited to putative T-cell contact residues, given the difficulty of reliably predicting such registers. CD4+ T-cell participation is vital for a robust humoral response to viral infection or active immunization. A clear delineation of the epitopes presented by APCs for T-cell immunosurveillance greatly enhances our understanding of this process. Generally, T-cell lines or PBMCs from recovered patients using peptides derived simply by spanning the entire protein(s) of interest or from HLA-II prediction algorithms have been utilized (Meunier et al., 2019), (Grifoni et al., 2020b), (Schulze zur Wiesch et al., 2005). The ability to automate and miniaturize the MAPPs assay enables facile identification of thousands of naturally processed and displayed HLA-II peptides from human DCs. Using this approach, we were able to determine the precise regions and sequences of peptides from SARS-CoV-2 spike glycoprotein ECD derived from a panel of healthy subjects presented for immune surveillance by T-cells. The 9 subjects used in this study enabled sampling of approximately 53% and 46% of the HLA-DRB1 allele frequency from the United States and Asian-Pacific geographic regions, respectively (Table 1).
This work represents, to our knowledge, the most precise and comprehensive immunopeptidomic investigation with SARS-CoV-2 spike glycoprotein performed to date and allows detailed analysis of features which may aid vaccine development. We observed a total of 526 unique peptide sequences contained within 73 clusters distributed across each segment of the SARS-CoV-2 spike glycoprotein ECD presented by human DCs (Figure 2 and Supplemental Table S2 ). Two of the clusters were in regions that deviated from the reference sequence. One region was the S1/S2 cleavage site in which the novel furin site was eliminated with mutation to enable production of full-length recombinant protein. We speculate that this region of the spike protein containing the native residue could also be presented from those molecules that are not cleaved during virion particle assembly. The other area of deviation was the C-terminal affinity tags used for purification. Of particular interest are those peptides from the spike glycoprotein that are presented by multiple donors as these would be sequences likely to elicit a T-cell response from the greatest number of patients or vaccinated subjects. We observed 11 consensus clusters, defined as being present in 5 or more of the 9 donors analyzed in this study, including 3 within the RBD which is essential for binding to ACE2 on host cells (Table 3) . A majority of the consensus clusters contained 11 or more nested peptides. In the absence of a dedicated assay to quantify the presented HLA-II peptides, we use this metric as a surrogate for peptide abundance but recognize that even a single specific peptide sequence can be presented in sufficient number to elicit a T-cell response. 
Recent reports leveraging either bioinformatics to predict (Grifoni et al., 2020a) or single-pot peptide pools composed of >150 overlapping peptides spanning the entire open reading frame (Grifoni et al., 2020b), (Braun et al., 2020) have been published attempting to elucidate SARS-CoV-2 T-cell epitopes. Unfortunately, the latter approach does not allow any insights into the precise sequences capable of eliciting a response. Comparison of the clusters we observed being presented by APCs, which reflect natural processing and HLA-II loading, with the predicted epitopes is illuminating. One of the predicted epitopes resides in the transmembrane domain, which was absent from the protein used for our analysis, and is omitted from further discussion. Of the remaining predictions, roughly 50% (9 of 19) were observed from our panel of donors selected to represent a sizeable percentage of HLA-DR allele usage from multiple geographies (Figure 4 and Supplemental Table S3). Correspondingly, the vast majority of the observed 73 HLA-II clusters were not predicted. Of particular interest are the consensus clusters that were observed in 5 or more of the donors and would be expected to represent those sequences with the most promiscuous HLA class II binding. Only 2 of these 11 consensus clusters were predicted, with only 1 of 3 consensus clusters contained in the RBD predicted. Of note and expanded in more detail below, consensus cluster 49 in the S2 subunit has 100% sequence identity to SARS-CoV, was predicted, and has been experimentally shown to be a T-cell epitope (Yang et al., 2009). The "7-allele method" HLA class II reference set used for generating the predicted epitopes is restricted to select DRB1/3/4/5 alleles (http://tools.iedb.org/mhcii/) (Paul et al., 2015).
The pan-HLA class II antibody used in our study, which would enrich both HLA-DQ and HLA-DP bound peptides, could explain why some, but certainly not all, of the observed clusters were not predicted. However, given the ample evidence which shows the impact that both HLA-DQ and HLA-DP bound peptides have on T-cell activation for a variety of viral antigens (Koelle et al., 1997), (Mellins et al., 1987), (Koelle et al., 1994), (Lorente et al., 2019), (Lorente et al., 2020), we felt it was important to identify as many of those restricted peptides as possible. Nevertheless, this disconnect between promiscuously observed HLA class II clusters and predicted T-cell epitopes accentuates what has been highlighted before: the prediction of T-cell epitopes is an imperfect process that may not reflect what HLA class II molecules preferentially bind (Paul et al., 2015), (Wantuch et al., 2020) and is therefore not the most effective approach for identifying CD4+ T-cell epitopes. (Wantuch et al., 2020), (Strug et al., 2008), (Ovsyannikova et al., 2003). MAPPs has been an instrumental method to preclinically assess the immunogenic potential of protein therapeutics (for review, Quarmby et al., 2018). Multiple reports analyzing different therapeutic antibodies approved for clinical use have shown that many, but not all, HLA-II clusters identified from these molecules can elicit T-cell responses from both drug-naïve donors and patients who developed treatment-emergent anti-drug antibody responses (Walsh et al., 2020), (Cassotta et al., 2019), (Hamze et al., 2017).
Also, unlike the immunogenicity assessment of a majority of protein biotherapeutics, which are engineered to a great extent to be recognized as a self-protein and therefore generally present a very limited number of non-germline residues for scrutiny, each SARS-CoV-2 spike protein cluster is fundamentally distinct from any other sequence in the human genome and extremely unlikely to be subject to any tolerization. The list of unique SARS-CoV-2 spike glycoprotein peptides identified by MAPPs searched against the human database (UniProt version 2020_2) did not identify any significant matches. Nevertheless, the characterization of the activation potential of the SARS-CoV-2 spike protein clusters with T-cells from healthy donors and, ideally, convalescent patients should be evaluated. The DCs used were derived from healthy donors that, due to time of sample collection, had not been exposed to potential SARS-CoV-2 infection. Therefore, the displayed HLA-II clusters reported here could conceivably deviate from the repertoire obtained from infected individuals. This potential discrepancy would require that the internalization of the virus into APCs fundamentally alter the proteolysis and/or HLA-II molecule loading mechanism. While examples of HLA-II display interference are known for other viruses that can infect and replicate in immune cells (Becerra-Artiles et al., 2019), the authors are not aware of that attribute with SARS-CoV-2 at this time and do not consider this to be a fundamental concern. This could be addressed by repeating this study using material obtained from convalescent patients or infection of DCs from healthy donors with live virus. Also, the DCs used in this study are derived from monocytes and could potentially have a different processing mechanism from plasmacytoid or follicular dendritic cell lineages.
A detailed comparison of the MAPPs peptides obtained from different dendritic cell types is lacking given the difficulties in obtaining adequate numbers required for investigation. Nevertheless, multiple studies have shown monocyte-derived DCs to be as efficient as antigen-specific B-cells in presenting peptides for T-cell surveillance (Cella et al., 1997; Sallusto and Lanzavecchia, 1994). Insights into the immunogenic potential of SARS-CoV-2 spike glycoprotein can be made from the results obtained in this study. Firstly, both the depth and breadth of the HLA-II peptides derived from this critical structural component for viral infectivity indicate that mutational drift as the pandemic continues to spread around the world is not expected to dramatically alter the ability of an infected individual to mount a new B-cell response to replace those antibodies which would be deleteriously impacted by such escape mutations. In the event that a mutation could result in an inability of a particular cluster to bind most HLA class II alleles, the sheer number of clusters distributed across the spike protein, especially the consensus clusters, makes the mutation unlikely to enable the virus to escape CD4+ T-cell activation across a wide portion of the population. Secondly, SARS-CoV spike glycoprotein cross-reactive HLA-II restricted epitopes seem to be limited to the S2 domain, as all 14 of the SARS-CoV-2 clusters identified in this study with complete sequence identity to SARS-CoV were derived from this region of the molecule (Figure 4B, Supplemental Table S3). This result is not surprising given the 90% sequence identity between the two viruses in S2 but only 60% identity in the S1 domain. No significant homology was noted for any of the other human coronavirus spike protein sequences.
Interestingly, consensus cluster 49 (observed from 6 donors) was one of the 5 clusters from the S2 subunit with complete identity to SARS-CoV and has previously been identified as a T-cell epitope from healthy donors using a tetramer guided epitope mapping approach (Yang et al., 2009). This epitope has significant, but not complete, overlap with the HLA-II derived cluster and indicates, as denoted above, that the approach of using overlapping peptides may yield identifications that differ from those that arise from natural APC processing. Limitations of this study include that the identified peptides are restricted to those with suitable biophysical characteristics for ionization and compatibility with reverse phase liquid chromatography. The total number of peptides and the wide breadth of coverage spanning the entire extracellular domain of the spike protein indicate that any false negative results obtained with the method outlined in this study are likely a small minority. Another limitation is the focus of the analysis on the spike glycoprotein to the exclusion of the other structural components of SARS-CoV-2, the membrane and nucleocapsid proteins. Undoubtedly many regions from both of those proteins will also be presented for T-cell surveillance; however, most immunization strategies seem to target the spike glycoprotein. Notwithstanding, the "adjuvant-like" potential of some of these presumed clusters to augment the humoral response to the spike glycoprotein cannot be accounted for with the current results and should be a target of future efforts. Additional effort to experimentally identify the actual HLA-II peptides presented from the spike glycoprotein, at a minimum, of all other coronaviruses would prove of interest.
The ability to direct the humoral immune response to discrete segment(s) of the SARS-CoV-2 spike glycoprotein that confer viral neutralization may potentially enable higher protective titers to be achieved with vaccination and limit antibody-dependent enhancement (ADE) of infection as reported with other coronaviruses (Tseng et al., 2012). A preliminary report in which the 197 amino acid RBD of SARS-CoV-2 spike glycoprotein was used for immunization in rodents suggests that a robust neutralizing response can be obtained without ADE (Quinlan et al., 2020). The approach outlined and the results reported in this study can also be applied to developing novel subunit or nucleic acid-based vaccines and/or monitoring response to such vaccines. It also makes it possible to supply the immune system with synthetic peptide(s) that mirror the natural APC presentation observed from a broad spectrum of the HLA class II alleles from different geographic regions to maximize T-cell responses.

The authors gratefully acknowledge Richard E. Higgs, Andrea Ferrante, and Laurent Malherbe for critical review and helpful suggestions during preparation of the manuscript. R.W.S dedicates this work to the memory of Karen J. Haugh. This work was supported by Eli Lilly and Company. The authors declare no competing interests.

The donor ID is listed on the X-axis and the total number of spike protein peptides from that particular donor is denoted on the Y-axis. See Table 1 and Supplemental Table S1 for donor information and Supplemental Table S2 for all identified SARS-CoV-2 spike protein ECD peptides.

The various subunits of the spike protein are denoted at the top of the schematic. Aligned HLA-II presented peptides are displayed as a heatmap with each of the nine donors on an individual row. The shades of red correspond to the number of overlapping peptides encompassing a given amino acid position.
The lightest shade represents a single peptide and the darkest red signifies that at least five peptides overlap an amino acid position. The unlabeled yellow region at the N-terminus corresponds to the signal peptide and the light green region at the C-terminus corresponds to the affinity tag used for purification. The S1/S2 cleavage site contains an R685A mutation (denoted with an asterisk). The RBD portion of the heatmap is expanded and displayed in the bottom panel along with numerical markers to indicate location within the SARS-CoV-2 spike protein ECD. Data from each donor in the panel were collected from a single biological and technical replicate. See Supplemental Table S2 for all identified SARS-CoV-2 spike protein ECD peptides. S1, Subunit 1; S2, Subunit 2; NTD, N Terminal Domain; RBD, Receptor Binding Domain; CTD, C Terminal Domain.

(A) Prevalence of peptide clusters across donors. The number of donors from which each of the 73 peptide clusters was observed is shown. The X-axis denotes the number of donors in which a cluster was observed, and the Y-axis denotes the number of clusters observed from that particular number of donors. For example, 31 clusters were observed from any given single donor, another 10 clusters were observed from 2 of the 9 donors in the panel, and so forth. Peptide clusters that were identified in at least 5 donors were deemed "consensus" and are colored red, whereas clusters seen in 4 or fewer donors are characterized as "restricted" and are colored blue.

(B) Number and type of cluster by donor. The total number of clusters observed for each donor is shown. The Y-axis denotes the total number of clusters from the particular donor listed on the X-axis. Clusters designated as restricted or consensus are denoted as blue and red, respectively. The overall median of 18 clusters is indicated with a dashed line.

(C) Cluster depth. The distribution of the number of peptides contained within a cluster is shown.
The number of peptides within any given cluster is denoted on the X-axis and the number of clusters containing the peptides within the designated bins is denoted on the Y-axis. Clusters designated as restricted or consensus are denoted as blue and red, respectively.

(D) The distribution of the clusters across the spike protein domains. The particular domain is designated on the X-axis and the number of clusters contained within the particular domain is denoted on the Y-axis. Clusters designated as restricted or consensus are denoted as blue and red, respectively. See Supplemental Table S3 for further cluster details.

(A) Prediction comparison. The observed SARS-CoV-2 spike protein clusters were compared to the predicted HLA-II peptides. A Venn diagram shows the overlap between the observed minimum cluster sequence and the predicted peptides. To view the details of predicted versus observed clusters, see Supplemental Table S4.

(B) Cluster conservation. The various segments of the SARS-CoV-2 spike protein are denoted on the top row and sequence mismatches to the SARS-CoV spike protein are delineated with a red bar in the second row. The third row is a heatmap of the clusters seen from the SARS-CoV-2 spike protein that contain peptides with an exact sequence match to SARS-CoV spike protein (darker orange represents overlapping clusters). The blue cluster designates a consensus cluster. S1, Subunit 1; S2, Subunit 2; NTD, N Terminal Domain; RBD, Receptor Binding Domain; CTD, C Terminal Domain.

Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Robert Siegel (siegel_robert@lilly.com). This study did not generate new unique reagents.

Frozen PBMCs were obtained from 9 healthy donors who provided informed consent (Discovery Life Sciences) according to local ethical practice. The protocol was adapted as described (Sallusto and Lanzavecchia, 1994) with the following modifications.
PBMCs were selected from the available inventory to have the broadest HLA-DRB1 diversity possible. The cells were lysed on Day 6 with RIPA lysis and extraction buffer (Thermo Fisher Scientific, cat # 89900, 25 mM Tris•HCl pH 7.6, 150 mM NaCl, 1% NP-40, 1% sodium deoxycholate, 0.1% SDS) containing 1:1000 of 10 unit/uL DNase (Roche, cat # 04716728001) and 1 tablet of EDTA-free protease inhibitors (Roche, cat # 11836170001) per 10 mL of lysis buffer. The lysates were frozen at -80°C. An Agilent AssayMAP robot was used to isolate the HLA-II molecules in the lysate. One hundred microliters of 1 mg/mL biotinylated anti-pan HLA class II antibody (in-house produced Tu39 clone) was immobilized on streptavidin cartridges (Agilent, SA-W, 5 uL) by passing it over the cartridge at 5 uL/min. The cartridge was washed with 50 uL of PBS three times. Lysates were thawed, passed over a 0.45 um filter, and 1 mL of each sample was loaded onto a 96 well polypropylene plate. The lysate was aspirated into the syringes and the antibody-loaded cartridge was attached to the syringe tip. The lysate was passed over the affinity cartridge at 5 uL/min at room temperature for 200 minutes. The cartridge was washed 2 x 50 uL with 100 mM ammonium acetate at 25 uL/min and once with 50 uL water at 25 uL/min. The cartridge was eluted with 50 uL of 5% acetic acid with 0.1% TFA at 2 uL/min into a 96 well polypropylene PCR plate. The eluted peptides were passed over a 10k MWCO spin filter treated with 1 mg/mL BSA (Sigma, 05470) and 100 ug/mL angiotensin I peptide and washed with 5% acetic acid. The filtered material was loaded in a 96 well polypropylene PCR plate for mass spec analysis. The samples were analyzed with a Thermo LUMOS mass spectrometer using a Thermo easy 1200 nLC-HPLC system. The separation was carried out with a 75 µm x 7 cm YMC-ODS C18 column (New Objectives) coupled to a custom nanospray interface with an electrospray potential of 1.2 kV.
The solvents were A: 0.1% formic acid in water (Thermo Fisher Scientific, Optima™ LC/MS Grade) and B: 80% acetonitrile with 0.1% formic acid (Thermo Fisher Scientific, Optima™ LC/MS Grade). The gradient was 65 minutes using a flow rate of 250 nL/min, starting with a 60 min 2-55%B ramp followed by a 1 min 55-100%B ramp and a 4 min hold at 100%B. The Lumos was run with a full scan at 240,000 resolution in the orbitrap followed by a 3 second data dependent MS/MS cycle comprised of ion trap rapid scans where +2 ions were fragmented by HCD (CE of 15, 22, 28) and +3 and +4 ions were fragmented by HCD (CE of 15, 22, 28) and EThcD (Calibrated Charge-Dependent ETD parameters and supplemental HCD (CE of 50)). The data were analyzed with the Lilly proteomics pipeline (Higgs et al., 2008). The data conditioning steps consisted of extraction from the vendor format, fitting parent ions for data dependent scans to theoretical isotope patterns and correcting the monoisotopic mass and charge of the parent ion, determining the fit of the parent ion isotope to the theoretical isotope pattern, and filtering out MS/MS scans if the parent ions did not match the isotope pattern with a score of 0.6 or greater. From the filtered scans, an MGF file was created along with a table of spectral features for each spectrum. The spectral identifications were performed with X! Tandem version 2017 and OMSSA version 2.1.7 search engines. A database was used consisting of the SARS-CoV-2 spike extracellular domain HIS-FLAG tagged protein and 2134 common human and bovine proteins identified from HLA-II bound peptides seen from Raji cells, DCs, and bovine proteins in the cell media. The search engine parameters included a no enzyme search with a maximum missed cleavage site setting of 30, 10 ppm tolerance for parent ions, and 0.5 m/z tolerance for the fragment ions.
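To make the 10 ppm parent ion tolerance concrete, the following is a minimal Python sketch of how a relative mass tolerance translates into an acceptance window. The function names and the example m/z values are illustrative assumptions and are not taken from the Lilly pipeline, X! Tandem, or OMSSA.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error of an observed m/z, in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz: float, theoretical_mz: float,
                     tol_ppm: float = 10.0) -> bool:
    """True if the observed m/z lies within +/- tol_ppm of the theoretical m/z."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


def ppm_window(theoretical_mz: float, tol_ppm: float = 10.0) -> tuple:
    """Lower and upper m/z bounds for a given ppm tolerance."""
    delta = theoretical_mz * tol_ppm / 1e6
    return theoretical_mz - delta, theoretical_mz + delta


if __name__ == "__main__":
    # Hypothetical precursor at m/z 1000: a 10 ppm window is +/- 0.01 m/z.
    print(ppm_window(1000.0))
    print(within_tolerance(1000.005, 1000.0))  # about 5 ppm off
    print(within_tolerance(1000.020, 1000.0))  # about 20 ppm off
```

Because the tolerance is relative, the absolute window widens with m/z, which is why it is stated in ppm for the high-resolution orbitrap scan while the ion trap fragment tolerance is stated as an absolute 0.5 m/z.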
Potential amino acid modifications included: cysteine modifications (free SH, disulfide, mercaptoethanolation, mono-, di-, and tri-oxidation, and cysteinylation); deamidation of glutamine and asparagine; methionine oxidation; and tryptophan oxidation, deoxidation, and oxidation to kynurenine. HCD spectra were searched for b- and y-ions and EThcD spectra were searched for c-, z-, b-, and y-ions. False positive identifications were controlled by running the searches against a reversed version of the protein database and estimating false discovery rates. An iterative random forest classifier was trained using search results and spectral features to increase identification sensitivity in a manner similar to the percolator algorithm (Kall et al., 2007). The search results from X! Tandem and OMSSA were pooled and peptides with q-values < 0.20 were assigned to the smallest group of proteins that account for all identified peptides. PepNovo plus (Frank and Pevzner, 2005) was run on all the spectra with a parent ion tolerance of 10 ppm and a fragment tolerance of 0.5 Th. Modifications included were methionine oxidation, cysteinylation, and disulfide formation. The output was filtered for peptide tags that matched the SARS-CoV-2 spike extracellular domain HIS-FLAG protein. Tag hits were checked against the results from the database search. All tag hits were matched to the database search results, indicating that there were no unknown modifications present in the results. The pipeline output was analyzed using KNIME 3.3 (Mazanetz et al., 2012) to merge all the donor search results, manually review the MS/MS spectra of the identifications to confirm the presence of at least 4 contiguous fragment ions matched to the peptide sequence, align the peptides to the SARS-CoV-2 spike extracellular domain HIS-FLAG protein, and create an Excel file with the alignment. The identified peptides from the SARS-CoV-2 spike glycoprotein were clustered with the IEDB Epitope Cluster Analysis tool v1.0 using the default settings.
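The reversed-database false discovery rate control and the q-value < 0.20 cutoff described above can be illustrated with a generic target-decoy calculation. This is a simplified Python sketch under stated assumptions: each peptide-spectrum match carries a single score where higher is better, and the FDR at a threshold is estimated as decoys divided by targets. It is not the Lilly pipeline, which rescored matches with an iterative random forest classifier; the scores below are invented for illustration.

```python
from typing import List, Tuple


def target_decoy_qvalues(psms: List[Tuple[float, bool]]) -> List[Tuple[float, bool, float]]:
    """Assign q-values to (score, is_decoy) matches.

    Returns (score, is_decoy, q_value) tuples sorted by descending score.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # FDR estimate at each rank: decoys seen so far / targets seen so far.
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: lowest FDR at this score threshold or any more permissive one.
    qvals = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvals.append(running_min)
    qvals.reverse()

    return [(s, d, q) for (s, d), q in zip(ranked, qvals)]


if __name__ == "__main__":
    # Hypothetical scores; True marks a hit to the reversed (decoy) database.
    psms = [(9.0, False), (8.0, False), (7.0, True),
            (6.0, False), (5.0, False), (4.0, True)]
    for score, is_decoy, q in target_decoy_qvalues(psms):
        print(score, "decoy" if is_decoy else "target", round(q, 3))
```

In this toy example only the two top-scoring target matches fall below a q-value of 0.20, since every lower threshold admits at least one decoy for every four targets.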
The clustering was manually adjusted to group a continuous amino acid run of at least 9 residues after sorting by donor. This adjustment was reflected in the cluster number as a decimal point after the assigned cluster number from the IEDB algorithm. Predicted SARS-CoV-2 spike protein peptides (Grifoni et al., 2020a) were matched with the MAPPs cluster if they shared at least a 9 amino acid overlap with the minimum cluster sequence. The identified HLA-II peptides were checked for exact matches using the UNIX grep command. The sequences searched were the human database (UniProt version 2020_02), spike proteins from SARS-CoV (GenBank accession # P59594.1), human coronavirus NL63, 229E, HKU1, OC43 (GenBank accession # QED88040.1, AST12964.1, AGW27881.1, AAR01015.1), and MERS (GenBank accession # YP_009047204.1). An imperfect match search was run using the UNIX command agrep to look for 1 or 2 mismatched amino acids. The searches were done for each peptide with insertion and deletion scores set to 100 to discount any insertions or deletions during the search. The imperfect match search was run against the human database (UniProt version 2020_02).

Table S2. HLA-II Peptide Alignment to SARS-CoV-2 Extracellular Domain, Related to Figure 1 and Figure 2.
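The exact and imperfect match searches described above amount to a substitution-only comparison: setting the insertion and deletion costs to 100 in agrep effectively restricts hits to windows of the same length that differ at no more than 1 or 2 positions. The Python sketch below shows that logic with a sliding window; it is an illustrative assumption of how the search behaves, not the commands the authors ran, and the sequences are made-up placeholders rather than spike or UniProt entries.

```python
from typing import List, Tuple


def find_matches(peptide: str, protein: str,
                 max_mismatches: int = 0) -> List[Tuple[int, int]]:
    """Locate peptide in protein allowing substitutions only (no indels).

    Returns (start_index, mismatch_count) for every window of the protein
    that differs from the peptide at no more than max_mismatches positions.
    max_mismatches=0 corresponds to an exact, grep-style match.
    """
    n = len(peptide)
    hits = []
    for start in range(len(protein) - n + 1):
        window = protein[start:start + n]
        mismatches = sum(a != b for a, b in zip(peptide, window))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


if __name__ == "__main__":
    # Hypothetical sequences for illustration only.
    protein = "MKTAYIAKQRQISFVKSHFSRQ"
    print(find_matches("AKQRQISFV", protein))                    # exact match
    print(find_matches("AKQRQLSFV", protein))                    # no exact match
    print(find_matches("AKQRQLSFV", protein, max_mismatches=2))  # 1 substitution
```

Running the same function over each database sequence with `max_mismatches` set to 0, 1, and 2 reproduces the three search tiers in the text.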
Naturally processed HLA-DR3-restricted HHV-6B peptides are recognized broadly with polyfunctional and cytotoxic CD4 T-cell responses Identification of antigens presented by MHC for vaccines against tuberculosis SARS-CoV-2-reactive T cells in healthy donors and patients with COVID-19 Contributions of the structural proteins of severe acute respiratory syndrome coronavirus to protective immunity A single T cell epitope drives the neutralizing anti-drug antibody response to natalizumab in multiple sclerosis patients The coronavirus surface glycoprotein Inflammatory stimuli induce accumulation of MHC class II complexes on dendritic cells The spike glycoprotein of the new coronavirus 2019-nCoV contains a furin-like cleavage site absent in CoV of the same clade PepNovo: de novo peptide sequencing via probabilistic network modeling A Sequence Homology and Bioinformatic Approach Can Predict Candidate Targets for Immune Responses to SARS-CoV-2 Targets of T Cell Responses to SARS-CoV-2 Coronavirus in Humans with COVID-19 Disease and Unexposed Individuals Design and engineering of deimmunized biotherapeutics Characterization of CD4 T Cell Epitopes of Infliximab and Rituximab Identified from Healthy Donors Label-free LC-MS method for the identification of biomarkers Characterization of peptides bound to the class I MHC molecule HLA-A2.1 by mass spectrometry Peptides presented to the immune system by the murine class II major histocompatibility complex molecule I-Ad Peptides identified on monocyte-derived dendritic cells: a marker for clinical immunogenicity to FVIII products Functional recombinant MHC class II molecules and high-throughput peptide-binding assays Semi-supervised learning for peptide identification from shotgun proteomics datasets Ligandomes obtained from different HLA-class II-molecules are homologous for N-and Cterminal residues outside the peptide-binding cleft N-linked glycosylation does not impair proteasomal degradation but affects class I major 
histocompatibility complex presentation Identification of N-linked glycosylation sites in human nephrin using mass spectrometry Antigenic specificities of human CD4+ T-cell clones recovered from recurrent genital herpes simplex virus type 2 lesions Preferential presentation of herpes simplex virus T-cell antigen by HLA DQA1*0501/DQB1*0201 in comparison to HLA DQA1*0201/DQB1*0201 Structure of SARS coronavirus spike receptor-binding domain complexed with receptor Angiotensin-converting enzyme 2 is a functional receptor for the SARS coronavirus Analysis of MHC class II antigen processing by quantitation of peptides that constitute nested sets Interaction between heptad repeat 1 and 2 regions in spike protein of SARS-associated coronavirus: implications for virus fusogenic mechanism and identification of fusion inhibitors The HLA-DP peptide repertoire from human respiratory syncytial virus is focused on major structural proteins with the exception of the viral polymerase Proteomics Analysis Reveals That Structural Proteins of the Virion Core and Involved in Gene Expression Are the Main Source for HLA Class II Ligands in Vaccinia Virus-Infected Cells Identification and Characterization of Complex Glycosylated Peptides Presented by the MHC Class II Processing Pathway in Melanoma Drug discovery applications for KNIME: an open source data mining platform Importance of HLA-DQ and -DP restriction elements in Tcell responses to soluble antigens: mutational analysis Impact of human sequences in variable domains of therapeutic antibodies on the location of CD4 T-cell epitopes Host cell proteases: Critical determinants of coronavirus tropism and pathogenesis Naturally processed measles virus peptide eluted from class II HLA-DRB1*03 recognized by T lymphocytes from human blood Development and validation of a broad scheme for prediction of HLA class II restricted T cell epitopes TepiTool: A Pipeline for Computational Prediction of T Cell Epitope Candidates MAPPs for the 
identification of immunogenic hotspots of biotherapeutics; an overview of the technology and its application to the biopharmaceutical arena The SARS-CoV-2 receptor-binding domain elicits a potent neutralizing response without antibody-dependent enhancement. bioRxiv NetMHCpan-4.1 and NetMHCIIpan-4.0: improved predictions of MHC antigen presentation by concurrent motif deconvolution and integration of MS MHC eluted ligand data A novel strategy for the discovery of MHC class II-restricted tumor antigens: identification of a melanotransferrin helper T-cell epitope Efficient presentation of soluble antigen by cultured human dendritic cells is maintained by granulocyte/macrophage colony-stimulating factor plus interleukin 4 and downregulated by tumor necrosis factor alpha Broad Repertoire of the CD4+ Th Cell Response in Spontaneously Controlled Hepatitis C Virus Infection Includes Dominant and Highly Promiscuous Epitopes MHC-associated peptide proteomics enabling highly sensitive detection of immunogenic sequences for the development of therapeutic antibodies with low immunogenicity Cell entry mechanisms of SARS-CoV-2 Measurement of MHC/peptide interactions by gel filtration or monoclonal antibody capture Crystal structure of an MHC class I presented glycopeptide that generates carbohydrate-specific CTL Vaccinia peptides eluted from HLA-DR1 isolated from virus-infected cells are recognized by CD4+ T cells from a vaccinated donor Immunization with SARS coronavirus vaccines leads to pulmonary immunopathology on challenge with the SARS virus The Immune Epitope Database (IEDB): 2018 update Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein Post-hoc assessment of the immunogenicity of three antibodies reveals distinct immune stimulatory mechanisms Molecular Mechanism for Antibody-Dependent Enhancement of Coronavirus Entry Isolation and characterization of new human carrier peptides from two important vaccine immunogens Searching immunodominant epitopes 
prior to epidemic: HLA class II-restricted SARS-CoV spike protein epitopes in unexposed individuals Potential Therapeutic Targets and Promising Drugs for Combating SARS-CoV-2

• Dendritic cells display peptides spanning the entire SARS-CoV-2 spike protein
• HLA-II peptides from 11 regions are presented by a majority of the donors analyzed
• One region with promiscuous HLA-II presentation is conserved with SARS-CoV
• The correlation of presented to predicted peptides is low

Dendritic cells derived from healthy human donors are pulsed with SARS-CoV-2 spike glycoprotein to determine the precise sequences presented for T-cell surveillance. Regions with promiscuous presentation are identified with poor correlation to predicted epitopes. One region with promiscuous presentation is conserved with the SARS-CoV spike glycoprotein sequence.