key: cord-0936367-91rqso1j
authors: Sohail, Muhammad Saqib; Ahmed, Syed Faraz; Quadeer, Ahmed Abdul; McKay, Matthew R.
title: In silico T cell epitope identification for SARS-CoV-2: Progress and perspectives
date: 2021-01-17
journal: Adv Drug Deliv Rev
DOI: 10.1016/j.addr.2021.01.007
sha: dba2d490f4e2a37ef903332679b34f6c8e7fdf0f
doc_id: 936367
cord_uid: 91rqso1j

Growing evidence suggests that T cells may play a critical role in combating severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Hence, COVID-19 vaccines that can elicit a robust T cell response may be particularly important. The design, development and experimental evaluation of such vaccines is aided by an understanding of the landscape of T cell epitopes of SARS-CoV-2, which is largely unknown. Due to the challenges of identifying epitopes experimentally, many studies have proposed the use of in silico methods. Here, we present a review of the in silico methods that have been used for the prediction of SARS-CoV-2 T cell epitopes. These methods employ a diverse set of technical approaches, often rooted in machine learning. A performance comparison is provided based on the ability to identify a specific set of immunogenic epitopes that have been determined experimentally to be targeted by T cells in convalescent COVID-19 patients, shedding light on the relative performance merits of the different approaches adopted by the in silico studies. The review also puts forward perspectives for future research directions.

targeted in convalescent COVID-19 patients. Insights into the current state of SARS-CoV-2 T cell epitope prediction are also put forward, together with perspectives on future research directions and opportunities.

Schematic illustration of T cell responses against SARS-CoV-2 and T cell epitope prediction using in silico approaches. (A) Viral peptides, derived from SARS-CoV-2 proteins after multiple intra-cellular processing steps, are presented on the surface of infected cells and antigen presenting cells via HLA class I and class II molecules, respectively. Naïve T cells, specialized in distinguishing foreign peptides from self-peptides via training in the thymus, scan these peptide-HLA complexes to determine if the peptides belong to a foreign microbe. Recognition of a foreign peptide leads to activation, proliferation, and differentiation of naïve T cells into effector cells. There are two main types of effector T cells: CD8+ T cells (or cytotoxic T lymphocytes; CTLs), which are activated by viral peptides bound to HLA class I molecules and help in killing the SARS-CoV-2 infected cells (right panel), and CD4+ T cells (or helper T lymphocytes), which are activated by peptides bound to HLA class II molecules and help in further enhancing SARS-CoV-2-specific CD8+ T cell and antibody responses (left panel). These adaptive immune cells, activated by peptide-HLA complexes, can collectively mount a potent immune response against SARS-CoV-2. (B) In silico approaches analyze SARS-CoV-2 protein sequences to predict a number of potential HLA-I and HLA-II epitopes that can be used to guide experiments to characterize T cell responses in COVID-19 patients and to inform SARS-CoV-2 vaccine design.

peptide-HLA binding prediction methods (the remaining 61 studies). We review and discuss each of these approaches in the following.

Compared to other human coronaviruses, early studies using phylogenetic analysis [24, 25] suggested SARS-CoV-2 to be most similar to SARS-CoV, the virus that caused the 2003 SARS outbreak. In fact, the genetic similarity of SARS-CoV-2 and SARS-CoV was found to be quite high in the structural proteins (~76% in S and >90% in N, M, and E proteins) [20], which were known to induce robust and long-lasting T cell immunity against SARS-CoV [26]. Motivated by the high genetic similarity of SARS-CoV with SARS-CoV-2, multiple in silico studies [20-23] used the information of T cell epitopes available from previous SARS-CoV immunological studies to predict likely targets of SARS-CoV-2 T cell responses (Table 1). This approach is well motivated, with cross-reactive T cell epitopes being reported previously for genetically similar viruses [27], including Zika virus, dengue virus, and other flaviviruses [28-31]. Interestingly, this was also the basic idea behind the first successful vaccine against an infectious disease, developed by Jenner in 1796, which induced protective immunity against the smallpox virus through inoculation with a related cowpox virus.

For identifying the SARS-CoV T cell epitopes likely to generate cross-reactive immune responses against SARS-CoV-2, Ahmed et al. [20] scanned all available SARS-CoV T cell epitopes in the ViPR database [32] that had been determined previously by using experimental positive HLA binding or T cell assays, and identified epitopes that had an exact match in the SARS-CoV-2 sequences available at that time. Subsequent studies by Lee et al. [21] , Grifoni et al. [22] , and Ranga et al. [23] used the SARS-CoV epitopes as well as peptide-HLA binding prediction methods (discussed in the next subsection) and proposed the common ones as potential SARS-CoV-2 epitopes.
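The exact-match screen described above amounts to a substring search of known epitopes against the target sequences. The following is a minimal sketch under that reading, not the code used in [20]; the function name and the toy strings are hypothetical and do not represent real viral sequences.

```python
def exact_match_epitopes(epitopes, target_sequences):
    """Return the epitopes that occur, unchanged, in every target sequence."""
    return [
        ep for ep in epitopes
        if all(ep in seq for seq in target_sequences)
    ]


# Toy data (hypothetical strings, not real SARS-CoV or SARS-CoV-2 sequences).
known_epitopes = ["ACDEFGHIK", "LMNPQRSTV", "WYACDEFGH"]
target_sequences = [
    "MMACDEFGHIKLLLMNPQRSTVAA",
    "GGACDEFGHIKTTLMNPQRSAVAA",  # the second epitope carries a substitution here
]

conserved = exact_match_epitopes(known_epitopes, target_sequences)
print(conserved)
```

Requiring a match in every sequence is the strictest possible criterion; a single substitution in one sequence is enough to exclude an epitope.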

The sequence data of SARS-CoV-2 continues to be deposited into public databases at an unprecedented rate, with the number of complete genome sequences available for SARS-CoV-2 in the GISAID database exceeding 65,000 (as of September 2020), much greater than that available for many other viruses (e.g., ~12,900 whole genome sequences are publicly available for HIV [33] , and ~4400 for the hepatitis C virus [34] ). Taking advantage of this data, the authors of [20] later proposed a web-based platform, COVIDep [35] , for reporting SARS-CoV T cell epitopes as potential vaccine targets for SARS-CoV-2 based on the latest sequence data available. Compared to [20] , which reported SARS-CoV epitopes that were fully conserved within the SARS-CoV-2 sequences available in February 2020, COVIDep provides a parameter that enables identification of SARS-CoV epitopes that are identical among a desired fraction of the latest available SARS-CoV-2 sequence data.
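The idea of relaxing full conservation to a desired fraction of sequences can be illustrated as follows. This is only a sketch of the general idea, not the COVIDep implementation; the function names, the `min_fraction` parameter name, and the toy sequences are assumptions made for illustration.

```python
def conservation_fraction(epitope, sequences):
    """Fraction of sequences that contain the epitope as an exact substring."""
    if not sequences:
        raise ValueError("at least one sequence is required")
    hits = sum(1 for seq in sequences if epitope in seq)
    return hits / len(sequences)


def filter_by_conservation(epitopes, sequences, min_fraction):
    """Keep epitopes found in at least `min_fraction` of the sequences."""
    kept = {}
    for ep in epitopes:
        frac = conservation_fraction(ep, sequences)
        if frac >= min_fraction:
            kept[ep] = frac
    return kept


# Toy alignment-free example (hypothetical strings).
sequences = [
    "ACDEFGHIKLMNPQRSTV",
    "ACDEFGHIKLMNPQRSTV",
    "ACDEFGHIKLMNPQRSAV",  # substitution inside the second epitope
    "ACDEFGHIKLMNPQRSTV",
]
epitopes = ["ACDEFGHIK", "LMNPQRSTV"]

strict = filter_by_conservation(epitopes, sequences, min_fraction=1.0)
relaxed = filter_by_conservation(epitopes, sequences, min_fraction=0.75)
print(strict)
print(relaxed)
```

Setting `min_fraction=1.0` reproduces the fully conserved criterion, while lower values tolerate rare variants in the sequence data.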

The epitopes predicted using SARS-CoV immunological information have been used by multiple experimental studies for SARS-CoV-2 to probe the immune responses in convalescent COVID-19 patients. As will be discussed in Section 4, many epitopes predicted using this approach have been found to elicit a cross-reactive T cell response in patients.

The large majority of in silico SARS-CoV-2 vaccine design studies so far have predicted T cell epitopes using existing peptide-HLA binding prediction methods (see Table 1 ). A number of these studies also performed a subsequent refinement step where the set of epitopes was narrowed down by running computational tests to identify those capable of eliciting a robust and safe T cell response. In some cases, the refined set of predicted epitopes were used to design a vaccine construct by including appropriate signal sequences, proteasomal cleavage sites, and linkers, and these were further tested in silico for features such as immunogenicity, safety, and structural stability.

The in silico search for SARS-CoV-2 T cell epitopes has benefitted from years of research in developing peptide-HLA binding algorithms. These methods have matured over time thanks to increased availability of experimental data and methodological advancements. Most methods are specialized for predicting either HLA class I-restricted epitopes (i.e., CD8+ T cell epitopes) or HLA class II-restricted epitopes (i.e., CD4+ T cell epitopes), while some methods have been developed for predicting epitopes for both HLA classes. Based on the methodology employed, these T cell epitope prediction methods can be broadly divided into two groups: machine learning (ML) based methods and (non-ML) bioinformatics based methods.

The current state-of-the-art ML epitope prediction methods utilize artificial neural networks (ANN). Such [18] , NeonMHC2 [19] and MARIA [45] (for a historical perspective on the evolution of these methods, see [46] ). The suffix -pan in the names of these methods indicates panspecificity; i.e., the ability to predict peptide-HLA binding for a large set of alleles within the HLA class, including the ones that are absent in the training set. This feature has been made possible by integrating information about the amino acids characterizing the HLA binding groove in training the ANN [47].

The earlier ANN-based methods such as NetMHC, NetMHCpan, NetMHCII-2.3, NetMHCIIpan-3.0, NetMHCIIpan-3.1, etc., were mostly trained using data obtained from HLA binding assays, which characterize the binding affinity of synthetic peptides to HLA molecules. The most recent ANN-based methods, such as MHCflurry, NetMHCpan-4.0, and NetMHCIIpan-4.0, additionally employ data from HLA ligand elution assays, which use advances in mass spectrometry techniques to isolate a large number of peptides that are naturally processed and presented by human cells expressing a single HLA. A few SARS-CoV-2 epitope prediction studies have also used methods, such as HLAthena and NeonMHC2, that were trained solely on HLA ligand elution assay data. However, it has been demonstrated for both HLA classes that the ANN models trained using both types of data provide superior performance to models trained on one type of data only [16].

Some methods used by SARS-CoV-2 epitope prediction studies include the HLA-I specific methods NetChop [48] and NetCTL-1.2 [49] , which incorporated additional intracellular factors involved in HLA antigen presentation in an attempt to improve peptide-HLA binding prediction. These factors include proteasomal cleavage sites and transport efficiency of TAP (the transporter associated with antigen processing) in antigen presenting cells [50] . However, inclusion of these factors has been found to show only a marginal improvement in peptide-HLA binding prediction over the ANN-based methods trained solely on HLA binding assays [48] . While almost all recent epitope prediction methods are ANN-based, a few early ones (such as CTLPred [51] and RANKPEP [52] ) were based on an alternative ML approach, support vector machines, and have also been used by SARS-CoV-2 studies (Table 1) .

Distinct from the ML methods just described, several SARS-CoV-2 studies have used (non-ML) bioinformatics methods to predict SARS-CoV-2 epitopes. These methods use position-specific scoring functions that assume each position in a peptide to be independently interacting with HLA. They generally assign a score to the studied peptide according to position-specific amino acid features such as amino acid frequencies or amino acid physicochemical profiles (e.g., obtained from BLOSUM matrices [53] ) at specific peptide positions. Such methods that have been used to predict SARS-CoV-2 epitopes include ProPred1 [54] for predicting HLA-I epitopes, ProPred [55] and Predivac [56] for predicting HLA-II epitopes, and Vaxign [57] for predicting epitopes for both HLA classes. Some bioinformatics methods also use scoring functions involving the interactions of pairs of peptide positions with the HLA. Methods in this category that have been used to predict SARS-CoV-2 epitopes include SMM [58, 59] for predicting HLA-I epitopes, SMM-Align [60] and the method by Sturniolo [61] for predicting HLA-II epitopes, and MHCPred [62] for predicting epitopes restricted by both HLA classes. In general, these bioinformatics methods work for a limited set of HLA alleles, with the exception of Predivac [56] and the method by Sturniolo [61] , both of which are pan-specific.
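A position-specific scoring function of the first kind described above, in which each peptide position contributes independently, can be sketched as follows. The matrix values, the three-residue peptide length, and the function names are hypothetical and chosen only to keep the example small; they are not taken from any of the cited tools.

```python
def pssm_score(peptide, pssm):
    """Sum independent per-position scores; residues absent from a column score 0."""
    if len(peptide) != len(pssm):
        raise ValueError("peptide length must match the matrix length")
    return sum(column.get(residue, 0) for residue, column in zip(peptide, pssm))


def scan_protein(protein, pssm, threshold):
    """Score every window of the protein and keep those at or above the threshold."""
    k = len(pssm)
    hits = []
    for start in range(len(protein) - k + 1):
        window = protein[start:start + k]
        score = pssm_score(window, pssm)
        if score >= threshold:
            hits.append((start, window, score))
    return hits


# Hypothetical 3-position matrix: one dict of residue scores per peptide position.
toy_pssm = [
    {"L": 2, "M": 1},
    {"A": 1},
    {"V": 3, "L": 2},
]

hits = scan_protein("GLAVKMAL", toy_pssm, threshold=4)
print(hits)
```

Because the score is a plain sum over positions, such a function cannot capture interactions between pairs of peptide positions, which is the limitation that the pairwise scoring methods mentioned above aim to address.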

Several SARS-CoV-2 studies have used the analysis resource provided by the immune epitope database (IEDB) [43] for epitope prediction. The IEDB provides a collection of several of the above-mentioned prediction methods, and also recommends a best performing method for each HLA class. Some SARS-CoV-2 studies have used the IEDB's HLA class-specific consensus methods, which predict peptide-HLA binding for their respective HLA class based on the consensus of several prediction methods [63, 64] . A few studies have also used pipeline tools like pVACtools [65] and TepiTool [66] , that allow users to predict epitopes for both HLA classes from a set of pre-defined ML and bioinformatics methods. Another method used for predicting SARS-CoV-2 epitopes is nHLAPred [67] , which uses a combination of ML and bioinformatics methods to predict HLA-I-restricted epitopes.
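One simple way to form a consensus across several prediction methods is to take, for each peptide, the median of the percentile ranks assigned by the individual methods. The sketch below illustrates this idea only; whether it matches the exact rule used by the consensus methods of [63, 64] is an assumption, and the method names, peptide labels, and rank values are hypothetical.

```python
from statistics import median


def consensus_rank(ranks_by_method):
    """Median percentile rank per peptide, over peptides scored by every method.

    ranks_by_method maps a method name to {peptide: percentile_rank},
    where a lower percentile rank denotes a stronger predicted binder.
    """
    if not ranks_by_method:
        return {}
    tables = list(ranks_by_method.values())
    shared = set(tables[0]).intersection(*tables[1:])
    return {pep: median(table[pep] for table in tables) for pep in sorted(shared)}


# Hypothetical percentile ranks from three hypothetical methods.
ranks = {
    "method_a": {"PEP1": 0.5, "PEP2": 10.0},
    "method_b": {"PEP1": 1.5, "PEP2": 4.0},
    "method_c": {"PEP1": 3.0, "PEP2": 30.0},
}

consensus = consensus_rank(ranks)
selected = [pep for pep, rank in consensus.items() if rank <= 2.0]
print(consensus, selected)
```

Using the median rather than the mean limits the influence of a single method that disagrees strongly with the others.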

The variety of peptide-HLA binding prediction methods used by in silico studies to predict SARS-CoV-2 epitopes raises questions as to whether the predictions of these studies are overlapping or distinct, and whether there are specific methods that appear most appropriate for predicting SARS-CoV-2 epitopes. We explore these questions subsequently in Sections 3 and 4, where we show that multiple common SARS-CoV-2 epitopes have indeed been predicted by independent in silico studies, while also identifying methods whose predicted epitopes are found to have induced T cell responses in convalescent COVID-19 patients.

Presentation of a peptide by an HLA molecule, while necessary for inducing a T cell response, does not guarantee T cell recognition and activation. That is, presentation does not imply that the peptide will be immunogenic. Thus, it is important to assess the immunogenicity of the predicted epitopes obtained from peptide-HLA binding prediction methods. The specific factors that differentiate an immunogenic HLA-presented peptide from a non-immunogenic one are still not well known, though a number of factors have been suggested to be the cause of this difference. For example, immunogenicity of a peptide may increase due to abundance of peptide-HLA complexes displayed on cells [68, 69], early expression of the protein to which the peptide belongs [63, 70, 71], competition with other peptide-HLA complexes for stimulating T cells [72, 73], and low genetic similarity of the peptide to a self-peptide (i.e., a host derived peptide) [74-78].

Several existing computational tools have been used to assess the immunogenicity of the SARS-CoV-2 epitopes obtained from peptide-HLA binding prediction methods. Of these, Vaxijen-2.0 [79] is the most commonly used and can predict immunogenicity of both HLA-I and HLA-II epitopes. This method was originally developed to predict protein immunogenicity by accounting for higher order interactions between protein sequence positions and exploiting the physicochemical properties (hydrophobicity, molecular size, polarity) of amino acids. It was trained on a set of known immunogenic and non-immunogenic proteins of viral, bacterial and tumor origins. Another common method that has been used for determining immunogenicity of HLA-I-restricted peptides in SARS-CoV-2 studies is the one available on the IEDB (Calis et al. [80]). This method was developed by comparing a set of immunogenic and non-immunogenic presented peptides compiled from multiple experimental sources. Specifically, immunogenicity of presented peptides was found to depend largely on the amino acids present at positions 4-6 of the peptide, and on specific physicochemical properties of the amino acids in the peptide, notably large and aromatic side chains. A model exploiting these features was then developed to predict immunogenicity of a given HLA-I-restricted peptide.

In addition to Vaxijen-2.0 and Calis et al., one SARS-CoV-2 study [21] used iPred [81] for immunogenicity prediction, which is also based on physicochemical properties of the amino acids in the peptide. A novel method to predict the immunogenicity of SARS-CoV-2 HLA-I restricted peptides was also proposed in Gao et al. [82], which utilizes a physics-based model and takes into account factors such as peptide-HLA binding affinity and similarity of a peptide with pathogen-derived and human-derived peptides. For model training and testing, a well-characterized dataset of immunogenic HIV T cell epitopes was used. This model was reported to outperform Calis et al. [80] in predicting immunogenic epitopes for the HIV dataset. Gao et al. used this method to filter the SARS-CoV-2 epitopes proposed in Ahmed et al. [20] and Prachar et al. [83] based on their predicted immunogenicity.

For the specific case of peptides presented by HLA class II, the activated CD4+ T cells differentiate into T helper (Th) cells of various types having distinct effector functions [72]. Of particular importance are the Th1 and Th2 cells. Th1 cells secrete the interferon-gamma (IFN-γ) cytokine and promote cell-mediated immunity, while Th2 cells secrete the interleukin-4 cytokine and promote humoral immunity [84]. For inducing robust cell-mediated immunity, multiple in silico SARS-CoV-2 vaccine design studies have further screened the predicted immunogenic HLA-II peptides to identify those that are likely to induce IFN-γ (Table 1). This was done primarily using the computational method IFNepitope [85], an ML approach that uses a dataset comprising peptides experimentally determined to induce IFN-γ, along with peptides that induce cytokines other than IFN-γ. This method was trained on data of IFN-γ-inducing epitopes obtained from IEDB.

In addition to screening for immunogenicity, to ensure that stimulated T cell responses are robust to genetic variations arising during viral evolution, it is important to identify epitopes that are highly conserved. For SARS-CoV-2, while the mutation rate appears low due to the presence of a genetic proof-reading exoribonuclease nsp14 protein [86] , it is still important to consider conservation of epitopes to avoid mutations that accumulate in the population [87] . Thus, a few in silico SARS-CoV-2 vaccine design studies have considered the conservation of predicted immunogenic epitopes among the available SARS-CoV-2 sequences (or a subset of them) for providing vaccine target recommendations ( Table 1) . The majority of the in silico SARS-CoV-2 studies have computed the conservation of predicted epitopes using in-house code, while a few have used the epitope conservancy tool available at the IEDB [43].

Vaccines are generally administered to otherwise healthy individuals as a preventive measure against disease. A crucial factor when selecting epitopes for a vaccine, apart from the ability of the epitopes to elicit a protective immune response, is whether or not they have any associated safety concern. Any adverse reaction caused by a vaccine is likely to contribute to anti-vaccine sentiment and potentially lead to loss of public trust in immunization programs [88, 89]. Even though any potential vaccine would ultimately need to undergo rigorous in vivo and in vitro safety trials, computational tools can help, as a first step, to screen for potential safety concerns. Here, we give a brief review of computational tools that have been used for assessing allergenicity, toxicity, and autoimmunity of SARS-CoV-2 epitopes. Almost all studies proposing an in silico vaccine perform these tests on the vaccine construct, while some also perform these at the initial epitope selection step (see Table 1).

Allergenicity of a substance is its potential to cause hypersensitivity in an individual by evoking an immune response [90] . The reason why certain proteins or peptides cause an allergic reaction in humans is not precisely known. Several SARS-CoV-2 studies have used more than one computational tool to test for allergenicity of predicted epitopes (see Table 1 ). These include AlgPred [91] , AllerTOP [92] , AllerTOP-2.0 [93] , AllergenFP [94] , and AllerCatPro [95] . AlgPred is a suite of multiple allergenicity prediction approaches based on motif alignment, support vector machines, and hybrid approaches; AllerTOP, AllerTOP-2.0, and AllergenFP are three related methods based on the physicochemical similarity of the considered protein with known allergens; while AllerCatPro is a recently developed method that uses 3D structure information along with sequence similarity with known allergens to predict allergenicity. In a comparative test, AllerCatPro was shown to have superior performance to the other methods in identifying allergens [95] . However, of the 65 in silico studies considered in this review (Table 1) , only one study [96] used AllerCatPro for allergenic screening of predicted SARS-CoV-2 epitopes.

Toxicity is the capacity of a substance to damage a living organism by interacting with biomolecules and disrupting normal cellular functions [97] . The effects of this disruption may range from slight symptoms like nausea in mild cases, through to death in severe cases. SARS-CoV-2 studies that tested for toxicity of predicted epitopes have used ToxinPred [98] , a support vector machine based method that uses a position-specific scoring function to predict peptide toxicity from sequence information. ToxinPred has been trained on a set of known toxic (of bacterial and animal origin) and non-toxic peptides obtained from the Universal Protein Resource [99] . While a number of SARS-CoV-2 in silico studies have used ToxinPred (Table 1) , the accuracy of these toxicity predictions for viral epitopes has yet to be confirmed.

Autoimmunity refers to the process of the body launching an immune response against its own healthy self [100]. In the context of T cells, this would constitute a T cell mounting a response against a self-peptide. The exact causal mechanism that triggers such a response is not known, though there is evidence that it may be triggered in some individuals in the aftermath of a viral or bacterial infection [101, 102]. While evidence for a vaccine-induced autoimmune response has not been found in controlled studies, occasional case reports of such an occurrence are present in the literature [103, 104]. This gives rise to some concern that if a vaccine contains viral epitopes similar to self-peptides, it may evoke an autoimmune response [105, 106]. To account for this, some of the SARS-CoV-2 studies have checked for sequence similarity between the human proteome and viral epitopes to identify epitopes that may potentially induce an autoimmune reaction [107]. The computational hurdle here is the large size of the human proteome, which makes testing multiple epitopes for sequence similarity challenging. However, of all possible SARS-CoV-2 peptides of lengths 8-10 (HLA-I restricted) and 13-20 (HLA-II restricted), only ~0.03% were found to match exactly with the human proteome [108], suggesting that autoimmunity may not be a common issue for SARS-CoV-2.
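For exact matches, the cost of comparing many peptides against a large proteome can be reduced by indexing the proteome once as sets of k-mers, so that each peptide is tested with a single set lookup. The sketch below illustrates this under stated assumptions; it is not the procedure of [107] or [108], the function names are hypothetical, and the two short "proteins" are toy strings rather than human sequences.

```python
def build_kmer_index(proteins, lengths):
    """Collect every k-mer of the requested lengths from the given proteins."""
    index = {k: set() for k in lengths}
    for seq in proteins:
        for k in lengths:
            for start in range(len(seq) - k + 1):
                index[k].add(seq[start:start + k])
    return index


def matches_self(peptide, index):
    """True if the peptide occurs exactly in the indexed proteins."""
    kmers = index.get(len(peptide))
    return kmers is not None and peptide in kmers


# Toy stand-in for a host proteome (hypothetical strings).
host_proteins = ["MKTAYIAKQRQISFVK", "GSHMLEDPVAAKQRQ"]
index = build_kmer_index(host_proteins, lengths=range(8, 11))

candidates = ["AYIAKQRQ", "LLFNKVTLA"]
flagged = [pep for pep in candidates if matches_self(pep, index)]
print(flagged)
```

The index trades memory for speed, and it detects identical matches only; near-identical peptides would require an approximate similarity search instead.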

Although the above-mentioned screening tests are well motivated, the ability of the specific tools used to effectively screen SARS-CoV-2 epitopes for immunogenicity, induction of IFN-γ, allergenicity, and toxicity still remains unclear. In an attempt to shed some light on the predictive nature of these tools, in Section 4 we analyze and compare their predictions when applied to epitopes of SARS-CoV-2 that have been confirmed experimentally in patients.

While some of the SARS-CoV-2 in silico studies only identified a list of epitopes, several went a step further to propose a multi-epitope vaccine construct (see Table 1). Briefly, this entailed selecting appropriate linkers, which play a key role in the structural stability of the vaccine, and adjuvants, which help to boost the immune response. These in silico designs spanned a host of modern vaccine technologies, e.g., subunit vaccines, peptide vaccines, RNA and DNA vaccines [109, 110]. Almost all SARS-CoV-2 studies that presented an in silico design of a multi-epitope vaccine construct also performed in silico tests for immunogenicity, conservation, allergenicity, and toxicity using the tools reviewed above, regardless of whether or not these tests were performed at the epitope level. As mentioned above, the majority of these in silico tools were originally developed to analyze proteins, and thus they may be more appropriate for analyzing multi-epitope vaccine constructs than individual epitopes. In addition, physicochemical composition, secondary and tertiary structure predictions, as well as molecular docking studies with human immune receptors were also investigated for the proposed constructs. None of these studies, however, tested their predictions experimentally. As the focus of the current review is on epitope identification, we refer the interested reader to [111] for more details on vaccine constructs.

Table 1. List of reviewed in silico SARS-CoV-2 T cell epitope prediction studies.

HLA-I epitope prediction 2

Immunogenicity 4 IFN- production 5 Conservation Allergenicity 6 Toxicity 7 Autoimmunity

Ahmed2020 [20] Using SARS-CoV immunological data

Using SARS-CoV immunological data, NetMHCpan-4.0

Using SARS-CoV immunological data, Tepitool

Ranga2020 [23] Using SARS-CoV immunological data,

Using SARS-CoV immunological data, NetMHCpan-4.0

Poran2020 [115] HLAthena NeonMHC2

Response against few predicted epitopes tested in recovered patients 

Lucchese2020 [169] Identified pentamers from the proteome -

To compare the predictions of the in silico studies (listed in Table 1), the predicted epitopes and their specific HLA allele associations [170] were compiled. In cases where the specific HLA allele information was unavailable, we recorded it as "NA". For a meaningful comparison, we focused only on the predicted T cell epitopes comprising 8 to 20 amino acids, representing the typical combined range of CD4+ and CD8+ T cell epitopes [171, 172]. This excluded Enayatkhani et al. [118], Yarmarkovich et al. [167], Yazdani et al. [168], and Lucchese et al. [169] from the analysis, giving a remaining set of 61 studies (Fig. 2).

The number of T cell epitopes predicted by these SARS-CoV-2 studies varied widely (minimum = 1 epitope; maximum = 3407 epitopes), even for studies that used the same peptide-HLA prediction method (Fig. 2). This can be attributed to differences in subsequent prediction refinement steps and the study objectives. Studies reporting a very limited number of T cell epitopes (<10), e.g., Joshi et al. [156], Rahman et al. [165], Khan et al. [148], Jakhar et al. [135], Samad et al. [130], were primarily focused on designing a short vaccine construct for eliciting a targeted immune response. Most of these studies refined the set of initial epitopes obtained by peptide-HLA binding prediction methods based on the tests described in Section 2.2.2. In contrast, the studies that predicted a large set of epitopes (>400) had objectives such as identifying all possible epitopes having high coverage in a specific ethnic population (Feng et al. [152]), identifying epitopes recognized by a large number of HLA alleles including rare ones (Campbell et al. [137]), or finding all possible epitopes binding to a specific allele even with low predicted affinity (Nerli et al. [147]). These did not perform any refinement of the set of epitopes obtained using the peptide-HLA binding methods.

A number of the SARS-CoV-2 studies had a sizeable overlap among their predicted set of epitopes (Fig. 2) . This is not surprising since many had used similar computational pipelines (Table 1) . For example, the 4 studies based on exploiting SARS-CoV immunological data all relied on sequence similarity of SARS-CoV-2 epitopes with known SARS-CoV epitopes, resulting in considerable overlap among their predicted set of epitopes (Fig. 2) . Similarly, there was overlap among predictions of several studies based on peptide-HLA binding prediction (Fig. 2) , which mostly used a similar set of methods for epitope prediction (Fig. 3) . The ones that stand out among the latter group of studies for having a large overlap with multiple others include Feng et al. [152] , Campbell et al. [137] , and Nerli et al. [147] (Fig. 2) , which may be attributed to the large number of epitopes that they predicted.
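The pairwise overlap between studies can be computed as the fraction of a given study's epitopes that were also predicted by another study, after restricting to peptides of 8 to 20 residues. The sketch below is an illustration under these assumptions and not the analysis code behind Fig. 2; the study labels and peptide strings are hypothetical.

```python
def filter_by_length(epitopes, min_len=8, max_len=20):
    """Keep unique epitopes whose length lies within [min_len, max_len]."""
    return {ep for ep in epitopes if min_len <= len(ep) <= max_len}


def overlap_fractions(predictions):
    """Map (row, col) to |row ∩ col| / |row| for every ordered pair of studies."""
    fractions = {}
    for row, row_set in predictions.items():
        for col, col_set in predictions.items():
            if row_set:
                fractions[(row, col)] = len(row_set & col_set) / len(row_set)
            else:
                fractions[(row, col)] = 0.0
    return fractions


# Hypothetical studies and peptides.
raw = {
    "StudyA": ["AAAAAAAA", "CCCCCCCCC", "DDDDDDDDDD", "EEEEEEEE"],
    "StudyB": ["AAAAAAAA", "CCCCCCCCC", "KKKKK"],  # "KKKKK" is too short
}
predictions = {study: filter_by_length(eps) for study, eps in raw.items()}

fractions = overlap_fractions(predictions)
print(fractions[("StudyA", "StudyB")], fractions[("StudyB", "StudyA")])
```

Normalizing by the row study makes the measure asymmetric: a study with many predictions can cover all of a smaller study's epitopes while sharing only a small fraction of its own, which is consistent with the large overlaps noted above for studies predicting large epitope sets.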

The in silico predicted epitopes lie within each of the 12 SARS-CoV-2 proteins. The structural S protein, which is also one of the most immunogenic proteins of SARS-CoV-2, was the most commonly analyzed protein (Fig. 2) . Surprisingly, the other structural proteins (M, N and E), also reported to be immunogenic in SARS-CoV-2 [6] as well as SARS-CoV [173, 174] , were analyzed by only a few studies. The N protein, in particular, is highly expressed [115] , and the epitopes derived from this protein may be especially worthy of further experimental investigation. Compared to other proteins, the number of epitopes predicted for the longer proteins (ORF1a, ORF1b, and S) was much larger. As expected, the number of unique T cell epitopes predicted per protein across all studies was found to be strongly correlated with the length of protein (r = 0.99).

Similar numbers of unique HLA-I and HLA-II restricted epitopes were predicted by the SARS-CoV-2 studies listed in Table 1 (2,239 HLA-I and 2,580 HLA-II), while a considerable number of epitopes (464) had missing ("NA") HLA class restriction. For the studies predicting epitopes exclusively based on peptide-HLA binding methods, it is surprising to observe epitopes having unknown HLA class restriction. This is because almost all prediction methods (discussed in Section 2.2.1) require specifying the HLA allele for predicting associated epitopes. However, a close inspection of the related studies revealed that this was due to either non-reporting of HLA allele restriction of the predicted epitopes (e.g., Sarkar et al. [161]), or cases where only the number of unique HLA alleles associated with the predicted epitopes was reported without specifying the individual correspondences (e.g., Sanami et al. [163]). As for the studies leveraging the immunological data of SARS-CoV from public databases, the SARS-CoV epitopes having no HLA class information were assigned the "NA" HLA class.

Heatmap shows the fraction of common epitopes predicted across each pair of studies. The fraction is computed relative to the number of epitopes predicted by the study indicated in each row (the total number of epitopes predicted in each study is shown within parentheses on the right). Four in silico studies that used SARS-CoV immunological data are indicated in bold font. Of the epitopes predicted by these studies, only the ones predicted based on homology with SARS-CoV epitopes were included. Study labels indicated in the figure correspond to those in Table 1.

Bar plots show the fraction of predicted epitopes for each HLA class in each study, with the total number shown within parentheses. (Bottom left panel) Heatmap shows the number of predicted epitopes derived from each SARS-CoV-2 protein for each in silico study. Each column in this heatmap corresponds to the study mentioned at the top of each column in the top left panel heatmap. Missing tiles indicate no predicted epitopes. (Bottom right panel) Bar plots show the fraction of predicted epitopes, across studies, derived from each SARS-CoV-2 protein, with the total number shown within parentheses. Predicted epitopes were assigned HLA class based on the HLA allele (bearing 4-digit resolution or higher) reported against them; or as "NA" otherwise.

With a large number of studies predicting SARS-CoV-2 T cell epitopes using different computational pipelines (involving specific epitope prediction and refinement methods), it is difficult to assess their accuracy solely based on their performance in predicting epitopes of other organisms [175]. For this purpose, "ground truth" information of experimentally-determined SARS-CoV-2 T cell epitopes is required. This data has started to emerge from immunological assays that analyze immune responses in COVID-19 patients. As of 8 September 2020, we found eight experimental studies [8, 9, 115, 176-180], reviewed in [181], as well as an additional study [182], that reported positive T cell immune responses from blood samples of convalescent COVID-19 patients against epitopes derived from SARS-CoV-2 proteins. Compiling data from these nine studies yielded a total of 324 (unique) epitopes. These 324 epitopes present a sampling (albeit not comprehensive) of the landscape of epitopes targeted by COVID-19 patients, and hence they provide an initial basis on which to conduct a comparative analysis of different in silico epitope prediction methods.

The nine experimental studies measured T cell responses after stimulation of the peripheral blood mononuclear cells (PBMCs) from convalescent COVID-19 patients. Of these, some studies used one or more in silico epitope prediction methods to select a set of peptides to synthesize, which were then used to stimulate the PBMCs. Three such studies [8, 9, 180] used NetMHCpan-4.0, two studies [177, 182] used NetMHC4.0, while one study [115] used HLAThena and NeonMHC2 to select the set of peptides to synthesize. Two of these studies [8, 182] also investigated a few epitopes that were predicted in [20, 22] using SARS-CoV immunological data. This set of immunological studies demonstrates the application of multiple in silico T cell epitope prediction methods in guiding experimental investigations. Specifically, immune responses are measured against a reduced set of defined peptides predicted in silico, which helps in identifying precise epitopes. In alternative experimental studies, stimulation was done using pools of overlapping k-mer peptides. One such study [176] synthesized pools of 15-mer overlapping peptides to measure T cell responses, while another study [179] synthesized pools of 15-18-mer overlapping peptides in addition to epitopes predicted using SARS-CoV immunological data [20]. While using pools of overlapping peptides does not reveal the precise "epitope" stimulating the T cell response, it still identifies immunogenic peptides encompassing the epitopes. Lastly, one study [178] employed a new experimental framework called T-Scan [183] to identify SARS-CoV-2 epitopes.

We compared the 324 experimentally-determined epitopes (including both precise epitopes and immunogenic peptides) against all unique 8-20 residue-long in silico predicted T cell epitopes (5,273) (Fig. 2). We found that 309 of the experimentally-determined epitopes encompassed at least a single predicted epitope, while 163 of these epitopes matched identically to predicted ones. Looking closely at those in silico studies that predicted at least one of the 163 identically-matched epitopes, we observed that studies which used SARS-CoV immunological data had collectively a higher hit rate (proportion of identically-matched epitopes in the set of predicted epitopes) (25%) as compared to the studies that used peptide-HLA binding prediction-based methods (3%). This difference in hit rate points to the usefulness of using SARS-CoV data to predict immune targets for SARS-CoV-2.
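The comparison described above rests on three simple operations: restricting predictions to a length range, testing whether an experimentally-determined peptide is identical to or encompasses a predicted epitope, and computing the hit rate. The following Python sketch illustrates these operations under stated assumptions: the function names are hypothetical, matching is plain exact substring matching, and the toy peptides (other than YLQPRTFLL, which is discussed in the text) are placeholders rather than real data.

```python
from typing import Iterable, Set, Tuple


def filter_by_length(peptides: Iterable[str], min_len: int = 8, max_len: int = 20) -> Set[str]:
    """Keep unique peptides whose length lies in [min_len, max_len]."""
    return {p for p in peptides if min_len <= len(p) <= max_len}


def match_epitopes(experimental: Iterable[str], predicted: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (identical, encompassing) subsets of the experimental peptides.

    identical:    experimental peptides equal to some predicted epitope
    encompassing: experimental peptides containing some predicted epitope
                  as a substring (this includes the identical ones)
    """
    experimental = set(experimental)
    predicted = set(predicted)
    identical = experimental & predicted
    encompassing = {e for e in experimental if any(p in e for p in predicted)}
    return identical, encompassing


def hit_rate(predicted: Iterable[str], experimental: Iterable[str]) -> float:
    """Proportion of predicted epitopes that identically match an experimental one."""
    predicted = set(predicted)
    if not predicted:
        return 0.0
    return len(predicted & set(experimental)) / len(predicted)


if __name__ == "__main__":
    # Toy inputs: one precise epitope, one hypothetical longer peptide that
    # contains it, and one placeholder peptide with no prediction.
    experimental = {"YLQPRTFLL", "GGGYLQPRTFLLGGG", "WWWWWWWWW"}
    predicted = filter_by_length(
        {"YLQPRTFLL", "AAAAAAAAA", "CCCCCCCCC", "DDDDDDDDD", "AAA"}
    )
    identical, encompassing = match_epitopes(experimental, predicted)
    print(sorted(identical), sorted(encompassing), hit_rate(predicted, experimental))
```

Note that the hit rate is normalized by the number of predictions, so an approach that predicts few epitopes can achieve a high hit rate while still recovering fewer experimentally-determined epitopes in absolute terms.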

Most in silico prediction studies as well as experimental studies were biased towards certain HLA alleles. Together, the nine experimental studies reported T cell epitopes associated with 18 different HLA alleles, with the largest proportion of the reported epitopes (35/116) associated with the HLA-A*02:01 allele. This is not surprising since HLA-A*02:01 is the most prevalent HLA-I allele globally [184]. Similarly, roughly a third of the epitopes predicted by in silico methods were also associated with HLA-A*02:01. Thus, the experimentally-determined and predicted epitopes associated with HLA-A*02:01 represent a reasonable dataset for assessing the predictions of SARS-CoV-2 in silico studies. Altogether, 33 of the 35 experimentally-determined HLA-A*02:01-associated epitopes matched identically with predicted epitopes, and these were derived from five proteins (S, ORF1a, N, ORF3a and M) (Fig. 4). Most of these epitopes (27/33) were reported to elicit a T cell response in multiple convalescent COVID-19 patients, with 10 of them eliciting responses in more than 5 patients. This suggests the potential immunodominance of these epitopes [185, 186] across the segment of the population bearing the HLA-A*02:01 allele (Fig. 4). Of these 33 epitopes, in silico studies based on peptide-HLA binding prediction and those leveraging SARS-CoV immunological data predicted 32 and 17, respectively. After grouping the in silico studies based on the peptide-HLA prediction tool used, those involving NetMHCpan-4.0 (despite the differences in refinement steps) appeared to predict most (30/33) of the epitopes (Table 3). However, the method leveraging SARS-CoV immunological data had the highest hit rate in predicting experimentally-determined HLA-A*02:01-associated epitopes (14.8%) (Table 3).

A large fraction of these experimentally-determined epitopes (27/33) were predicted to promiscuously bind to multiple HLA alleles by in silico methods (Table 4). For example, the S-derived epitope 269 YLQPRTFLL 277 was collectively predicted by 12 in silico methods to bind to 43 HLA alleles in addition to HLA-A*02:01. This HLA promiscuity of the experimentally-determined epitopes predicted by in silico methods (Table 4) can help guide future experimental studies to identify the potential immunodominance of these epitopes across a large segment of the global population carrying different HLA alleles.
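Tallying the predicted HLA promiscuity of an epitope amounts to collecting, across studies, the distinct alleles reported for it. A minimal Python sketch is given below; the record layout `(study, epitope, allele)`, the function name and the toy records are assumptions made for illustration and do not reproduce the data behind Table 4.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def alleles_per_epitope(
    records: Iterable[Tuple[str, str, str]]
) -> Dict[str, Tuple[int, List[str]]]:
    """Map each epitope to (number of predicting studies, sorted distinct alleles).

    records: (study, epitope, allele) tuples pooled across in silico studies.
    """
    alleles = defaultdict(set)
    studies = defaultdict(set)
    for study, epitope, allele in records:
        alleles[epitope].add(allele)
        studies[epitope].add(study)
    return {ep: (len(studies[ep]), sorted(alleles[ep])) for ep in alleles}


if __name__ == "__main__":
    # Hypothetical study labels and allele assignments, for illustration only.
    toy_records = [
        ("StudyA", "YLQPRTFLL", "HLA-A*02:01"),
        ("StudyB", "YLQPRTFLL", "HLA-A*02:01"),
        ("StudyB", "YLQPRTFLL", "HLA-A*02:06"),
    ]
    for epitope, (n_studies, allele_list) in alleles_per_epitope(toy_records).items():
        print(epitope, n_studies, allele_list)
```

Using sets ensures that an allele reported for the same epitope by several studies is counted once, which matches the notion of distinct alleles used in the text.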

We used the compiled experimental SARS-CoV-2 data to assess the predictions of various refinement tools employed in SARS-CoV-2 in silico studies (Table 1) . We selected the refinement tools for which a web-server was available and used the recommended parameters on the web-server of each tool to analyze the experimentally-determined SARS-CoV-2 epitopes. This compiled data of experimentally-determined epitopes, obtained from positive T cell responses in convalescent COVID-19 patients, serves as a reasonable ground truth for the tools predicting immunogenicity.

For Vaxijen-2.0 [79], the tool most commonly used by SARS-CoV-2 in silico studies for screening epitopes for immunogenicity (Fig. 3), we obtained predictions for the experimentally-determined epitopes using the available webserver by selecting the organism as "viruses", as recommended by the authors. Our analysis revealed that Vaxijen-2.0 classified only ~56% (182/324) of the experimentally-determined epitopes as immunogenic (Fig. 5). Unlike Vaxijen-2.0, the method of Calis et al. [80] was developed for predicting immunogenicity of only HLA class I epitopes and does not perform binary classification. Instead, it assigns a score to each epitope, with a high score representing high confidence in the epitope being immunogenic. We considered all epitopes with positive scores as immunogenic and the rest as non-immunogenic, in accordance with the majority of SARS-CoV-2 in silico studies that used Calis et al. to predict epitope immunogenicity (Table 1). Assessing the immunogenicity of 98 HLA class I experimentally-determined epitopes, Calis et al. predicted only ~63% (62/98) of them to be immunogenic (Fig. 5). To investigate the performance of these methods in more detail, we tested their accuracy in predicting the top 10 HLA-A*02:01-associated immunodominant epitopes (Fig. 4). In this case, Vaxijen-2.0 and Calis et al. predicted 30% (3/10) and 60% (6/10) of these to be immunogenic, respectively. Hence, the most commonly-used methods for immunogenicity prediction incorrectly classified over a third of the 328 experimentally-determined epitopes as non-immunogenic. The accuracy of these methods does not appear to improve even for predicting the highly immunogenic epitopes, highlighting their suboptimal performance.
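Since every epitope in the experimental set is known to be immunogenic, the percentages above are simply the fraction of epitopes a tool calls positive. The Python sketch below shows this calculation for a score-based tool thresholded at zero, as done here for Calis et al. The function name and the toy scores are assumptions; real scores would come from the respective webservers, which are not called here.

```python
from typing import Iterable, Optional


def fraction_called_immunogenic(
    scores: Iterable[Optional[float]], threshold: float = 0.0
) -> Optional[float]:
    """Fraction of scored epitopes with score strictly above the threshold.

    Epitopes the tool could not score (None) are excluded from the denominator.
    Returns None when no epitope could be scored.
    """
    scored = [s for s in scores if s is not None]
    if not scored:
        return None
    positives = sum(1 for s in scored if s > threshold)
    return positives / len(scored)


if __name__ == "__main__":
    # Toy scores for five epitopes, one of which could not be analyzed.
    toy_scores = [0.3, -0.1, 0.05, None, -0.4]
    print(fraction_called_immunogenic(toy_scores))

    # Counts reported in the text, expressed as percentages.
    print(round(100 * 182 / 324), round(100 * 62 / 98))
```

When all inputs are true positives, this fraction equals the sensitivity of the tool on the dataset; it says nothing about specificity, which would require experimentally confirmed non-immunogenic peptides.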

For a selected subset (20) of HLA-II restricted experimentally-determined epitopes, the recognizing CD4+ T cells were confirmed to be producing IFN-γ using flow cytometry [9]. We used this subset of epitopes to assess the performance of the IFNepitope tool [85], which is commonly used in SARS-CoV-2 in silico studies (Fig. 3). The predictions of IFNepitope for the selected experimentally-determined epitopes were obtained from the associated webserver using the recommended approach (motif and SVM hybrid) and model (IFN-γ vs non-IFN-γ). This analysis showed that IFNepitope correctly predicted only 40% (8/20) of the experimentally-determined IFN-γ producing SARS-CoV-2 epitopes (Fig. 5).

In contrast to immunogenicity and IFN-γ production, no information was available regarding allergenicity and toxicity of the experimentally-determined epitopes from the immunological studies. Thus, no ground truth information is available to assess the performance of tools predicting these epitope characteristics. Nevertheless, we can still analyze the experimentally-determined epitopes using these tools and, at least for the case of allergenicity prediction tools, compare their relative predictions. We used the default parameter settings in the webservers for all allergenicity and toxicity prediction tools. Our analysis showed that AllerTOP-2.0 [93] and AllergenFP [94], the two most commonly used tools for determining allergenicity (Fig. 3), predicted a high fraction (~43%; 142/328 and 140/328, respectively) of the experimentally-determined epitopes to be allergenic, with less than half of these (66) being commonly predicted by both methods. Hence, the variation in predictions of these methods was high. This disparity was even more evident for the AllerCatPro method [95], which predicted all experimentally-determined epitopes to be non-allergenic (Fig. 5). Hence, due to the wide variation in predictions and a lack of experimental data to validate them, the practical applicability of the allergenicity tools for SARS-CoV-2 remains unclear and further investigation is required. Lastly, in terms of toxicity prediction, ToxinPred [98] was the only tool used to predict toxicity of epitopes by the SARS-CoV-2 in silico studies (Table 1). Our analysis revealed that it predicted 98% (322/328) of the experimentally-determined epitopes to be non-toxic. However, similar to the case of allergenicity prediction tools, evaluating the accuracy of these toxicity predictions is not possible at present due to a lack of relevant experimental information.

Table 3.
Approaches adopted by in silico studies that predicted at least half of the experimentally-determined HLA-A*02:01-associated epitopes.

In silico studies | Approach | Total number of predicted epitopes | Number of experimentally-determined epitopes predicted | Hit rate
Nerli2020 [147], Wang2020 [114], Bhatnager2020 [131] | Based on peptide-HLA binding prediction (involving NetMHCpan-4.0) | | |
Ahmed2020 [20], Grifoni2020 [22], Ranga2020 [23], Lee2020 [21] | Using SARS-CoV immunological data | 115 | 17 | 14.8%

(Table 1) predicting each epitope are listed on the right. Epitopes are colored according to the SARS-CoV-2 protein from which they are derived (counts shown in legend) and ordered in descending order of the number of patients whose samples responded. The two experimentally-determined HLA-A*02:01 epitopes which did not match identically with any epitope predicted by in silico studies were 906 YLFDESGEFKL 916 in ORF1a and 20 FLAFVVFL 27 in E. These epitopes were reported to induce a T cell response in 9/36 and 2/3 COVID-19 convalescent patients, respectively.

Table 4. Distinct HLA alleles predicted, across in silico studies, to be associated with the 33 experimentally-determined HLA-A*02:01-restricted SARS-CoV-2 epitopes.

In silico prediction methods (count) 

Results obtained for the 324 experimentally-determined SARS-CoV-2 T cell epitopes [8, 9, 115, 176-180, 182] when they were provided as input to the computational tools most commonly used in the reviewed SARS-CoV-2 in silico studies for refinement of epitopes obtained by peptide-HLA binding prediction methods. Positive outcomes indicate the number of epitopes the computational tool predicts to have the characteristic (immunogenicity, IFN-γ production, allergenicity, toxicity) being tested, and vice versa for negative outcomes. "NA" indicates the number of epitopes that could not be analyzed by the specific tool. In the case of Calis et al. [80], the method is applicable to HLA-I epitopes only, while in the case of IFNepitope, this was because only a subset of the experimentally-determined epitopes had IFN-γ production information available.
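Where no ground truth exists, as for the allergenicity tools, the positive-outcome counts can still be used to quantify how much two tools agree. The Python sketch below computes simple agreement measures from the counts reported in the text for AllerTOP-2.0 (142 positives), AllergenFP (140 positives) and their common positives (66). The function name and the choice of the Jaccard index as a summary are assumptions made for illustration, not measures used in the review.

```python
from typing import Dict


def agreement_from_counts(n_a: int, n_b: int, n_both: int) -> Dict[str, float]:
    """Agreement between two tools from their positive-call counts.

    n_a, n_b: number of epitopes called positive by tool A and tool B
    n_both:   number of epitopes called positive by both tools
    """
    if n_both > min(n_a, n_b):
        raise ValueError("n_both cannot exceed either tool's positive count")
    union = n_a + n_b - n_both
    return {
        "jaccard": n_both / union if union else 0.0,
        "share_of_a": n_both / n_a if n_a else 0.0,
        "share_of_b": n_both / n_b if n_b else 0.0,
    }


if __name__ == "__main__":
    # Counts taken from the text: AllerTOP-2.0, AllergenFP, and both.
    result = agreement_from_counts(142, 140, 66)
    for name, value in result.items():
        print(f"{name}: {value:.3f}")
```

Both shares fall below one half, consistent with the statement that less than half of the positive calls of either tool were shared.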


In silico epitope identification is an important component in the vaccine development pipeline as it provides recommendations for immune targets that may be exploited by vaccine designs. It is also very helpful for guiding immunological assays designed to understand T cell responses elicited by vaccines or those mounted naturally against COVID-19 infections. This review has made vivid the large amount of work that has been done already in predicting and analyzing epitopes of SARS-CoV-2, with a focus on T cells. The 65 studies that we have reviewed employed different computational approaches, along with an impressive array of computational tools.

The aim of this review was not only to summarize the methods that have been employed so far, but also to provide a comparative analysis of their epitope predictions, as well as to offer insights into the performance of the different approaches. The ability to test prediction accuracy hinges on the availability of experimental ground truth data, which is rapidly evolving but still remains limited. Data limitations precluded, for example, the performance evaluation of tools that predict epitope safety features. However, data from nine independent immunological studies of convalescent COVID-19 patients provided a set of T cell epitopes that offered a means to test the basic ability of the in silico T cell prediction methods for identifying SARS-CoV-2 epitopes reported to be immunogenic.

The fact that the large majority (>95%) of the experimentally-determined epitopes for HLA-A*02:01 were identical to an epitope predicted by at least one in silico method offers strong evidence for the practical significance of these methods in identifying immunogenic T cell epitopes in the context of SARS-CoV-2. While the comparison carried out here cannot ascertain which prediction method performed better (given that many studies differed in their prediction refinement step and objectives), we could still compare the hit rates of in silico studies grouped according to their common underlying approach for predicting SARS-CoV-2 epitopes (Table 3 ). This analysis showed that for HLA-A*02:01 (the HLA allele with the largest number of epitopes available), the two approaches with the highest hit rates in predicting the set of experimentally-determined SARS-CoV-2 epitopes were peptide-HLA binding prediction using NetMHCpan-4.0 and the approach that leveraged SARS-CoV immunological data. Hence, these distinct approaches both appear to be well supported for their further use in guiding additional epitope identification for SARS-CoV-2, and for their application to identify epitopes for other viruses.

Our analysis has provided insights into the computational tools that have been used by a number of SARS-CoV-2 in silico studies to further refine the set of predicted epitopes based on specific features. Most notably, the observation that the most commonly used tools for predicting epitope immunogenicity identified more than one-third of the experimentally-determined immunogenic epitopes to be non-immunogenic points to their suboptimality in relation to SARS-CoV-2. Similarly, the performance of the tool that has been used for screening HLA class II epitopes for inducing IFN-γ production was also found to be suboptimal. It should be recognized, however, that these in silico screening tools are general-purpose tools that were developed more than seven years ago. For SARS-CoV-2, there appears to be significant room for improvement, and more specialized tools (e.g., Gao et al. [82]) may be more effective. While several of the surveyed studies used in silico tools to assess the safety (allergenicity, toxicity) of SARS-CoV-2 epitopes, the accuracy of these predictions could not be validated due to the lack of experimental data. Discordance in the predictions of some of these safety assessment tools (Section 4) also highlights the need for further research and systematic experimental validation. This is an important research direction since the utility of such in silico tools in pre-clinical trials is gaining recognition by both regulatory bodies and funding agencies [187, 188], as it is in line with the principles of the 3Rs (replacement, reduction, refinement) for humane animal research.

Several experimentally-determined epitopes associated with HLA-A*02:01 appear to be immunodominant across multiple convalescent COVID-19 patients. Interestingly, the majority of these epitopes were predicted to have promiscuous HLA association by multiple methods. This suggests that vaccines designed to target such epitopes have the potential to provide high population coverage. However, the promiscuity of these epitopes remains to be verified experimentally, and this would appear to be an important direction for future studies.

In this review we have focused on T cells, which form one arm of the adaptive immune system. The other arm, comprising antibodies produced by B cells, is also important for preventing viral infection. In fact, recent experimental studies have suggested that protection against SARS-CoV-2 may be mediated collectively by both T cells and antibodies [11]. There have been extensive efforts in characterizing the neutralizing antibodies against SARS-CoV-2 [3-5], as well as in identifying the B cell epitopes which may be targeted by neutralizing antibodies. Of the in silico SARS-CoV-2 studies that have been reviewed (Table 1), many had also predicted B cell epitopes for potentially eliciting a neutralizing antibody response. Some of these epitope predictions, particularly those made by methods leveraging SARS-CoV data [20, 22], have also been observed experimentally [35, 189-193]. The development of in silico methods to identify B cell epitopes for SARS-CoV-2 is currently an active area of research. As in the case of T cells, ML methods may also be considered for predicting B cell epitopes (e.g., [194]); however, developing predictive models for B cells is more complicated since the predicted epitopes must fold into conformations that are similar to the native protein for eliciting an antibody response.

For SARS-CoV-2, as well as other coronaviruses, one feature that simplifies the identification of potentially robust epitopes is the fact that the genetic variation is quite low. For example, almost all (~99%) of the epitopes that were predicted by a study in early 2020 [24] are still highly conserved (>99%) within SARS-CoV-2 sequences [35], despite a three orders of magnitude increase in the amount of sequence data available. Hence, based on our current understanding, T cell escape by genetic variation may not be a significant factor for SARS-CoV-2. This is in contrast to other viruses that are highly mutable, such as HIV and hepatitis C virus, for which more elaborate computational methods have been developed to facilitate robust T cell epitope identification and to aid vaccine design [195-207].
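Epitope conservation of the kind quoted above can be approximated as the fraction of available protein sequences that contain the epitope exactly. The Python sketch below illustrates this under simplifying assumptions: the function name is hypothetical, matching is exact substring matching, the toy sequences are placeholders, and a real analysis would likely also need to handle incomplete or ambiguous sequences.

```python
from typing import Optional, Sequence


def conservation(epitope: str, sequences: Sequence[str]) -> Optional[float]:
    """Fraction of sequences that contain the epitope as an exact substring.

    Returns None when no sequences are provided.
    """
    if not sequences:
        return None
    hits = sum(1 for seq in sequences if epitope in seq)
    return hits / len(sequences)


def is_highly_conserved(epitope: str, sequences: Sequence[str], cutoff: float = 0.99) -> bool:
    """True when the epitope's conservation exceeds the cutoff (assumed 99%)."""
    value = conservation(epitope, sequences)
    return value is not None and value > cutoff


if __name__ == "__main__":
    # Placeholder protein fragments; the second carries a single substitution.
    toy_sequences = ["AAYLQAA", "AAYLHAA", "YLQ", "YLQK"]
    print(conservation("YLQ", toy_sequences))
    print(is_highly_conserved("YLQ", toy_sequences))
```

With this definition, a single substitution anywhere within the epitope counts the sequence as a non-match, which is a deliberately strict criterion.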

Generally speaking, the knowledge being gained through the broad application of in silico T cell epitope prediction methods and tools to SARS-CoV-2 can help guide further studies aimed at epitope determination and vaccine design for various other viruses. For example, the observed cross-reactivity of T cell epitopes between SARS-CoV and SARS-CoV-2 motivates studies that seek to identify epitopes that are genetically similar across a spectrum of coronaviruses (e.g., SARS-CoV, MERS-CoV, SARS-CoV-2, common cold human coronaviruses, as well as animal coronaviruses). The identification of such epitopes could help guide "pan-coronavirus" vaccine designs that are aimed at safeguarding against both current human coronaviruses and novel coronaviruses that may leap from other species to infect humans in the future [208, 209]. An increased understanding of the landscape of SARS-CoV-2 immunogenic T cell epitopes targeted by COVID-19 patients would open up the space of possibilities to explore, and this could play an important role in the search for a pan-coronavirus vaccine.

The authors were supported by the General Research Fund

Key roles of adjuvants in modern vaccines

Recent advances in subunit vaccine carriers

IPD-IMGT/HLA database

NetMHCpan-4.0: Improved peptide-MHC class I interaction predictions integrating eluted ligand and peptide binding affinity data

Mass spectrometry profiling of HLA-associated peptidomes in mono-allelic cells enables more accurate epitope prediction

NetMHCpan-4.1 and NetMHCIIpan-4.0: Improved predictions of MHC antigen presentation by concurrent motif deconvolution and integration of MS MHC eluted ligand data

Defining HLA-II ligand processing and binding rules with mass spectrometry enhances cancer epitope prediction

Preliminary identification of potential vaccine targets for the COVID-19 coronavirus (SARS-CoV-2) based on SARS-CoV immunological studies

In silico identification of vaccine targets for 2019-nCoV

A sequence homology and bioinformatic approach can predict candidate targets for immune responses to SARS-CoV-2

Immunogenic SARS-CoV-2 epitopes: In silico study towards better understanding of COVID-19 disease-paving the way for vaccine development

Genomic characterisation and epidemiology of 2019 novel coronavirus: Implications for virus origins and receptor binding

A pneumonia outbreak associated with a new coronavirus of probable bat origin

Memory T cell responses targeting the SARS coronavirus persist up to 11 years post-infection

Heterologous immunity between viruses

Identification of Zika virus epitopes reveals immunodominant and protective roles for dengue virus cross-reactive CD8+ T cells

The immune response against flaviviruses

Cross-serotypically conserved epitope recommendations for a universal T cell-based dengue vaccine

The role of the proteasome in generating cytotoxic T-cell epitopes: Insights obtained from improved predictions of proteasomal cleavage

Large-scale validation of methods for cytotoxic T-lymphocyte epitope prediction

Identifying MHC class I epitopes by predicting the TAP transport efficiency of epitope precursors

Prediction of CTL epitopes using QM, SVM and ANN techniques

Prediction of MHC class I binding peptides, using SVMHC

Amino acid substitution matrices from protein blocks

ProPred1: Prediction of promiscuous MHC Class-I binding sites

ProPred: Prediction of HLA-DR binding sites

PREDIVAC: CD4+ T-cell epitope prediction for vaccine design that covers 95% of HLA class II DR protein diversity

Vaxign: The first web-based vaccine design program for reverse vaccinology and applications for vaccine development

Generating quantitative models describing the sequence specificity of biological processes with the stabilized matrix method

Examining the independent binding assumption for binding of peptide epitopes to MHC-I molecules

Prediction of MHC class II binding affinity using SMM-align, a novel stabilization matrix alignment method

Generation of tissue-specific and promiscuous HLA ligand databases using DNA microarrays and virtual HLA class II matrices

MHCPred: A server for quantitative prediction of peptide-MHC binding

A consensus epitope prediction approach identifies the breadth of murine TCD8+-cell responses to vaccinia virus

A systematic assessment of MHC class II peptide binding predictions and evaluation of a consensus approach

PVACtools: A computational toolkit to identify and visualize cancer neoantigens

TepiTool: A pipeline for computational prediction of T cell epitope candidates

Application of machine learning techniques in predicting MHC binders

Immunoproteasome subunit deficiencies impact differentially on two immunodominant influenza virus-specific CD8+ T cell responses

Protective immunity does not correlate with the hierarchy of virus-specific cytotoxic T cell responses to naturally processed peptides

HLA class I-restricted responses to vaccinia recognize a broad array of proteins mainly involved in virulence and viral gene regulation

Construction and immunogenicity of a recombinant fowlpox virus containing the capsid and 3C protease coding regions of foot-and-mouth disease virus

Kuby Immunology

CD8+ T cells: Foot soldiers of the immune system

Thymic selection of T-cell receptors as an extreme value problem

Effects of thymic selection of the T-cell repertoire on HLA class I-associated control of HIV infection

Recognition of HIV-1 peptides by host CTL Is related to HIV-1 similarity to human proteins

Restricted autoantigen recognition associated with deletional and adaptive regulatory mechanisms

Degenerate T-cell recognition of peptides on MHC molecules creates large holes in the T-cell repertoire

VaxiJen: A server for prediction of protective antigens, tumour antigens and subunit vaccines

Properties of MHC class I presented peptides that enhance immunogenicity

Exploring the pre-immune landscape of antigen-specific T cells

Predicting the immunogenicity of T cell epitopes: From HIV to SARS-CoV-2

Identification and validation of 174 COVID-19 vaccine candidate epitopes reveals low performance of common epitope prediction tools

Interferon-γ: An overview of signals, mechanisms and functions

Designing of interferon-gamma inducing MHC class-II binders

Structural basis and functional analysis of the SARS coronavirus nsp14-nsp10 complex

Tracking changes in SARS-CoV-2 Spike: Evidence that D614G increases infectivity of the COVID-19 virus

Vaccine confidence plummets in the Philippines following dengue vaccine scare: Why it matters to pandemic preparedness

Addressing the vaccine confidence gap

Food allergy

AlgPred: Prediction of allergenic proteins and mapping of IgE epitopes

AllerTOP -a server for in silico prediction of allergens

AllerTOP v.2 -a server for in silico prediction of allergens

AllergenFP: Allergenicity prediction by descriptor fingerprints

AllerCatPro-prediction of protein allergenicity potential from the protein sequence

Immunoinformatic identification of B cell and T cell epitopes in the SARS-CoV-2 proteome

The classification and properties of toxic hazards, in: Toxic Trauma

In silico approach for predicting toxicity of peptides and proteins

UniProt: The universal protein knowledgebase

A History of Modern Immunology

Immunobiology: The Immune System in Health and Disease, Garland Science

Molecular mimicry: Its evolution from concept to mechanism as a cause of autoimmune diseases

Consequence or coincidence? The occurrence, pathogenesis and significance of autoimmune manifestations after viral vaccines

Causal relationship between immunological responses and adverse reactions following vaccination

The key role of genomics in modern vaccine and drug design for emerging infectious diseases

Pathogen proteins eliciting antibodies do not share epitopes with host proteins: A bioinformatics approach

Peptide cross-reactivity: The original sin of vaccines

Computationally optimized SARS-CoV-2 MHC class I and II vaccine formulations predicted to target human haplotype distributions

New vaccine technologies to combat outbreak situations

Emerging concepts and technologies in vaccine development

Emerging vaccine technologies

Immunoinformatics-aided identification of T cell and B cell epitopes in the surface glycoprotein of 2019-nCoV

High throughput and comprehensive approach to develop multiepitope vaccine against minacious COVID-19

Immunoinformatic analysis of T- and B-cell epitopes for SARS-CoV-2 vaccine design

Sequence-based prediction of SARS-CoV-2 vaccine targets using a mass spectrometry-based bioinformatics predictor identifies immunogenic T cell epitopes

Designing of a next generation multiepitope based vaccine (MEV) against SARS-COV-2: Immunoinformatics and in silico approaches

CoronaVR: A computational resource and analysis of epitopes and therapeutics for severe acute respiratory syndrome coronavirus-2

Reverse vaccinology approach to design a novel multi-epitope vaccine candidate against COVID-19: an in silico study

COVID-19 coronavirus vaccine design using reverse vaccinology and machine learning

Design of a multiepitope-based peptide vaccine against the E protein of human COVID-19: An immunoinformatics approach

Frenkel-Morgenstern, Immunoinformatics and structural analysis for identification of immunodominant epitopes in SARS-CoV-2 as potential vaccine targets, Vaccines

Understanding the B and T cell epitopes of spike protein of severe acute respiratory syndrome coronavirus-2: A computational way to predict the immunogens

Design of a novel multi epitope-based vaccine for pandemic coronavirus disease (COVID-19) by vaccinomics and probable prevention strategy against avenging zoonotics

Designing multi-epitope vaccines to combat emerging coronavirus disease 2019 (COVID-19) by employing immuno-informatics approach

Bioinformatics analysis of epitope-based vaccine design against the novel SARS-CoV-2

A rational design of a multi-epitope vaccine against SARS-CoV-2 which accounts for the glycan shield of the spike glycoprotein

Contriving multi-epitope subunit of vaccine for COVID-19: Immunoinformatics approaches

An in-silico approach to develop of a multiepitope vaccine candidate against SARS-CoV-2 envelope (E) protein, Res. Sq. (2020) rs.3.rs-30374

Immuno-informatics approach for multi-epitope vaccine designing against SARS-CoV-2, bioRxiv

Designing a multi-epitope vaccine against SARS-CoV-2: An immunoinformatics approach

Epitope based peptide vaccine against SARS-COV2: an immuneinformatics approach

In silico designing of multi-epitope vaccine construct against human coronavirus infections

Design of multi-epitope vaccine candidate against SARS-CoV-2: A in-silico study

Immunoinformatics characterization of SARS-CoV-2 spike glycoprotein for prioritization of epitope based multivalent peptide vaccine

3CL hydrolase-based multiepitope peptide vaccine against SARS-CoV-2 using immunoinformatics

Structure-based drug designing and immunoinformatics approach for SARS-CoV-2

Prioritization of SARS-CoV-2 epitopes using a pan-HLA and global population inference approach

Immunoinformatic analysis of the SARS-CoV-2 envelope protein as a strategy to assess crossprotection against COVID-19

In the search of potential epitopes for Wuhan seafood market pneumonia virus using high order nullomers

Expected immune recognition of COVID-19 virus by memory from earlier infections with common coronaviruses in a large part of the world population

Insights into cross-species evolution of novel human coronavirus 2019-nCoV and defining immune determinants for vaccine development

Identification of potential vaccine candidates against SARS-CoV-2, A step forward to fight novel coronavirus 2019-nCoV: A Reverse Vaccinology Approach

Structural basis for designing multiepitope vaccines against COVID-19 infection: In silico vaccine design and validation

Multi-epitope-based peptide vaccine design against SARS-CoV-2 using its spike protein

Designing a multi-epitope peptide-based vaccine against SARS-CoV-2

In silico approach for designing of a multi-epitope based vaccine against novel Coronavirus (SARS-COV-2)

Structure-based modeling of SARS-CoV-2 peptide/HLA-A02 antigens

Design of an epitope-based peptide vaccine against the severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2): A vaccine informatics approach

Energetics and IC50 based epitope screening in SARS CoV-2 (COVID 19) spike protein by immunoinformatic analysis implicating for a suitable vaccine development

Design of an epitope-based synthetic long peptide vaccine to counteract the novel China coronavirus (2019-nCoV)

Genome based evolutionary study of SARS-CoV-2 towards the prediction of epitope based chimeric vaccine

Multiepitope vaccine design using an immunoinformatics approach for 2019 novel coronavirus in China (SARS-CoV-2)

Development of epitope-based peptide vaccine against novel coronavirus 2019 (SARS-COV-2): Immunoinformatics approach

Excavating SARS-coronavirus 2 genome for epitope-based subunit vaccine synthesis using immunoinformatics approach

Potential T-cell and B-cell epitopes of 2019-nCoV

Epitope based vaccine prediction for SARS-COV-2 by deploying immuno-informatics approach

A candidate multi-epitope vaccine against SARS-CoV-2

Structural modeling and conserved epitopes prediction against SARS-COV-2 structural proteins for vaccine development

Designing a novel mRNA vaccine against SARS-CoV-2: An immunoinformatics approach

Bioinformatic prediction of potential T cell epitopes for SARS-Cov-2

Immunoinformatics-guided designing of epitope-based subunit vaccines against the SARS Coronavirus-2 (SARS-CoV-2)

Prediction of SARS-CoV2 -associated susceptibility

Design of a multi-epitope vaccine against SARS-CoV-2 using immunoinformatics approach

Design of a peptide-based subunit vaccine against novel coronavirus SARS-CoV-2

Vaccine design from the ensemble of surface glycoprotein epitopes of SARS-CoV-2: An immunoinformatics approach, Vaccines

Epitope-based peptide vaccines predicted against novel coronavirus disease caused by SARS-CoV-2

Identification of SARS-CoV-2 vaccine epitopes predicted to induce long-term population-scale immunity

Design an efficient multi-epitope peptide vaccine candidate against SARS-CoV-2: An in silico analysis

Epitopes for a 2019-nCoV vaccine

Definitions of histocompatibility typing terms

Peptide length significantly influences in vitro affinity for MHC class II molecules

The length distribution of class I-restricted T cell epitopes is determined by both peptide supply and MHC allele-specific binding preference

T cell responses to whole SARS coronavirus in humans

Virus-specific memory CD8 T cells provide substantial protection from lethal severe acute respiratory syndrome coronavirus infection

Comparison of experimental fine-mapping to in silico prediction results of HIV-1 epitopes reveals ongoing need for mapping experiments

SARS-CoV-2-specific T cell immunity in cases of COVID-19 and SARS, and uninfected controls

Shared antigen-specific CD8+ T cell responses against the SARS-COV-2 spike protein in HLA A*02:01 COVID-19 participants

Unbiased screens show CD8+ T cells of COVID-19 patients recognize shared epitopes in SARS-CoV-2 that largely reside outside the spike Protein

Broad and strong memory CD4+ and CD8+ T cells induced by SARS-CoV-2 in UK convalescent individuals following COVID-19

SARS-CoV-2 epitopes are recognized by a public and diverse repertoire of human T cell receptors

Epitopes targeted by T cells in convalescent COVID-19 patients

Characterization of preexisting and induced SARS-CoV-2-specific CD8+ T cells

T-Scan: A genome-wide method for the systematic discovery of T cell epitopes

Allele frequency net: A database and online repository for immune gene frequencies in worldwide populations

Human T cell response to dengue virus infection

The immune epitope database and analysis resource in epitope discovery and synthetic vaccine design

Harnessing the power of novel animal-free test methods for the development of COVID-19 drugs and vaccines

FDA, Development and licensure of vaccines to prevent COVID-19

Linear B-cell epitopes in the spike and nucleocapsid proteins as markers of SARS-CoV-2 exposure and disease severity

Linear epitope landscape of SARS-CoV-2 spike protein constructed from 1,051 COVID-19 patients

Cross-neutralization of SARS-CoV-2 by a human monoclonal SARS-CoV antibody

Two linear epitopes on the SARS-CoV-2 spike protein that elicit neutralising antibodies in COVID-19 patients

Immunodominant epitopes based serological assay for detecting SARS-CoV-2 exposure: Promises and challenges

BepiPred-2.0: Improving sequence-based B-cell epitope prediction using conformational epitopes

Rational design of vaccine targets and strategies for HIV: A crossroad of statistical physics, biology, and medicine

Structural topology defines protective CD8+ T cell epitopes in the HIV proteome

MPL resolves genetic linkage in fitness inference from complex evolutionary histories

Identifying immunologically-vulnerable regions of the HCV E2 glycoprotein and broadly neutralizing antibodies that target them

Deconvolving mutational patterns of poliovirus outbreaks reveals its intrinsic fitness landscape

Sub-dominant principal components inform new vaccine targets for HIV Gag

How genetic sequence data can guide vaccine design

Translating HIV sequences into quantitative fitness landscapes predicts viral vulnerabilities for rational immunogen design

Co-evolution networks of HIV/HCV are modular with direct association to structure and function

Statistical linkage analysis of substitutions in patient-derived sequences of genotype 1a hepatitis C virus nonstructural protein 3 exposes targets for immunogen design

Coordinate linkage of HIV evolution reveals regions of immunological vulnerability

Computational design of hepatitis C virus immunogens from host-pathogen dynamics over empirical viral fitness landscapes

Fitness landscape of the human immunodeficiency virus envelope protein that is targeted by antibodies

Origin and cross-species transmission of bat coronaviruses in China

A SARS-like cluster of circulating bat coronaviruses shows potential for human emergence