key: cord-1034406-6aht3z7w authors: Mishra, Seema title: Designing of cytotoxic and helper T cell epitope map provides insights into the highly contagious nature of the pandemic novel coronavirus SARS-CoV-2 date: 2020-09-16 journal: R Soc Open Sci DOI: 10.1098/rsos.201141 sha: a0d4470e384ae1aaa830973ffb98e3326da19354 doc_id: 1034406 cord_uid: 6aht3z7w Novel coronavirus, SARS-CoV-2, has emerged as one of the deadliest pathogens of this century, creating an unprecedented pandemic. Belonging to the betacoronavirus family, it primarily spreads through human contact via symptomatic and asymptomatic transmission. Despite several attempts since it emerged, there is no known treatment in the form of drugs or vaccines. Hence, work on developing a potential multi-subunit vaccine is the need of the hour. In this study, attempts have been made to find globally conserved epitopes from the entire set of SARS-CoV-2 proteins as there is as yet, no clear information on the immunogenicity of these proteins. Using diverse computational tools, a ranked list of probable immunogenic, promiscuous epitopes generated through all the three main stages of antigen processing and presentation pathways has been prioritized. Moreover, several useful insights were gleaned during these analyses. One of the most important insights is that all of the proteins in this pathogen present unique epitopes, so that the targeting of a few specific viral proteins is not likely to result in an effective immune response in humans. Due to the presence of these unique epitopes in all of the SARS-CoV-2 proteins, stronger immune responses generated by T cell hyperactivation may lead to cytokine storm and immunopathology and consequently, remote chances of human survival. These epitopes, after due validation in vitro, may thus need to be presented to the human body in that form of multi-subunit epitope-based vaccine that avoids such immunopathologies. 
Novel coronavirus (SARS-CoV-2), also known as 2019-nCoV, first emerged in population in December 2019 and has rapidly gained foothold across the world resulting in WHO declaring it as a pandemic (https://www.who.int/emergencies/diseases/novel-coronavirus-2019). It causes COVID-19 disease with significant mortality rate. 
As there is currently no known cure, urgent studies are needed to push forward drug and vaccine design and development. Recently, about 77 drugs targeting the viral spike protein were identified using the world's fastest supercomputer, Summit [1]. Immunoinformatics tools have repeatedly proven crucial in cancer immunotherapy [2, 3]. In the absence of effective drugs to date, vaccination is indispensable in order to prevent infections in, or cure, an entire population. As of 15 May 2020, a WHO draft identified eight vaccines in clinical evaluation and 110 candidate vaccines in preclinical evaluation (updated to 26 vaccines in clinical evaluation and 139 candidate vaccines in preclinical evaluation as of 31 July 2020) [4]. More importantly, since COVID-19 has affected populations across almost the entire world, vaccine coverage needs to be extensive. In the context of an HLA epitope-based multi-subunit vaccine, enlisting promiscuous epitopes that bind to a variety of HLA alleles is crucial for wide coverage. A promiscuous epitope is defined as an epitope that is capable of binding to multiple HLA alleles. In this regard, in silico approaches will be significantly useful in helping to develop a preventive approach or a cure as fast as possible. Vaccines can be administered prophylactically and even therapeutically; as an example, anti-HBV vaccines are currently being developed as therapeutic vaccine candidates. Cytotoxic T cell immune responses have been observed in the close relatives SARS and MERS [5, 6], and hence, in the case of SARS-CoV-2 also, a cytotoxic T cell-coordinated immune response along with a helper T cell response is crucial. Based on the newly available SARS-CoV-2 genome sequence, this study was undertaken with the clear objective of providing a ranked list of highly probable and effective promiscuous epitopes with no human cross-reactivity.
Interestingly, several useful insights into the deadly nature of this pathogen were also gleaned along the way. The SARS-CoV-2 genome submitted by CDC, Atlanta (GenBank accession number: MT106054.1, submitted on 24 February 2020) is 29 882 bp in length. Being 100% identical to the reference sequence NC_045512.2 from Wuhan, China, it harbours multiple structural, non-structural and accessory proteins that are essential or play a role at various stages of the viral life cycle. This SARS-CoV-2 genome is 82.3% identical to the SARS-CoV genome (NC_004718.3), as determined using the NCBI BLASTn tool. In brief, the order of the proteins encoded in its RNA genome as per this GenBank accession information (figure 1) is as follows: 5′-ORF1ab-S (Spike/Surface)-ORF3a-E (Envelope)-M (Membrane)-ORF6-ORF7a-ORF8-N (Nucleocapsid)-ORF10-polyA tail-3′, an arrangement usually seen in betacoronaviruses [8]. While the structural proteins S, E, M and N are key proteins, several proteins such as ORF3a, ORF7a and ORF8 function as accessory proteins that play a role in viral pathogenesis. ORF1ab, a polyprotein, encodes several non-structural proteins, 15 in number as identified in this genome sequence annotation, including the RNA-dependent RNA polymerase (RdRp). The roles of the structural proteins are inferred from their homology to SARS-CoV as well as from a few experiments [9]. The expression, localization and function of some SARS-CoV-2 accessory proteins are as yet unclear, although several such proteins have been characterized in SARS-CoV [10], and the roles may be similar in the two viruses. Recently, Gordon et al. [11] cloned and expressed several of these proteins, including S, E, M, N, the ORF1ab non-structural proteins and the accessory proteins ORF3a, ORF6, ORF7a, ORF8 and ORF10. Sequencing studies suggested that the most abundant transcript was N RNA, followed by S, ORF7a, ORF3a, ORF8, M, E, ORF6 and ORF7b [7] (ORF7b is identified in this paper [7]).
As understood from the WHO draft, candidate subunit vaccines are almost all based on the spike protein, and very few are based on the M and N proteins. In view of the scarcity of data on the relevance, immunogenicity and potency/effectiveness of these proteins, any one or more of these proteins may act as prime vaccine candidates. Hence, all of these proteins were used for T cell epitope prediction for the purpose of peptide-based multi-subunit vaccine design and further analyses. Support for this approach also comes from previous studies on the related SARS-CoV virus [12], wherein more than 50% of the patients had T cell responses against at least one of the two proteins tested, and 25% showed responses against both proteins. The advantage of epitope-based subunit vaccines, as opposed to DNA and live attenuated virus vaccines, is that they do not contain live components and so are considered safe. Moreover, they present an antigen or a set of antigens to the immune system with a lower risk of side effects [13]. They are also suitable for people with weakened immune responses, such as the elderly, who are prime targets of SARS-CoV-2 infection. While the cytotoxic T cell (CD8+) response is the key response to immunodominant antigens in destroying a virus-infected cell, helper T cells (CD4+) prime and maintain cytotoxic T cells as well as B cells, and so an effective immunotherapeutic product must contain both types of T cell epitopes. These T cell epitopes need to be high binders to their respective HLA alleles as well as immunogenic. Further analyses using clustering provided consensus epitopes harbouring both CD8+ and CD4+ T cell epitopes, thereby eliminating redundant sequences across target proteins and alleles. These clustered epitopes could elicit stronger cellular immune responses to viral proteins.
royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 7: 201141
As opposed to the common perception that membrane and spike proteins may confer better immunogenic ability, this study suggests that the opposite may be the case for SARS-CoV-2 T cell epitopes when studied across populations with different HLA-I supertypes. It should be noted that antibody responses may preferentially target membrane and spike proteins, given their locations on the virus surface, and that the current analysis is geared towards T cell epitopes.

2.1. Promiscuous, immunogenic Cytotoxic T lymphocyte (CTL) epitopes

To predict potential CTL epitopes against the whole coronavirus proteome, predicted proteins or otherwise, CTL epitope prediction was done using PickPocket 1.1 and NetCTLpan 1.1 with the same HLA supertypes. In total, 12 representative supertypes present by default in both tools were taken. These supertypes are present across populations and hence are representative of the entire world. The two prediction algorithms were used to generate a consensus list of top high binders and promiscuous epitopes across several proteins and supertypes. The consensus of two different algorithms was chosen to increase the prediction accuracy. While NetCTLpan uses a neural network algorithm, PickPocket works on the basis of position-specific weight matrices. NetCTLpan, in addition to HLA binding, also predicts TAP transporter binding and C-terminal proteasome cleavage. The total number of CTL epitopes generated was 9621 across 10 SARS-CoV-2 proteins, including the ORF1ab polyprotein. A common list of nine amino acids-long high binders was generated from the topmost epitopes in each protein for each allele, and a total of 122 epitopes were enlisted. These common, promiscuous CTL epitopes are enlisted in ranked order in tables 1 and 2.
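The consensus step described above can be illustrated with a minimal sketch. It is not the author's pipeline: the function names, the top-n cut-off, the `min_supertypes` parameter and all scores below are hypothetical, and it assumes that a higher NetCTLpan combined score and a lower PickPocket IC50 both indicate a better binder.

```python
from collections import defaultdict


def top_n(scores, n, higher_is_better):
    """Return the n best-scoring peptides from a {peptide: score} mapping."""
    ranked = sorted(scores, key=scores.get, reverse=higher_is_better)
    return set(ranked[:n])


def consensus_promiscuous(netctlpan, pickpocket, n=10, min_supertypes=2):
    """Keep peptides that both tools place in their top n for a supertype,
    then keep only those that do so for at least min_supertypes supertypes.

    netctlpan:  {supertype: {peptide: combined score}}, higher assumed better
    pickpocket: {supertype: {peptide: IC50 in nM}}, lower is better
    """
    supertypes_by_peptide = defaultdict(set)
    for supertype in netctlpan.keys() & pickpocket.keys():
        common = (top_n(netctlpan[supertype], n, higher_is_better=True)
                  & top_n(pickpocket[supertype], n, higher_is_better=False))
        for peptide in common:
            supertypes_by_peptide[peptide].add(supertype)
    return {peptide: sorted(found)
            for peptide, found in supertypes_by_peptide.items()
            if len(found) >= min_supertypes}


# Toy scores invented for illustration; only the peptide FVFLVLLPL
# appears in the text, and its scores here are not real predictions.
netctlpan_scores = {
    "A2": {"FVFLVLLPL": 0.9, "AAAAAAAAA": 0.5, "CCCCCCCCC": 0.1},
    "B7": {"FVFLVLLPL": 0.8, "AAAAAAAAA": 0.2, "CCCCCCCCC": 0.7},
}
pickpocket_scores = {
    "A2": {"FVFLVLLPL": 20.0, "AAAAAAAAA": 50.0, "CCCCCCCCC": 900.0},
    "B7": {"FVFLVLLPL": 35.0, "AAAAAAAAA": 800.0, "CCCCCCCCC": 60.0},
}

consensus = consensus_promiscuous(netctlpan_scores, pickpocket_scores, n=2)
print(consensus)
```

In this toy input only FVFLVLLPL is in the top two of both tools for both supertypes, so it is the only peptide retained as a promiscuous consensus candidate.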
Very few promiscuous epitopes were found for the surface and membrane proteins among the topmost epitopes common to both prediction algorithms [14]. These proteins harbour many potential, unique epitopes across the two prediction tools, leading to the surmise that these two proteins will not be potent, promiscuous immunogens across populations. Nevertheless, a few promiscuous epitopes common to both prediction algorithms, although not among the top-ranked ones, were enlisted for these two proteins. One of these epitopes, FVFLVLLPL, located in the signal peptide of the surface/spike protein, has been found to harbour a mutation, L5F, in many strains from 13 countries in distinct phylogenetic clades, and an L8V/W mutation is present in Hong Kong [15]. These authors further suggest that the L5F mutation might be a sequencing artefact or may be due to recurrent homoplasy. The spike protein epitopes enlisted here do not include the residue affected by the D614G mutation, said to be the dominant form in variants in Europe and India. The highest number of common top-ranking epitopes is seen in the case of nsp7 of ORF1ab, followed by the ORF10, ORF8, ORF6 and ORF3a proteins. Among structural proteins, the envelope protein provided the highest number of such epitopes. Venn diagram analysis depicted no common epitopes at all across proteins and alleles. Even though the SARS-CoV-2 RBD (331-527) is shown to harbour epitopes for eliciting neutralizing antibodies [16, 17], this region is not present in the enlisted data for CTL epitopes. However, the receptor-binding motif (RBM) region (437-508), the ACE-2 binding motif of this RBD, provided immunogenic HTL epitopes, which are detailed below in the section on promiscuous HTL epitopes.
[Figure caption fragment: … nlm.nih.gov/datasets/). ORF7b was identified later in a paper [7].]
Immunogenicity prediction of these epitopes (table 3) showed that 71 of the 122 epitopes had a positive immunogenicity score. A clear correlation between HLA binding and immunogenicity, in terms of high scores, is seen in many of these cases, lending support to the theory that these selected epitopes may mount a high immune response in vitro and in vivo. Further, residues conserved between SARS-CoV-2 and other HCoV and MERS species were identified from multiple sequence alignments (MSAs) and were found in several of these epitopes (electronic supplementary material, figures S1-S9). As the NCBI RefSeq sequence of SARS-CoV was unclear in its annotations for the respective proteins, it could not be used in the MSA studies. Most of the epitopes with conserved residues belonged to the ORF1ab region (table 3), and epitopes belonging to this region may act as vaccine candidates targeting MERS and other HCoV species in addition to SARS-CoV-2. During these CTL epitope identification studies, it was also found that many epitopes identical in sequence to SARS-CoV epitopes found previously in the spike, membrane, nucleocapsid and ORF3a proteins [18] were in lower ranking positions for different alleles, and many were not common across alleles, so there was insufficient confidence to enlist them. However, in the case of ORF3a, one sequence harbouring both CD8+ and CD4+ T cell epitopes, PLQASLPFGWLVIGV, among the three most frequently recognized by T cells [19], was also present among the top-ranked ones in our study (table 1). Purely for the information of readers, the T cell epitope data recognized in humans/transgenic mice in the case of SARS-CoV that are the same as or similar to lower ranking T cell epitopes in SARS-CoV-2 are provided as electronic supplementary material, table S1.
All of the 10 SARS-CoV-2 proteins, predicted or otherwise, were also studied for helper T cell epitope generation using a well-validated prediction tool, NetMHCIIpan, in addition to an immunogenicity prediction tool, CD4episcore, which predicts epitopes based on both HLA binding and immunogenicity. Prominent HLA-II alleles studied using NetMHCIIpan were HLA-DRB1 alleles, specifically […]. Helper T lymphocyte epitopes are typically 15 amino acid residues long. High-throughput data for these epitopes were analysed manually to identify common epitopes across alleles and the 10 coronaviral proteins. From the NetMHCIIpan studies, a total of 1802 promiscuous HTL epitopes (the same epitope predicted to bind to multiple alleles) were generated, selected up to rank 2% (strong binders) or, where strong binders were not found, up to rank 10% (weak binders). Among these, 649 epitopes (15-mers) were found to be immunogenic by CD4episcore across all alleles. Another immunogenicity prediction tool, ITcell, was used to predict immunogenic epitopes for only two alleles, DRB1*01:01 and DRB1*15:01, as it uses PDB files for the TCR, which are available for these two DRB1 alleles; there was no TCR structure in the PDB for the other HLA class-II alleles studied. Also, ITcell predicts 12-mer HTL epitopes. Taking the ITcell results into account, the top-scoring immunogenic epitopes common to both immunogenicity prediction tools were 95 in number and were taken for further analysis. These also included some of the epitopes binding to the other HLA-DRB1 alleles studied. This can be explained by observations that, among all HLA-II molecules, there exists a high degree of repertoire overlap, reflecting multiple binding partners. This is most probably due to backbone interactions, rather than anchor residues, playing a major role [20]. Among these, the top 50 high-scoring immunogenic candidate epitopes are tabulated in table 4.
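Table 4 ranks candidates by the CD4episcore combined score, which the methods of this paper give as a weighted sum of an immunogenicity score and an HLA-binding score with alpha = 0.4, lower values implying higher immunogenicity and 50 as the cut-off. The following is a minimal sketch of that arithmetic only; the function names and the input values are hypothetical, and treating the cut-off as a strict inequality is an assumption.

```python
import math

ALPHA = 0.4    # weight on the immunogenicity score, as stated in the methods
CUTOFF = 50.0  # combined-score cut-off stated in the methods


def combined_score(imm_score, hla_score, alpha=ALPHA):
    """Combined score = alpha * Imm score + (1 - alpha) * HLA_score."""
    return alpha * imm_score + (1.0 - alpha) * hla_score


def is_immunogenic(imm_score, hla_score, cutoff=CUTOFF):
    """Lower combined scores imply higher immunogenicity.
    Using '<' rather than '<=' at the cut-off is an assumption."""
    return combined_score(imm_score, hla_score) < cutoff


# Hypothetical (Imm score, HLA_score) pairs, not values from the paper.
candidates = {
    "peptide_A": (30.0, 10.0),
    "peptide_B": (90.0, 60.0),
}

# Rank candidates from most to least immunogenic (ascending combined score).
ranked = sorted(candidates, key=lambda name: combined_score(*candidates[name]))
for name in ranked:
    imm, hla = candidates[name]
    print(name, round(combined_score(imm, hla), 3), is_immunogenic(imm, hla))
```

For peptide_A the score is 0.4 × 30 + 0.6 × 10 = 18, which falls below the cut-off; for peptide_B it is 0.4 × 90 + 0.6 × 60 = 72, which does not.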
A complete list of these and other epitope candidates is provided in electronic supplementary material, table S2. This list also provides immunogenic HTL epitopes in the RBM region (437-508), the ACE-2 binding motif of the RBD of the surface protein, which has been demonstrated to elicit neutralizing antibodies [16]. The whole dataset of HLA-I and HLA-II epitopes across these and several other HLA-II alleles is available as supplementary information (electronic supplementary material, tables S4-S6).

Table 3. Immunogenic CTL epitopes across proteins, sorted by high HLA-I binding, high immunogenicity and conservation of residues in multiple sequence alignment (MSA); epitopes in red font are those nonameric CTL epitopes that either exist as a part of longer epitopes binding to HLA-II alleles or are clustered together as longer sequences; blue highlights depict sequences showing the presence of conserved residues. [Table body not recoverable from the source text, apart from one cell fragment: 'F26, L27, F28 semi-conserved'.]

The bar diagram for CTL and HTL immunogenic epitope distribution across proteins (figure 2) shows a general trend in which the number of epitopes is not correlated with the size of the proteins. The smallest predicted protein, ORF10, is found to provide more CTL epitopes in the context of this study than the larger spike protein. Some previous studies have also found this to be true, wherein capsid and matrix proteins in the viruses studied were found to 'pack significantly more epitopes than those expected by their size' [22]. Some proteins, such as ORF6, ORF8, ORF10, envelope and membrane, do not have immunogenic HTL epitopes that harbour nonameric CTL epitopes binding to both HLA-DRB1*01:01 and HLA-DRB1*15:01, and in some cases to neither of the two alleles. Also, the leader, nsp7, nsp10 and endoRNAse proteins of ORF1ab did not provide common epitopes between the two immunogenicity prediction tools.
The highest number of immunogenic HTL epitopes as predicted by CD4episcore was provided by RdRp, followed by nsp3, nsp4, helicase and spike (surface) protein sequences. A Venn diagram depicted a common list of many epitopes from a single protein across alleles (electronic supplementary material, figure S10). A distinct pattern is to be noted: analysis of HTL epitopes belonging to HLA-DRB1*03:01, HLA-DRB1*11:01 and HLA-DRB1*15:01 indicated the lowest number of common epitopes, or none at all, across most of the proteins, and these can be considered outlier epitopes. Envelope protein was unique in the sense that it did not provide either strong or weak binders to the HLA-DRB1*03:01 allele, frequent across North America, Europe, India and Africa. ORF10 was also unique in providing only weak HTL binders to all of the alleles studied.

Table 4. Top 50 immunogenic sequences from CD4episcore and ITcell tools. Red font: common to ITcell immunogenicity epitopes sorted by DRB1*01:01 score. Blue highlights: common to ITcell immunogenicity epitopes sorted by DRB1*15:01 score. Lime green highlights: immunogenic candidates from CD4episcore, common to ITcell and different from those in the Grifoni et al. paper [21] (Cell Host and Microbe, 2020; paper with patent); those in blue highlights that are different from the Grifoni et al. [21] paper have also been mentioned in the text.
[Only a fragment of the table survives in the source text. The last three column labels are as given in the source; the first four are inferred. […] marks cells that could not be recovered.]
Protein | Seq. no. | Sequence IDs | Peptide | Peptide start | Peptide end | Combined Score
Membrane | 58 | seq47, seq58 | SYFIASFRLFARTRS | 94 | 108 | 22.026
Ribose methy[…] | […] | seq7, seq47 | KGRLIIRENNRVVIS | 277 | 291 | 29.31836
[…] | 8 | seq8 | RLIIRENNRVVISSD | 279 | 293 | 29.7396
nsp2 | 94 | seq94 | ESVQTFFKLVNKFLA | 493 | 507 | 29.9796
RdRp | 107 | seq86, seq107, seq139 | TQMNLKYAISAKNRA | 540 | 554 | 30.01152
Membrane | 68 | seq21, seq34, seq38, […] | SFRLFARTRSMWSFN | 99 | 113 | 30.13772
Nucleocapsid | 49 | seq49 | DQIGYYRRATRRIRG | 82 | 96 | 30.45712
RdRp | 87 | seq24, seq87, seq108, […] | MNLKYAISAKNRART | 542 | 556 | 30.4728
nsp6 | 22 | seq22, seq92 | VLLILMTARTVYDDG (different from Grifoni) | 121 | 135 | 30.78116
Exonuclease | 56 | seq15, seq32, seq56 | AYNMMISAGFSLWVY | 497 | 511 | 30.94884
Nucleocapsid | 48 | seq48 | QIGYYRRATRRIRGG | 83 | 97 | 30.9554
Exonuclease | 53 | seq11, seq30, seq53 | DAYNMMISAGFSLWV | 496 | 510 | 31.22164
nsp6 | 91 | seq20, seq84, seq91 | VVLLILMTARTVYDD (different from Grifoni) | 120 | 134 | 31.25796
[…] | […] | […] | […] | […] | 485 | 31.35264

A Venn diagram of all these cytotoxic and helper T cell epitopes taken together showed no common epitopes at all across proteins, but within a given protein set, common epitopes could be found. This observation indicates that every protein of SARS-CoV-2 may present antigenic epitopes to the immune system, resulting in a high number of targets. This further lends credence to the theory that multiple T cell epitopes may elicit an immune response in each case, some eliciting strong and some weaker responses, and that there may therefore be a high degree of T cell immunopathology at the infection site. A stronger T cell immune response may cause even normal, uninfected cells to be attacked, while a weaker helper T cell immune response, for some protein targets, may cause weak neutralizing antibody responses as well as a weak CTL response at varying times during infection. Very recently, one study has pointed to this immune dysregulation [23] in COVID-19 patients, with IL6-mediated low HLA-DR expression and sustained cytokine production.
Another correspondence paper also pointed to a cytokine storm in this context [24]. The involvement of T cells in the development of cytokine storm cannot be ruled out: preliminary findings show antigen-specific production of IL-6 and a TNF-α response in the cell culture supernatants of a deceased patient, and this work is proposed to be carried out further in a larger cohort [25]. Antibody-mediated enhancement of the immune response is also not ruled out, as can be seen from the fact that all the epitopes present in the list of dominant B cell epitopes (table 4 in [21]) belonging to the surface, membrane and nucleocapsid proteins are unique, and there may be a higher non-neutralizing antibody level in COVID-19 patients, as in the case of dengue viruses [26]. While this study was at the writing stage, two studies on T cell epitope generation using all proteins [21, 27] were published. The present study differs from the Grifoni et al. [21] study in that two prediction tools with very different algorithms, one using a neural network and another using position-specific weight matrices, were employed to generate a list of common epitopes, thereby increasing prediction accuracy. Also, Grifoni et al. [21] focused mostly on similarity to previous SARS coronavirus epitopes for predicting epitopes, while this paper identified several novel epitopes across all 10 proteins using two different prediction algorithms in each case. Further, this epitope list comprises common top-scoring epitopes with a higher accuracy and is restricted to HLA alleles that are highly frequent across populations. Also, in view of the several mutations in the SARS-CoV-2 genome that distinguish it from SARS-CoV, these epitopes, which are not derived from similarity to SARS-CoV epitopes, may be potentially more immunogenic. Most of the novel HTL and CTL epitopes in this study were distinct from the epitopes predicted by Grifoni et al.
[21], and were found among the top 100 immunogenic candidates predicted by CD4episcore as well as those in common with ITcell predictions (electronic supplementary material, table S2). There was no supplementary material on the website, nor sequence information for the epitopes, in the study from Nguyen et al. [27]. Further, their work did not take into account TAP transporter binding predictions or HLA-II binding studies, while this study took all three stages of the MHC processing and presentation pathway (proteasomal cleavage, TAP transporter binding, and MHC class I and II binding) as well as immunogenicity studies into account for comprehensive predictions. All of the 1924 topmost CTL and HTL epitopes (122 CTL epitopes and 1802 HTL epitopes) across the proteins studied, of which 1096 were non-redundant, unique epitopes, were then clustered using the IEDB epitope cluster analysis tool [28] to make further biologically meaningful decisions. The results suggested that many epitopes were clustered around a given consensus sequence (electronic supplementary material, table S3). The total number of clusters (including subclusters) was 244, and 66 epitopes were singletons not present in a cluster. The larger clusters harbouring consensus sequences were: VDFQVTIAEILLIIMRTFKVSIWNLDYIINLIIKN (23 members), KLWAQCVQLHNDILLAKDTTEAFEKMVSLLSVLLSM and TQHQPYVVDDPCPIHFYSKWYIRVGARKSAPLIEL (20 members each). These clusters across proteins and alleles may be considered immunodominant epitopes and tested first among the ranked list of epitopes. Among the immunogenic 122 CTL epitopes from IEDB and 666 HTL epitopes from CD4episcore, HLVDFQVTIAEILLIIMRTFKVSIWNLDYIINLII again topped the list. Further, among the same immunogenic 122 CTL epitopes and the 95 HTL epitopes common to the two prediction algorithms, CD4episcore and ITcell, VTIAEILLIIMRTFKVSIWNLDYIINL, belonging to ORF6, again topped the list.
Moreover, PIHFYSKWYIRVGARKSAPLIEL belonging to ORF8 and MGYINVFAFPFTIYSLL belonging to ORF10 were also among the top three clustered sequences. It is of interest to note that the sequences in the consensus sequence MGYINVFAFPFTIYSLL belonging to ORF10 are weak binders to all the HLA-DRB1 alleles studied, while the nonameric sequences in this consensus sequence are strong binders to all the HLA-I supertypes studied. Cross-reactivity analyses against the human proteome based on UniProt data (figure 3) indicated that none of the immunogenic CTL and HTL epitopes obtained (all HTL epitopes taken from the CD4episcore list, with redundant HTL epitopes removed; 719 CTL + HTL epitopes in total) was present in the human proteome, and hence no cross-reactivity to normal human cells may occur. The widespread presence of novel, unique T cell epitopes in the SARS-CoV-2 proteome is also the main reason that B cell epitopes were not studied in this paper. Further, including B cell epitopes alongside T cell epitopes in the vaccination strategy may not be a good strategy, and may even be counterproductive. Even though neutralizing antibody levels are found to be low in COVID-19 patients [29, 30], it is expected that CD4+ T cell expansion responses may increase the neutralizing antibody levels [31], and hence quantifying CD4+ T cell responses using IFN-gamma ELISPOT assays will be useful. This would help to minimize possible immune system backfiring [23, 24] due to the presence of too many overlapping as well as non-overlapping epitopes in multi-subunit vaccines. It is suggested that helper T cell epitopes be chosen so as to elicit an immune response robust enough to prime and maintain neutralizing antibody responses while keeping the immunopathology in check. If immune system backfiring is proven, it may be one possible mechanism by which SARS-CoV-2 acts at its deadliest.
It is, indeed, a dangerous pathogen to control, although for effective immunotherapy at a global scale, efforts should already be underway using this ranked list of epitopes. Almost all of its proteins may appear as foreign agents to the human immune system, with each protein contributing several unique, different immunogenic epitopes. This horde of foreign proteins brings down an avalanche of immune system molecules to the infection site in order to fight the virus. But instead of immune protection, this may lead to immune enhancement or allergic inflammation at the infection site. These analyses suggest that the coronavirus genome has evolved to be a unique genome. Even as this study is important in pointing out possible mechanisms, such as immunopathologies arising from T cell hyperactivation, that contribute to the contagious nature of SARS-CoV-2, more evidence is required in the form of in vitro and in vivo experiments. While many of the proteins studied are found to be expressed, and their functions are known by virtue of homology with SARS-CoV, many of the novel ORFs, including ORF8 and ORF10, need to be experimentally tested for functional validation. Experimental MHC-peptide binding and T cell stimulation assays are now required for in vitro testing, for further refinement and development as potent immunogens to be incorporated as components of multi-subunit vaccines. Utilizing all 10 of the SARS-CoV-2 proteins, predicted or otherwise, a ranked list of CTL and HTL epitopes with high HLA-binding affinity, high TAP transport efficiency and high C-terminal proteasomal cleavage ranking has been generated. Incorporating the alleles predominant in the whole world population, two different prediction algorithms were implemented in the identification of common epitopes for creating a consensus. Immunogenicity scores for these epitopes have also been predicted in order to further narrow down the list to a few key epitopes that can be experimentally tested.
Peptide matching with the human proteome showed no indication of possible cross-reactivity. These epitopes are provided to the scientific community for further development using in vitro and in vivo assays, saving the time and costs involved in the urgent bid to tackle SARS-CoV-2 infections and the ensuing deaths. This essential list of highly probable epitopes opens up avenues for developing prophylactic and therapeutic interventions and for further understanding of the human immune system responses to this virus.

[…] binding and immunogenicity prediction and outputs a list of immunogenic peptides using a combined score. The authors combined immunogenicity and HLA-binding scores, taking the median percentile rank score (HLA_score) of the 7-allele method (ranging from 0 to 100) and combining it with their neural network-based immunogenicity score. This combined score is calculated as follows: Combined score = (alpha × Imm score) + ((1 − alpha) × HLA_score), where alpha is optimized to 0.4. The 7 alleles used are: 'HLA-DRB1*03:01', 'HLA-DRB1*07:01', 'HLA-DRB1*15:01', 'HLA-DRB3*01:01', 'HLA-DRB3*02:02', 'HLA-DRB4*01:01' and 'HLA-DRB5*01:01'. The whole HTL epitope sequence list belonging to each protein was given as an input, and the IEDB-recommended combined method was selected for scoring. Lower combined scores imply higher immunogenicity according to the authors who developed this prediction tool. The cut-off between immunogenic and non-immunogenic epitopes was a combined score of 50, as per the CD4episcore paper.

ITcell works on the basis of three stages of the MHC-II processing and presentation pathway. These three stages are, in the authors' [37] own words: '….antigen cleavage, MHCII presentation and TCR recognition. First, antigen cleavage sites are predicted based on the cleavage profiles of cathepsins S, B and H.
Second, for each 12-mer peptide in the antigen sequence we predict whether it will bind to a given MHCII, based on the scores of modelled peptide-MHCII complexes. Third, we predict whether or not any of the top-scoring peptide-MHCII complexes can bind to a given TCR, based on the scores of modelled ternary peptide-MHCII-TCR complexes and the distribution of predicted cleavage sites.' The scores are given as normalized Z-scores, with negative scores implying higher immunogenicity. The epitope sequences, as well as PDB files for TCR molecules corresponding to their cognate MHC alleles, were given as input. The PDB files for the HLA-DRB1*01:01 and HLA-DRB1*15:01 alleles are 1FYT.pdb and 1YMM.pdb, respectively. PDB files for all other alleles were not available.

As globally conserved epitopes are relevant at this time to contain and treat coronavirus infection, the clustering approach was used to find patterns among disparate datasets. In order to group epitopes into several clusters, the IEDB epitope cluster analysis tool [28] was applied. All the topmost CTL and HTL epitopes across protein targets were used as inputs, with a minimum sequence identity threshold of 70%. The cluster-break algorithm was applied to generate a clear representative sequence.

All the immunogenic CTL and HTL epitopes obtained were searched against human proteome data from the UniProt database (2020_02 release, 181 292 975 sequences as of date 6 May 2020) for any matches to the human proteome, in order to avoid cross-reactivity. For this, the Multiple Peptide Match tool (https://research.bioinformatics.udel.edu/peptidematch/batchpeptidematch.jsp) of the Protein Information Resource was used.
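The human-proteome screen described above can be illustrated with a minimal Python sketch. It assumes that a match means an exact substring hit of the peptide in a protein sequence, and it does not call the Peptide Match service itself. The function name find_proteome_matches, the toy protein IDs and all sequences below are hypothetical and are not taken from the study.

```python
from typing import Dict, List


def find_proteome_matches(peptides: List[str],
                          proteome: Dict[str, str]) -> Dict[str, List[str]]:
    """Return, for each peptide, the IDs of proteins that contain it exactly."""
    return {
        pep: [pid for pid, seq in proteome.items() if pep in seq]
        for pep in peptides
    }


# Toy data only: these sequences and peptides are invented for illustration.
toy_proteome = {
    "toy_protein_A": "MKTAYIAKQRQISFVKSHFSRQ",
    "toy_protein_B": "MSDNGPQNQRNAPRITFGGP",
}
candidate_epitopes = ["YIAKQRQIS", "FLLNKEMYL"]

matches = find_proteome_matches(candidate_epitopes, toy_proteome)

# Peptides without any hit would be kept as free of exact human matches.
retained = [pep for pep, ids in matches.items() if not ids]

print(matches)
print(retained)
```

In this sketch, an epitope with an empty hit list is retained, while any epitope found in a human protein would be flagged for possible cross-reactivity. A real screen would load the UniProt human proteome rather than a toy dictionary.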
ORF8 protein (GenBank: QID21074.1; no RefSeq sequence is identified for ORF8), ORF7a protein (YP_009724395.1), ORF6 protein (YP_009724394.1), membrane glycoprotein (YP_009724393.1), envelope protein (YP_009724392.1), ORF3a protein (YP_009724391.1), surface glycoprotein (YP_009724390.1) and ORF1ab (YP_009724389.1) were analysed in order to cover the entire genome of SARS-CoV-2, in view of the absence of data on its virulent proteins.

Cytotoxic T lymphocyte epitope prediction

Nonameric peptide epitopes were selected. Epitopes from NetCTLpan were ranked according to the combined score using all three methods representing the antigen processing and presentation steps, and epitopes from the PickPocket algorithm were sorted by affinity (IC50 values in nM). For ORF1ab proteins, because common epitopes could not be found among the top scorers from the NetCTLpan and PickPocket methods, the top 30 candidates were used to select promiscuous epitopes.

(… [34]) was used to predict helper T cell epitopes across several HLA-DRB1 alleles. A consensus list of ranked epitopes, each 15 amino acids long, was generated. For generating top-ranked epitopes, these were sorted using descending order of per cent rank. Per cent rank is a normalized prediction score, obtained by comparison with the predictions for a set of random peptides [32]. The epitopes with per cent rank less than 2% and less than 10% were considered strong and weak binders, respectively.

(… [35]) was used to generate a list of immunogenic CTL epitopes. Immunogenicity of a peptide-MHC complex is predicted based on the physico-chemical properties of amino acids and their positions in the predicted peptide. Specifically, amino acids with large and aromatic side chains and positions 4-6 are more important to the immunogenicity of the peptide being presented.
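The per cent rank thresholds and the search for epitopes common to two predictors, both described above, can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the servers' own logic: the function names, the 'non-binder' label for ranks of 10% or more, and the placeholder peptide strings are all hypothetical.

```python
from typing import List


def classify_binder(percent_rank: float) -> str:
    """Label a peptide using the 2% and 10% per cent rank thresholds."""
    if percent_rank < 2.0:
        return "strong"
    if percent_rank < 10.0:
        return "weak"
    # Assumed label: the text only defines strong and weak binders.
    return "non-binder"


def common_epitopes(ranked_a: List[str], ranked_b: List[str],
                    top_n: int) -> List[str]:
    """Peptides in the top_n of both ranked lists, in the order of ranked_a."""
    top_b = set(ranked_b[:top_n])
    return [pep for pep in ranked_a[:top_n] if pep in top_b]


# Placeholder rankings from two hypothetical predictors (best first).
method_a = ["AAAAAAAAA", "CCCCCCCCC", "DDDDDDDDD"]
method_b = ["DDDDDDDDD", "EEEEEEEEE", "AAAAAAAAA"]

narrow = common_epitopes(method_a, method_b, top_n=2)
wide = common_epitopes(method_a, method_b, top_n=3)

print(narrow, wide)
print([classify_binder(r) for r in (0.5, 5.0, 25.0)])
```

With these placeholder lists, the two predictors share no peptide in their top 2 but share two in their top 3. This mirrors the situation reported for ORF1ab, where the candidate pool was widened to the top 30 before common epitopes could be selected.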
The ranking was done after sorting from higher to lower immunogenicity score.

CD4episcore was developed using neural networks and combines HLA binding and immunogenicity prediction.

In view of different/unclear annotations, it was difficult to get corresponding protein sequences from SARS-CoV (RefSeq accession ID NC_004718.3). There are no human CoVs in gamma/delta CoV categories. In addition, bat coronavirus RaTG13 sequences (MN996532.1) were also used.

Notes
This manuscript has been released as two pre-prints at ChemRxiv (Mishra, 2020), with one part of the manuscript published with

1. Repurposing therapeutics for COVID-19: supercomputer-based docking to the SARS-CoV-2 viral spike protein and viral spike protein-human ACE2 interface
2. Immunoinformatics and modeling perspective of T cell epitope-based cancer immunotherapy: a holistic picture
3. Prediction and molecular modeling of T cell epitopes derived from placental alkaline phosphatase for use in cancer immunotherapy
4. Draft landscape of COVID-19 candidate vaccines
5. T-cell epitopes in severe acute respiratory syndrome (SARS) coronavirus spike protein elicit a specific T-cell immune response in patients who recover from SARS
6. Epitope-based vaccine target screening against highly pathogenic MERS-CoV: an in silico approach applied to emerging infectious diseases
7. 2020 The architecture of SARS-CoV-2 transcriptome
8. Coronaviruses: an overview of their replication and pathogenesis
9. Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation
10. SARS coronavirus accessory proteins
11. A SARS-CoV-2 protein interaction map reveals targets for drug repurposing
12. Engineering T cells specific for a dominant severe acute respiratory syndrome coronavirus CD8 T cell epitope
13. Recent advances in subunit vaccine carriers
14. 2020 T cell epitope-based vaccine design for pandemic novel coronavirus 2019-nCoV
15. Tracking changes in SARS-CoV-2 spike: evidence that D614G increases infectivity of the COVID-19 virus
16. The SARS-CoV-2 receptor-binding domain elicits a potent neutralizing response without antibody-dependent enhancement
17. A human monoclonal antibody blocking SARS-CoV-2 infection
18. Understanding the T cell immune response in SARS coronavirus infection
19. T cell responses to whole SARS coronavirus in humans
20. Functional classification of class II human leukocyte antigen (HLA) molecules reveals seven different supertypes and a surprising degree of repertoire sharing across supertypes
21. 2020 A sequence homology and bioinformatic approach can predict candidate targets for immune responses to SARS-CoV-2
22. 2012 CD8 T cell epitope distribution in viruses reveals patterns of protein biosynthesis
23. Complex immune dysregulation in COVID-19 patients with severe respiratory failure
24. COVID-19: consider cytokine storm syndromes and immunosuppression
25. Phenotype and kinetics of SARS-CoV-2-specific T cells in COVID-19 patients with acute respiratory distress syndrome
26. Cross-reacting antibodies enhance dengue virus infection in humans
27. Human leukocyte antigen susceptibility map for SARS-CoV-2
28. Development of a novel clustering tool for linear peptide sequences
29. Neutralizing antibody responses to SARS-CoV-2 in a COVID-19 recovered patient cohort and their implications
30. Serological and molecular findings during SARS-CoV-2 infection: the first case study in Finland
31. CD4+ T-cell expansion predicts neutralizing antibody responses to monovalent, inactivated 2009 pandemic influenza A(H1N1) virus subtype H1N1 vaccine
32. NetCTLpan: pan-specific MHC class I pathway epitope predictions
33. … specificities for receptors based on receptor pocket similarities: application to MHC-peptide binding
34. NetMHCIIpan-3.0, a common pan-specific MHC class II prediction method including all three human MHC class II isotypes
35. Properties of MHC class I presented peptides that enhance immunogenicity
36. Predicting HLA CD4 immunogenicity in human populations
37. Predicting CD4 T-cell epitopes based on antigen cleavage, MHCII presentation, and TCR recognition
38. Designing of cytotoxic and helper T cell epitope map provides insights into the highly contagious nature of the pandemic novel coronavirus SARS-CoV-2

Acknowledgements. This author acknowledges the tireless help of researchers working towards understanding SARS-CoV-