key: cord-0428629-wn97sofz
authors: Bilderbeek, Richèl J.C.; Baranov, Maxim; van den Bogaart, Geert; Bianchi, Frans
title: Transmembrane helices are an overlooked and evolutionarily conserved source of major histocompatibility complex class I and II epitopes
date: 2021-05-07
journal: bioRxiv
DOI: 10.1101/2021.05.02.441235
sha: 433f374be01aaf7e068cdb57203ff23ec03cfb7c
doc_id: 428629
cord_uid: wn97sofz

Cytolytic T cell responses are predicted to be biased towards membrane proteins. The peptide-binding grooves of most haplotypes of major histocompatibility complex class I (MHC-I) are relatively hydrophobic; therefore, peptide fragments derived from human transmembrane helices (TMHs) are predicted to be presented more often than would be expected based on their abundance in the proteome. However, the physiological reason why membrane proteins might be over-presented is unclear. In this study, we show that the over-presentation of TMH-derived peptides is general, as it is predicted for bacteria and viruses and for both MHC-I and MHC-II. Moreover, we show that TMHs are evolutionarily more conserved, because single nucleotide polymorphisms (SNPs) are present relatively less frequently in TMH-coding chromosomal regions than in regions coding for extracellular and cytoplasmic protein regions. Thus, our findings suggest that both cytolytic and helper T cells respond more to membrane proteins, because these are evolutionarily more conserved. We speculate that TMHs therefore are less prone to escape mutations that enable pathogens to evade T cell responses.

[...] IC50 value in the lowest 2% of all peptides within a proteome (see supplementary Tables 4 and 5 for values), whereas the previous study defined a binder as having an IC50 in the lowest 2% of the peptides within a protein. This revised definition precludes bias from proteins that give rise to no or only very few MHC epitopes.
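The difference between the two binder definitions can be illustrated with a minimal Python sketch. This is not the authors' implementation (their analysis is bundled in the 'bbbq' R package). The function names, the toy IC50 values and the rule that at least one peptide always falls within the lowest 2% are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

# (peptide identifier, predicted IC50); identifiers and values are toy data
Peptide = Tuple[str, float]


def lowest_fraction_threshold(ic50s: List[float], fraction: float = 0.02) -> float:
    """IC50 of the k-th strongest peptide, with k = fraction * n (at least 1).

    Assumption: 'lowest 2%' is taken by rank, and at least one peptide qualifies.
    """
    ranked = sorted(ic50s)
    k = max(1, int(fraction * len(ranked)))
    return ranked[k - 1]


def binders_per_proteome(proteome: Dict[str, List[Peptide]],
                         fraction: float = 0.02) -> Dict[str, List[str]]:
    """Revised definition: one threshold shared by all peptides in the proteome."""
    all_ic50s = [ic50 for peptides in proteome.values() for _, ic50 in peptides]
    if not all_ic50s:
        return {name: [] for name in proteome}
    threshold = lowest_fraction_threshold(all_ic50s, fraction)
    return {name: [pep for pep, ic50 in peptides if ic50 <= threshold]
            for name, peptides in proteome.items()}


def binders_per_protein(proteome: Dict[str, List[Peptide]],
                        fraction: float = 0.02) -> Dict[str, List[str]]:
    """Previous definition: a separate threshold for every protein."""
    result: Dict[str, List[str]] = {}
    for name, peptides in proteome.items():
        if not peptides:
            result[name] = []
            continue
        threshold = lowest_fraction_threshold([ic50 for _, ic50 in peptides], fraction)
        result[name] = [pep for pep, ic50 in peptides if ic50 <= threshold]
    return result


# Toy proteome: one protein with low IC50 values, one with high IC50 values.
toy_proteome = {
    "strong": [(f"s{i}", float(i)) for i in range(1, 51)],
    "weak": [(f"w{i}", 1000.0 + i) for i in range(50)],
}
per_proteome = binders_per_proteome(toy_proteome)
per_protein = binders_per_protein(toy_proteome)
```

In this toy example the per-protein threshold labels one peptide of the "weak" protein as a binder even though its IC50 is far higher than that of any peptide in the "strong" protein, whereas the per-proteome threshold labels none. This is the kind of bias the revised definition is meant to avoid.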
To verify that the results are similar, a side-by-side comparison was performed, which is shown in the supplementary materials.

[...] reanalyzed. For each of the detected epitopes, its possible location(s) in a human reference proteome, with UniProt ID UP000005640 9606, was mapped. For the epitopes that were present in the proteome exactly once, the topology of the proteins in which these epitopes were located was predicted using both TMHMM [8] and PureseqTM [13]. From this topology, we determined if the epitope overlapped with a TMH.

The full analysis can be found at https://github.com/richelbilderbeek/bbbq_article_issue_157.

[...] gov/snp/docs/RefSNP_about/), and the databases gene (for gene names, [22]) and protein (for protein sequences, [23]).

The first query was a call to the gene database for the term 'membrane protein' (in all fields) for the organism Homo sapiens. This resulted in 1,077 gene IDs (as of December 2020). The next query was a call to the gene database to obtain the gene names from the gene IDs. Per gene name, the dbSNP NCBI database was queried for variations associated with the gene name. As the NCBI API constrains its users to three calls per second (to assure fair use), we had to limit the extent of our analysis.

[...] Per SNP, the protein NCBI database was queried for the protein sequence. For each protein sequence, the protein topology was determined using PureseqTM. Using these predicted protein topologies, the SNPs were scored as located within or outside TMHs.

Figure 1A shows the predicted presentation of TMH-derived peptides in MHC-I, for a human, viral and bacterial proteome. Per MHC-I haplotype, it shows the percentage of binders that overlap with a TMH with at least one residue.
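The overlap criterion ("at least one residue") can be sketched in Python as follows. This is an illustrative sketch, not the authors' code. The per-residue topology string, in which the character 'M' marks a TMH residue, is an assumed encoding, and the function names, the toy topology and the peptide length of 3 are hypothetical.

```python
from typing import Sequence


def overlaps_tmh(topology: str, start: int, length: int) -> bool:
    """True if the peptide covering topology[start:start + length]
    contains at least one residue predicted to be in a TMH ('M')."""
    return "M" in topology[start:start + length]


def percent_overlapping(topology: str, starts: Sequence[int], length: int) -> float:
    """Percentage of the peptides (given by start index) that overlap a TMH."""
    if not starts:
        return 0.0
    hits = sum(overlaps_tmh(topology, s, length) for s in starts)
    return 100.0 * hits / len(starts)


# Toy protein of 15 residues: 5 outside ('o'), 5 in a TMH ('M'), 5 inside ('i').
toy_topology = "ooooo" + "MMMMM" + "iiiii"
peptide_length = 3

# Hypothetical start positions of predicted binders within the toy protein.
binder_starts = [0, 4, 9, 12]
percent_binders = percent_overlapping(toy_topology, binder_starts, peptide_length)

# The same percentage over every possible peptide of this length.
all_starts = range(len(toy_topology) - peptide_length + 1)
percent_all = percent_overlapping(toy_topology, list(all_starts), peptide_length)
```

Computing the same percentage over all possible peptides, as in the last two lines, gives a reference value against which the percentage for binders alone can be compared.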
The horizontal line shows the expected percentage of TMH-derived epitopes that would be presented if TMH-derived epitopes were just as likely to be presented as epitopes derived from soluble regions. For 11 out of 13 MHC-I haplotypes, TMH-derived epitopes are predicted to be presented more often than the null expectation, for a human and bacterial proteome. For the viral proteome, 12 out of 13 haplotypes present TMH-derived epitopes more often than expected by chance. The extent of the over-presentation between the different haplotypes is similar for the probed proteomes, which strengthens our previous conclusion [7] that the hydrophobicity of the MHC-binding groove is the main factor responsible for the predicted over-presentation of TMH-derived peptides.

3.2 TMH-derived peptides are predicted to be over-presented in MHC-II

We next wondered if the over-representation of TMH-derived peptides would also be present for MHC-II. Figure 1A shows the percentages of MHC-II epitopes predicted to be overlapping with TMHs for our human, viral and bacterial [...]

[...] with the phrase 'membrane protein', which are genes coding for both membrane- [...]

We split this analysis for TMPs containing only a single TMH (so-called single-membrane spanners) and TMPs containing multiple TMHs (multi-membrane spanners). We hypothesized that single-membrane spanners are less conserved than multi-membrane spanners, because multi-membrane spanners might have protein-protein interactions between their TMHs, for example to accommodate active sites, and thus might have additional structural constraints.

From the split data, we did the same analysis as for the total TMPs. Figure 4C shows the [...] shows that fewer SNPs are found in TMHs than expected by chance. We also determined the probability to find the observed number of SNPs [...]

[...] TMHs are evolutionarily more conserved than solvent-exposed protein regions.
[...] antibodies will thus recognize solvent-exposed regions of antigens that are accessible for binding to the B cell receptor. However, the results from our study predict that most MHC-II haplotypes present relatively hydrophobic peptides, which are less likely to be solvent-exposed. It is unknown why B and T cells seem [...]

In general, one might expect that evolutionary selection results in an immune system that is most attentive to protein regions that are essential for the survival, proliferation and/or virulence of pathogenic microbes, as these will be most conserved. In SARS-CoV-2, for example, there is preliminary evidence that the strongest selection pressure is upon residues that change its viru- [...] of this scarcity and variance in targets, one can imagine that it will be mostly unfeasible to provide innate immune responses against such rare essential protein regions, as suggested in a study on influenza [40], where it was found that the selection pressure exerted by the immune system was either weak or absent.

We believe that our revised definition is more correct, as it overcomes bias from proteins with very low numbers of peptides and/or MHC-predicted binders.

Our previous study used the TMHMM web server to predict TMHs. The [...]

[...] nuggetsr' has been submitted to and is accepted by CRAN. To reproduce the full experiment presented in this paper, the functions needed are bundled in the 'bbbq' R package. This package is too specific to be submitted to CRAN.

Table 3: Percentage of epitopes derived from a TMH found in the two elution studies, for the two different kinds of topology prediction tools. The values between braces show the number of epitopes that were predicted to overlap with a TMH per all epitopes that could be uniquely mapped to the representative human reference proteome.
To compare the over-presentation of TMH-derived epitopes between the different proteomes, we normalized these percentages in such a way that 1.0 is the percentage of TMH-derived epitopes that would be expected by chance. Abbreviations of the haplotype names HLA