key: cord-0335912-z60hi869 authors: Rubio Garcia, A.; Paterou, A.; Powell Doherty, R. D.; Landry, L. G.; Lee, M.; Anderson, A. M.; Slawinski, H.; Ferreira, R. C.; Trzupek, D.; Szypowska, A.; Wicker, L. S.; Teyton, L.; Ternette, N.; Nakayama, M.; Todd, J. A.; Pekalski, M. L. title: HLA-DQβ57, anti-insulin T cells and insulin mimicry in autoimmune diabetes date: 2022-05-14 journal: nan DOI: 10.1101/2022.05.11.22274678 sha: f5bea508653602006090a6e9240820cd97bd0af1 doc_id: 335912 cord_uid: z60hi869 Type 1 diabetes (T1D) is caused by a T-cell-mediated destruction of insulin-secreting pancreatic islet {beta} cells. The T1D-predisposing human leukocyte antigen (HLA) class II molecule, DQ8, binds and presents insulin B chain peptides in the thymus producing autoreactive CD4+ T cells. Here, we show that this process is driven by negatively-charged T cell receptor (TCR) complementarity-determining region 3{beta} (CDR3{beta}) sequences interacting with alanine at position 57 of the DQ8 {beta} chain. Since T1D aetiology is linked to gut microbiota dysbiosis, we hypothesized that the commensal proteome contains mimics of the primary insulin B:9-23 epitope that control TCR selection and tolerance. We identified a large set of bacterial proteins with significant similarity to insulin B:9-25, particularly from the transketolase (TKT) superfamily. We isolated a CD4+ TCR with a negatively-charged CDR3{beta} from the pancreas of a DQ8-positive patient that was cross-reactive with one of these TKT peptides and insulin B:9-23. The T1D-protective molecule, DQ6, with the negatively-charged aspartic acid (D) at DQ{beta}57, showed strong TKT mimotope binding, supporting a role for TKT-specific regulatory T cells in resistance to T1D. We propose that in a DQ8+DQ6- child with a proinflammatory dysbiotic gut microbiota, cross-reactive TKT-insulin B chain peptide T effector cells escape from the thymus and initiate T1D. TKT is a strong candidate because it is highly upregulated during weaning, a key period in T1D aetiology, and hence a prominent target for an autoimmune-prone immune system. Inhibiting gut dysbiosis and improving immune tolerance to TKT and other mimotopes, especially before and during weaning, could be a route to primary prevention of T1D and other common diseases. In type 1 diabetes (T1D) the first autoantibodies to appear are most frequently against insulin, with a peak incidence between 9 and 18 months of age, particularly in children carrying the predisposing DR4-DQ8 haplotype 20 . In contrast, the DR15-DQ6 haplotype provides strong dominant protection from T1D 6, 10, 11 . Many sequence differences exist between these two haplotypes, with the strongest single effect at position 57 of the b chain of the DQ molecule, with a non-charged amino acid (alanine/A, valine/V, or in the nonobese diabetic [NOD] T1D mouse model, serine/S) associated with increased T1D susceptibility and the negativelycharged aspartic acid (D) conferring T1D protection. DQ8 has a primary role in disease initiation through presentation of peptide epitopes to the autoreactive CD4 + T-cell receptors (TCR) from the N-terminal region of the insulin B chain, including residues 9-23 and 9-25 (B:9-23; B:9-25; Methods) [1] [2] [3] [4] [5] [6] [7] . It is already established that TCR repertoire differences from the thymus, mediated in part through variable a and b complementarity-determining region (CDR3) 3 regions and their interactions with peptide epitopes and amino acids in the peptidebinding cleft of HLA class II molecules, are determined by HLA class II genotype [21] [22] [23] [24] [25] [26] [27] . However, the effect of DQb57 on CDR3 antigen-recognition sequences in humans and in T1D is unknown. Alterations of the gut microbiota preceding the appearance of anti-islet autoantibodies have been hypothesised as a primary causal factor in HLA class II susceptible children developing T1D [13] [14] [15] [16] [17] [18] 28, 29 . This is consistent with associations between HLA class II genotypes and the gut microbiota 14, 30 , and is supported by evidence from the NOD mouse model of T1D 31 . It is possible that immune tolerance to bacterial mimotopes, defined as sequences in bacterial proteins that are cross-reactive with host proteins, is compromised in an environment of microbiota dysbiosis and reduced gut epithelial integrity and functions. Here, we provide evidence that such a mechanism exists coupled to HLA-DQ-regulated TCR CDR3b antigen recognition. These findings suggest therapeutic opportunities for prevention of the earliest pathogenic events in T1D aetiology long before insulin deficiency and diagnosis. We purified CD4 + T cells (n=349,623) from PBMCs from selected donors (n=48, median age=11 years) carrying the lowest (protected), low risk and highest risk HLA-DR-DQ risk diplotypes (susceptible; Methods; Supplementary Tables S1-S4). Cells were stimulated and loaded into a single-cell platform for preparing paired 5' gene expression and TCR libraries (Methods). After sequencing, gene expression data analysis and TCR repertoire assembly, we obtained 288,903 cells with both a valid gene expression profile and at least one productive TCR chain (Methods). CD4 + T conventional cells (Tconv; n=258,387; TCR clonotypes, 3 255,952), defined as those not belonging to either the CD4 + recent thymic emigrant (RTE; 18,194; 18,127 clonotypes) or CD4 + Treg (12,322; 12,260 clonotypes) 32 clusters (Extended Data Fig. 1 ), we examined which specific differences of CDR3b frequencies could be explained by HLA-DR-DQ risk (Methods). For single amino acids ( Fig. 1(a) ) negativelycharged residues (D and glutamic acid/E) were more likely to be present in CDR3β chains from susceptible donors carrying the hydrophobic amino acid, A, at DQb57. Conversely, in protected donors, with a D at b57 of the DQβ chain, there were much fewer D/E CDR3bs, which were replaced by amino acids, V, leucine/L and isoleucine/I that have high interaction potentials 33,34 with the negatively-charged bD57. The 2-mers, AD, GD, EE and EY, were increased in CDR3bs from susceptible donors ( Fig. 1(b) ), and similarly for 3-mers such as DTE and EET ( Fig. 1(c) ; 4-mers shown in Extended Data Fig. 2 ). Moreover, a multilevel regression model predicted an increase of D as the T1D OR became higher ( Fig. 1(d) ), and the opposite effect in case of V ( Fig. 1(e) ). We obtained some evidence for the same effects for CD4 + RTEs and Tregs, although the support for this was limited owing to the much lower number of cells available for analysis compared to Tconvs (Extended Data Fig. 3 For the TCRα chain sequences, we also estimated systematic differences at low local false sign rates (LFSR) between DR-DQ protective and susceptible diplotypes (Extended Data Fig. 4 -6). These corresponded to biases in the usage of particular Vα and Jα genes, which are carried onto the CDR3α region due to the lower recombination diversity of the TCRa. In particular, top estimated differences in k-mers match fragments of SGTYK and GGSYI, which is a sequence encoded in the TRAJ40 and TRAJ6 genes whose usage increased in individuals carrying susceptible DQ versus protected alleles (Extended Data Fig. 7) . A previous study analysed TCR Va gene usage by HLA genotypes by bulk sequencing of RNA from blood and reported an association between DQb57 and certain Va genes 35 . We observed the same Va genes differed in frequency between susceptible and protective DQ, consistent with the reported DQb57 effects (Extended Data Fig. 8 ). We then investigated whether these observed repertoire differences correlated with an immune response against the primary epitope in T1D, insulin B:9-23 1, 2, 7, 36 . We purified activated circulating CD4 + T cells (n=19,969), defined as HLA-DR + CD38 + , from five children newly diagnosed with T1D carrying susceptible DR-DQ diplotypes (Supplementary Tables S5 and S6 ). This activated cell subpopulation is enriched in autoreactive CD4 + T cells in patients with coeliac disease and other autoimmune disorders 37 . We also sourced CD4 + TCR clonotype sequences (n=1,428) isolated from the islets of five T1D patients from the Network of 4 Pancreatic Organ Donors (nPOD) 38 who carried susceptible DQ (Supplementary Tables S7 and S8) . We compared these TCR sequences against those from individuals with highest susceptibility, DQ2/8 (n=23) from our original cohort (Supplementary Table S1 ). Both CDR3β sequences from islets and circulating activated cells had an increased frequency of the D residue when compared to non-activated circulating CD4 + T cells from susceptible donors ( Fig. 2(a) ). Note that owing to the low number of cells available from islet-infiltrating cells, we cannot rule out a negative fold (90% credible interval (0.974, 1.120)). In case of circulating activated cells, where the number of cells sequenced is larger, the posterior estimate only includes fold changes above 1 (90% credible interval (1.114, 1.183); Fig. 2(a) ). We also analysed TCR clonotype sequences (n=159) from nPOD donors (n=5; Supplementary Tables S7 and S8) for whom reactivity against preproinsulin (PPI) peptides had been tested 2 . A binomial regression was consistent with insulin-reactive TCRs being increased in repertoires proportionally to the presence of negatively-charged CDR3bs (posterior probability of a positive effect, p=0.81). Without considering noise in D frequencies, the posterior distribution predicted strong positive effects ( Fig. 2(b) ; p=0.97). In order to assess the existence of insulin mimotopes in the gut microbiome, we assumed bacterial proteins of interest would exhibit a significant degree of similarity to the primary epitope B:9-25 (Methods). We measured similarity between B:9-25 and any given protein as the maximum pairwise local alignment score between both sequences. We chose B:9-25 instead of B:9-23 because peptides have to have sufficient length for productive searches and the importance of the two C-terminal phenylalanine/F residues is established for T cell recognition 9 . Taking block maxima of gapless pairwise alignment scores yields an extreme value distribution. This distribution emphasizes tail events and reflects the expected biology of mimicry, where only an extreme degree of similarity should be relevant. We estimated a null distribution of scores by drawing random permutations from B:9-25 and calculating pairwise local alignment scores against every reviewed entry in the human proteome. We then calculated a parametric approximation to this empirical distribution (Methods). Finally, we aligned B:9-25 to every protein in a gut metagenome reference catalogue which comprised millions of common bacterial and archaeal proteins (MGnify 39,40 ). This allowed us to derive precise p-values for exceedingly rare events, and also false discovery rates (FDRs) to account for the large set of multiple comparisons performed. We found 134 microbial proteins to be significantly similar to B:9-25 at FDR < 0.2 (Fig. 3) . The largest association signal was from proteins with a TKT domain ( Fig. 3 ; Extended Data Table . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted May 14, 2022. ; https://doi.org/10.1101/2022.05.11.22274678 doi: medRxiv preprint 5 1), although there are significant associations located in other protein families. Both the number and the variety of significant associations suggest tolerance to insulin is regulated by a large niche of gut commensals. An important part of our proof for functional cross-reactive bacterial mimotopes is the demonstration that certain CD4 + T cells have TCRs that recognise both B:9-25 peptides in the canonical B:9-23 register and one or more of our top commensal peptide epitopes. We therefore screened 179 cloned CD4 + TCRs detected in islets from six T1D organ donors carrying the DR4-DQ8 haplotype with ten TKT mimotope peptides (Supplementary Table S9) and insulin B:9-23 (Methods). We identified three islet CD4 + TCRs, including two previously reported TCRs 1,41 , that were reactive to insulin B:9-23 ( Fig. 4(a) Clostridium leptum ( Fig. 4(b) ; Extended Data Table 1 ). As expected, the GSE.166H9 TCR recognised the TKT peptide 8 and B:9-23 presented by HLA-DQ molecules (Fig. 4(c) ). Further analysis determined that the TCR responds to both peptides presented by DQ8-trans, consisting of DQ2α and DQ8β, most strongly, while the TCR can also recognise the peptides presented by DQ8 (Fig. 4(d) ). Intensities of responses to the insulin B:9-23 and TKT peptide 8 presented by DQ8-trans were similar ( Fig. 4(e) ), whereas GSE.166H9 T cells responded to insulin B:9-23 more strongly than TKT peptide 8 when presented by DQ8 (Fig. 4(f) ). The most potent response was induced by both insulin B:9-23 and TKT peptide 8 presented by DQ2a-DQ8b-trans, indicating that immune responses by the 166.H9 TCR were initiated by these peptide-MHC complexes. If TKT sequences are functionally mimicking insulin B-chain peptides in vivo then they should have similar DQ binding properties as insulin B-chain peptides, which bind strongly to DQ6 and weakly to DQ8 5, 19 . To investigate this we pulsed DQ6-and DQ8-positive B cell lines with whole, recombinant Blautia caecimuris TKT, immunoprecipitated the DQ and DR molecules, eluted the bound peptides and sequenced them by mass spectrometry (Methods). Two TKT peptides bound to DQ6 corresponded to the insulin mimotope sequences, residues, 61-73, FVMSKGHSVEALY, and 66-78, GHSVEALYAVLAE (Extended Data Fig. 9 ; Supplementary Table S10). . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) 6 In this study we report associations of HLA-DQ with the amino acid charge and interaction potential of CDR3b TCR sequences in anti-insulin autoreactivity and molecular mimicry between insulin and the commensal enzyme, TKT. Protective DQ molecules, with a D residue at DQb57 (DQ6 and DQ7/DQB*0301) are associated with reduced frequencies of CDR3b D/E residues and, conversely, with increased frequencies of D/E when the A residue is present at DQb57 (DQ8 and DQ2). These results are consistent with previous data from coeliac disease 42 and from the T1D model, the NOD mouse 12,27,43,44 , in which mutation of I-Ag7 b chain (orthologue of DQb) at position 57 from S to D resulted in T1D protection and reduced D and E amino acids in the CDR3 sequences of B:12-20-specific CD4 + T cells 27 . Furthermore, known insulin-reactive clones isolated from human islets contain negative residues within their CDR3b 2 . The same directions of CDR3b effects were observed in Tregs and RTEs (Extended Data Fig. 3 ), suggesting that early DQ-mediated positive and negative selection events in the thymus dictate the frequencies of anti-insulin Teffs and Tregs that leave the thymus. We also conclude that functional TCR repertoire differences associated with T1D DQ diplotypes can influence recognition of mimics of the insulin B:9-23 epitope expressed in the microbiome, most notably, but not only (Fig. 3) , from the metabolic enzyme superfamily TKT 45 . TKT enzymes are among the most upregulated bacterial genes during infant weaning to help metabolise the incoming fibre with the introduction of solid food, prior to the appearance of insulin autoantibodies 45,46 . TKT amino acids occupying highly conserved residues and their spacing seem very well suited for evolution to develop insulin mimotopes employing TKT as a template. This is facilitated by the lack of conservation in the remaining residues, which can be mutated without disrupting enzyme function (Extended Data Fig. 10 ). Given that insulin mimotopes encoded in TKT are present across two bacterial phylums, Firmicutes and Actinobacteria, horizontal or lateral gene transfer events are probable. Phylogenetic tree reconstruction supports the existence of these events 47 , as well as some observations derived from insect models 48 . Some of these microbial commensals, which harbor sequences more similar to B:9-25 than what would be expected by chance, have been previously associated with T1D in a variety of longitudinal and cross-sectional studies. Clostridium leptum, encoded a TKT mimotope peptide that stimulated a DQ8-restricted TCR from an islet-resident T cell (Fig. 4) , was diminished in NOD mice that progressed to T1D 49 . A recent faecal transplant trial in adults with active T1D halted the disease in some patients 50 . Notably, C. leptum abundance changes were among the best predictors of response to the transplant 50 . C. leptum was also the most positively associated bacteria with less healthy plant-based food diets, maltose, sucrose, 7 starch and other carbohydrate intake in a large observational study with deeply phenotyped individuals 51 . Blautia caecimuris expresses a TKT mimotope peptide that we demonstrated to be bound by the protective DQ6 molecule. The Blautia genus is a well-known group of anaerobic bacteria that produce short-chain fatty acids (SCFA). The relative overabundance of this genus was measured at the prediabetic and progressive stages of T1D in the longitudinal DIABIMMUNE cohort 52 . The same association was also found in other cross-sectional studies 53 . Acetatifactor sp. containing the most significant match to insulin B:9-25, also within the TKT mimotope domain (Extended Data Table 1 ), was found to be higher in NOD mice compared to nonautoimmune prone ICR mice 54 . Anaerobutyricum (previously Eubacterium) hallii, is an example of a species that contains a potential non-TKT mimic. We identified a perfect 9-mer match to one of the preferred insulin B:9-23 DQ registers (EALYLVCGE) within an open-reading frame that encoded a protein with a CH3/CHASE3 domain 55 . A. hallii was found to be significantly less abundant in children at T1D onset by a recent case-control study 56 . We propose that in a child with microbiota dysbiosis leak of microbial epitopes from the gut and inflammatory context of DQ8, TKT presentation at weaning could initiate anti-insulin autoimmunity via TCR cross reactivity and negatively-charged CDR3bs. We predict that carriage of DQ8 or DQ6 will alter the composition of the microbiota and the abundance of microbial genes such carrying cross-reactive mimotopes. The T1D susceptibility allele at the insulin gene decreases insulin expression in the thymus 57,58 and consequently increasing numbers of circulating anti-insulin CD4 + T cells, which may further control content of the microbiome 59 in a mimotope-dependent manner. Given cross-reactivity with TKT mimotopes, we might expect in DQ8-positive children that the insulin gene polymorphism might also affect microbiota composition, in addition to insulin's susceptibility allele allowing more B:9-23-TKT cross-reactive CD4 + T cells to escape from the thymus 60 . Gut bacteria can be transported into the thymus by a subset of dendritic cells thereby altering the T cell repertoire 61 . In a child with high HLA class II risk of T1D delivery of bacterial TKT into the thymus and by DQ8 could lead to increased numbers of insulin cross-reactive pathogenic T cells escaping from the thymus mediating the increased T1D risk. In contrast, a child carrying DQ6 will delete those crossreactive cells in the thymus alongside increased production of insulin-specific Tregs, explaining the strong dominant protection against T1D associated with this allotype and its D residue at DQb57. This mechanism is supported by our demonstration that TKT mimotope peptides from B. caecimuris bind to DQ6 (Extended Data Fig. 9 ; Supplementary Table S10). . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) 8 Associations between dysbiosis (altered gut microbiota derived context signals), microbial taxa composition and T1D in industrialised countries are widely reported [13] [14] [15] [16] [17] [18] 28, 29 , consistent with the increasing incidence of T1D over the past decades and the proposed extinction of certain metabolically-beneficial gut bacteria 17, 18 , particularly the efficient metabolisers of human milk oligosaccharides (HMO) such as Bifidobacterium longum subsp. infantis (B. infantis). Recently, it has been shown that proinflammatory cytokines are present in the gut of in exclusively breast-fed babies and that this can be corrected by supplementation with a probiotic strain of B. infantis 62 . Efficient HMO metabolism leads to the production of microbial metabolites such as indolelactate and SCFAs 63 that promote anti-inflammatory T-cell function and gut epithelial integrity 62 . Hence, in countries with high incidence of T1D and the lowest HMO-metabolising capacity gut dysbiosis and inflammation could be a causal factor in the development of T1D 18 . In industrialised countries a parallel rise in childhood obesity, which Mendelian randomisation studies show is a causal factor in T1D, which could be mediated by suboptimal dysbiotic metabolism 64 . We also note that vitamin B1 (thiamine) deficiency is a frequent metabolic complication in T1D 65 and thiamine is a co-factor of TKT 66 . Changes in gut microbiome taxa abundances and transcribed microbial pathways of T1D patients have been shown to precede thiamine deficiency 67 . Our results have mechanistic and potential therapeutic implications for a wide range of diseases associated with the same HLA class II genotypes. For example, it is possible that the commensal proteome influences the TCR repertoire of other DR15-DQ6 associated diseases such as multiple sclerosis 68 , narcolepsy 69 , lupus 70 and Alzheimer disease 71 in which this haplotype is predisposing. In Parkinson's disease the DR4-DQ8 haplotype is protective and the prodromal phase of this neurological disorder has been associated with an altered microbiota 72 . In multiple sclerosis and lupus examples of microbial mimicry have been reported 68, 73 . Interactions between a child's genetic, microbiota and gut health metabolism and diet in early life could therefore have profound predisposing or protective effects for a whole range of diseases later in life: understanding these mechanisms at the molecular, cellular and whole organism levels will be part of future primary prevention efforts for several common diseases. 9 The T1D associations with HLA class II DR and DQ alleles, individually, or in cis haplotypes or trans diplotypes, are complex 10,11,74-76 . Here we used DR-DQ diplotypes and their T1D risks 10 to investigate a possible TCR repertoire mechanism underlying their associations with T1D. A large proportion of susceptibility to, and protection from, T1D has been mapped three specific amino acid positions in the b chains of the DR and DQ molecules, DQb57 and DRb13 and DRb71 (11, 75) . For DQb57 the negatively-charged aspartic acid (D) is associated with dominant protection from T1D and neutral amino acids such as alanine (A), or serine (S) in the nonobese diabetic (NOD) mouse model of T1D 12,27,77 , encoding increased risk of the disease. The HLA-DQB1 allele, DQB1*0302, encodes the most T1D-predisposing allotype, DQ8, with A at b57, in contrast to the D 57 -positive DQ6 b chain (DQB1*0602), which is the most protective HLA class II allotype encoded by the DRB1*1501-DQB1*0602 haplotype. As can be seen in the crystal structures of DQ8 with the primary T1D autoantigenic epitope from the insulin B chain, B:9-23 (4) , and with B:11-23 and its TCR 5 , DQ8 presents the peptide to CD4 + T cells in a trimolecular interaction between it, the peptide, and the TCR CDR3b contacting the acidic N-terminal amino acids of the peptide, and pocket 9 of DQ8, containing A at b57. When the three most T1D susceptible amino acids are all present in DQ and DR molecules, as in the DRB1*0401-DQ8 haplotype, the haplotypic risk is at its highest, with individuals homozygous for this haplotype at over 32-fold increased risk of T1D 10 . Additional susceptibility is encoded in a trans interaction between the two main susceptible haplotypes, DRB1*0401-DQ8 and DRB1*03-DQB1*02 (DR3-DQ2/DR4-DQ8), where there is a strong disease-predisposing synergy, most likely due to the structure and peptide-binding properties of the trans-heterodimer of the DQ a chain from the DR3 haplotype, DQA1*0501, and the DQ8 b chain from the DR4 haplotype (DQ2aDQ8b). Hence, DQ2/DQ8 heterozygous individuals are at the highest risk of T1D, at over 63-fold increased risk of T1D 10 . DQ2 also has A at b57, and known to present proinsulin peptides, including B:9-23, and strongly predisposing to T1D 1,2,36 . Previously, we assembled a cohort of T1D families, as part of the JDRF Centre, Diabetes-Genes, Autoimmunity and Prevention Centre (D-GAP) 78 , including a collection of peripheral blood mononuclear cells (PBMCs) from blood samples from children with T1D and their unaffected siblings in order to define the immune system before and during T1D. Using this resource, we investigated possible associations between DR-DQ diplotype and the TCR repertoire by selecting donors with the most susceptible diplotypes (DQ2/DQ8) to compare with those with the protective DQ6 haplotype. In order to assess the role of DQ more specifically we also analysed PBMCs from donors who carried the D 57 -positive DQ7 allotype (DQB1*0301), which when present with the most predisposing haplotype, DR4-DQ8, reduces . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 14, 2022. ; https://doi.org/10.1101/2022.05.11.22274678 doi: medRxiv preprint DR4-DQ8 T1D risk by over 6-fold 10 ; DQ7 is not as protective against T1D as DQ6 because it often occurs on DR haplotypes where the amino acids at DRb13 and b71 are both highly predisposing to T1D. Nevertheless, the protection encoded by DQ7 is dominant over the susceptible DR molecules 11,75,76 . Table S1 ) were obtained from the JDRF D-GAP cohort which comprised T1D cases and unaffected siblings (REC Ref:08/H072025). This cohort provided PBMCs and genomic DNA samples. Genomic DNA was prepared from PBMCs or whole blood using QiaAmp DNA Blood kit (Qiagen), or phenol/chloroform extraction. Initially, we selected two groups with an equal number of donors in the two extremes of the T1D susceptibility-protection axis, namely the most protected DR15-DQ6 (Supplementary Table S2 ), the most susceptible, DR3-DQ2/DR4-DQ8 (Supplementary Table S4 ). This choice was performed using data from Taqman genotyping (Applied Biosystems) of four SNPs In each batch loaded on a single-cell Chromium V(D)J cassette (10x Genomics), wherever possible, we matched individuals for age (< 20 years old) and homozygosity for DQb57. For example, in each batch all susceptible DR3-DQ2/DR4-DQ8 individuals were homozygotes for DQb57 A, and the protected DQ6 were an equal proportion of homozygotes and heterozygotes of DQb57 D. We also included two batches of DR4/DQ8 versus DR4/DQ7 or lower risk homozygotes, providing a focus on the specific effect of DQb57 (Supplementary Table S3 ). PBMC isolation, cryopreservation, and thawing were performed as previously described 32 . PBMC isolation was carried out using Lympholyte (CEDARLANE) and were cryopreserved in heat-inactivated, filtered human AB serum (Sigma-Aldrich) and 10% DMSO (Hybri-MAX, Sigma-Aldrich) at concentration between 2 to 10 × 10 6 /ml and were stored in liquid nitrogen. PBMCs were thawed in a 37°C water bath for 2 minutes and then washed by adding 1 ml of AB serum to cells dropwise, followed by adding 10 ml of cold (4°C) X-VIVO (Lonza) containing . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 14, 2022. 10% AB serum per up to 10 × 10 6 cells, in a drop-wise fashion. PBMCs were then washed again with 10 ml of cold (4°C) X-VIVO containing 1% AB serum per 10 × 10 6 cells. CD4 + T cells were purified from thawed PBMCs using negative selection (EasySep Human CD4 + T Cell Enrichment Cocktail, STEMCELL Technologies), and washed with X-VIVO medium (Lonza) containing 5% human AB serum and resuspended at the 100,000 cells per well (96 well plate) in a final volume of 200 µl X-VIVO medium (Lonza) containing 5% human AB serum. Cells were activated with the PMA/Ionomycin (eBioscience) for 2 hours and then harvested, washed, resuspended in PBS, counted and 5,000 cells were transferred to the 10x Genomics platform for single-cell immune profiling with version 1.1 kit, which provides paired gene expression and TCR sequences. We washed CD4 + T cells in PBS with 0.04% BSA and re-suspended them at a concentration cDNA was initially amplified for 11 cycles (PCR1) to amplify the mRNA, TCR, AbSeq and Sample Tag products. The resulting PCR1 products were purified by double-sized selection using AMPure XP magnetic beads (Beckman Coulter), to separate the shorter AbSeq (~170 bp) and Sample Tag (~250bp) products from the longer mRNA (350-800bp) and TCR (600-1,000 bp) products. The purified mRNA (10 cycles) and Sample Tag (10 cycles) and TCR (15 cycles) PCR1 products were then further amplified using their respective nested PCR primer panels (PCR2) on separate reactions. The resulting mRNA, Sample Tag and TCR PCR2 products were purified by size selection. The concentration, size and integrity of the PCR . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 14, 2022. 12 products was assessed using both Qubit (High Sensitivity dsDNA kit; ThermoFisher Scientific) and the Agilent 4200 Tapestation system (High Sensitivity D1000; Agilent). The final products were normalised to 2.5 ng/µl (mRNA), 1 ng/µl (Sample Tag & AbSeq) and 0.5 ng/µl (TCR) and underwent a final round of amplification (six cycles for mRNA, Sample Tag and AbSeq; seven cycles for the TCR libraries) using indexes for Illumina sequencing to prepare the final libraries. Fresh whole-blood samples (Supplementary Table S5 We preprocessed all single-cell RNA and TCR sequence libraries separately using Cell Ranger v4.0.0 (10x Genomics) to obtain gene counts and receptor assemblies for each donor. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 14, 2022. ; 13 These were then merged into a single gene expression matrix and a single TCR database, which mapped gene counts or TCR chains to cells and donors. Subsequently, we called cells from read counts with a minimum of 300 genes expressed. We also removed genes not present in at least 50 cells to keep the expression matrix tractable. Furthermore, we applied batch-dependent cutoffs to remove outliers suspected to be cell doublets or multiplets. We also filtered cells with more than 15% of mitochondrial expression to discard those undergoing apoptosis. After data cleanup, we normalized all expression values to 10,000 reads per cell and applied a logarithmic transformation. Next, we discarded all but the top 5,000 most variable genes and regressed out differences due to sequencing depth and mitochondrial gene expression. Lastly, we aligned cells from each sample using batch-balanced nearest neighbors 82 , reduced the dimensionality 83 , called clusters 84 , and performed a multivariate differential expression 85 to find population markers. The initial run yielded two low-frequency clusters with non-CD4 + contaminants. We discarded cells mapping to these, reran all data processing steps, found another cluster with contaminants, and iterated through the same process one last time to remove another non-CD4 + cluster. This led to 12 different cell subpopulations with distinct RTE and Treg clusters (Extended Data Fig. 1 ). We filtered TCRs called by Cell Ranger to retain consensus assemblies with productive rearrangements only. Finally, we performed an inner join between gene expression and receptor assembly data using cell barcodes to obtain TCR chains paired with gene expression cluster information. Among all our hits in MGnify, Blautia caecimuris was the first entry in IGC, a large cohort assembly with estimated abundances, present in more than 10% of the individuals. IGC 91 entry MH0370_GL0036213 corresponds to MGnify entry MGYG000164756_00723, with the latter lacking a nine amino acid sequence in the N-terminus probably arising due to an alternative transcription start site prediction. EBV cells were grown to a total cell count of 1×108 in complete RPMI (R10). Spent R10 was removed, and cells were incubated with 500 µg TKT or insulin in 5 ml of R10 for 2 hours at 37°C. R10 was added to full plate volume (15 ml) and cells further incubated at 37°C for 18 hours. Cells were collected, pelleted (400 × g, 5 min) and washed with PBS. Washed cells were pelleted again and frozen at -20°C for further use. Harvested pellets were washed in PBS then lysed by mixing for 30 minutes with 2 ml of lysis buffer (0.5% (v/v) IGEPAL 630, 50 mM Tris pH 8.0, 150 mM NaCl) and one tablet Complete Protease Inhibitor Cocktail EDTA-free (Roche) per 10 ml buffer at RT. Lysate was clarified by centrifugation at 1,000 × g for 10 minutes followed by a 20,000 × g spin step for 45 mins at is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 14, 2022. ; HPLC fractions were dissolved in loading buffer and analysed by an Ultimate 3000 HPLC system coupled to a high field Q-Exactive (HFX) Orbitrap mass spectrometer (ThermoFisher Scientific). Peptides were initially trapped in loading buffer, before RP separation with a 60 min linear acetonitrile in water gradient of 2-35% across a 75 μm × 50 cm PepMap RSLC C18 EasySpray column (ThermoFisher Scientific) at a flow rate of 250 nl/min. An EasySpray source was used to ionise peptides at 2000 V, and peptide ions were introduced to the MS at an on-transfer tube temperature of 305°C. Ions were analysed by data-dependent acquisition. Initially a full-MS1 scan (120,000 resolution, 60 ms accumulation time, AGC 3×10 6 ) was followed by 20 data-dependent MS2 scans (60,000 resolution, 120 ms accumulation time, AGC 5×10 5 ), with an isolation width of 1.6 m/z and normalized HCD energy of 25%. Charge states of 2-4 were selected for fragmentation. Raw data files were analysed with PEAKS X (Bioinformatic Solutions) using a protein We excluded an N-terminal prefix and a C-terminal suffix from each CDR3 to avoid the HLA class II binding bias that constrains gene usage 35,92 . We used a large hierarchical or multi-level model (Supplementary Algorithm S1) to regress all observed k-mers using the log-transformed T1D OR due to HLA class II diplotypes 10 as an explanatory variable (Fig. 1) . This model was instantiated for each combination of CDR3 chain (α or β) and k ∈ [1, 3] . Joint estimation of all k-mers for a given chain and k-mer length has the advantage of providing better moderated predictions by partial pooling 93 . We assumed k-mer counts to be binomially distributed. We modelled the relationship between the explanatory variable and each observed k-mer count as a linear function with normallydistributed random effects, composed with a logit link function, also known as a binomialnormal model. We used weakly informative hierarchical hyperpriors. Mean of slopes and intercepts was assumed to be distributed as Normal(μ=0, σ=5). Standard deviations for slopes, intercepts and random effects were set to a Cauchy(μ=0, σ=2.5) truncated at 0. For . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 14, 2022. ; 16 additional regularization, we assumed a weak correlation between slopes and intercepts, modelled as a Lewandowski-Kurowicka-Joe(η=2) 94 . We reported fold changes and absolute differences derived from estimated regressions. These were calculated as the difference or the ratio between predicted rates for the maximum and the minimum OR in our cohort, respectively. Here, the minimum OR served as reference, i.e. was the second term in the subtraction or the denominator in the quotient. We reported LFSRs as the probability of the slope having a sign different than the median one. We used a beta-binomial model (Supplementary Algorithm S2) to estimate the proportion of D in each of the three susceptible repertoire types (Fig. 2(a) Tables S5 and S6 ). Our prior belief, derived from our previous inference on whole repertoires, was modelled with an informative prior Beta(α=50, β=1000). Observations were modelled as binomial variables k ~ Binomial(n, p), where k was the number of aspartic acid residues in a given donor out of a total n CDR3β amino acids counted, after postprocessing TCRs as indicated previously. We reported the mean and standard deviation for the posterior distribution of p. We used a linear model (Supplementary Algorithm S3) with a logit link function to regress the dependence between aspartic acid frequency and PPI reactivity ( Fig. 2(b) ). To this end, we used all donors with available PPI information (Supplementary Table S8 ) except nPOD 69, whose repertoire had just six productive CDR3β chains and was therefore excluded from further consideration. The slope and intercept were assigned weakly informative priors Normal(μ=0, σ=100) and Normal(μ=0, σ=1000), respectively. Observations were modelled as binomial variables k ~ Binomial(n, p), where k was the number of PPI reactive clonotypes in a given donor out of a total n infiltrating clonotypes measured. The latent probability p was generated by a linear model with D residue counts as an explanatory variable. We reported the posterior probability of the regression slope being greater than zero. Since all density functions were differentiable, posterior distributions were estimated by running four independent traces with a No-U-Turn Hamiltonian Monte Carlo sampler for 1,000 iterations, with half of them used as warmup and standard control parameters 95, 96 . We verified . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 14, 2022. ; https://doi.org/10.1101/2022.05.11.22274678 doi: medRxiv preprint the validity of inference by visual inspection of traces to ensure adequate mixing, checked diagnostic metrics (r̂, ESS and MCSE) and performed posterior predictive checks. We assumed potential mimotopes to be a subset of those proteins that are more similar to a given epitope of interest than what would be expected by random chance. We measured similarity as the maximum local pairwise alignment score between two protein sequences. We used a linear or affine alignment cost model defined by infinite gap penalties and a BLOSUM 80 substitution matrix 97 . We estimated the null distribution of uninteresting or non-sufficiently similar scores by drawing 10 5 random permutations of the epitope of interest and aligning these with a Smith-Waterman algorithm against all 20,375 reviewed canonical isoforms from Homo sapiens stored in the UniProt database. This yielded an empirical null distribution with more than 2×10 9 scores. We obtained a parametric approximation to the empirical null by fitting a generalized extreme value (GEV) distribution using a maximum likelihood approach (Supplementary Equation S1) and employing an interior point search filter algorithm 98 to estimate the three parameters in GEV(μ, σ, ξ). With this approximation, we derived a p-value for each alignment score of the actual query epitope against a large gut microbiome protein assembly (MGnify 39 ), with protein similarity reduced by clustering at 95% of similarity and 95% of mutual coverage and selecting one protein representative per cluster (Fig. 3) . Subsequently, we estimated the proportion of discoveries and FDRs using an empirical Bayes procedure 99,100 . To speed up convergence and to obtain a more conservative approximation, we added an additional constraint requiring the score of a perfect alignment to the query epitope to be included in the support of the GEV distribution. In case of insulin B:9-25, this value is 145. The resulting GEV shape parameter estimate was ξ=-0.042015 with a 95% confidence interval of (-0.042012, -0.042017), estimated using a set of 1,000 bootstrap samples. Without this constraint, the estimate ξ=-0.070. A negative ξ parameter is important because it indicates the GEV distribution falls within the reverse Weibull family, also known as type III domain of attraction. This family has finite support, i.e. scores beyond certain value have probability zero. This agrees with the biology of epitopes and local alignments. Scores larger than a perfect match should be impossible and scores of identical matches should have non-zero probability. However, the widely used Basic Local Alignment Search Tool (BLAST 101 ) and its implementation variants use a null approximated by a GEV from the Gumbel family or type II 102, 103 , which has a shape parameter ξ=0 and therefore support for infinite scores. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 14, 2022. ; https://doi.org/10.1101/2022.05.11.22274678 doi: medRxiv preprint 18 We have previously identified many of the same insulin mimotopes using a different generative model-based method 28 . Other approaches to the same problem have relied on human-curated results from BLAST searches performed on fully assembled genomes from cultured species 29,104 . Consequently, prioritized peptides lack statistical significance or come from organisms that do not have humans as a host and therefore do not have any clinical interest. For instance, the antigen presented in by Huang et al 29 , RILVELLYLVCSEYL, is far from significance according to our local alignments model (FDR=1) when using B:9-23 (score=68) or B:9-25 (score=68) as query epitope. In comparison, many of the bacterial sequences we have prioritized surpass insulin-like growth factor II (IGF2; score=81) and one surpasses insulin-like growth factor I (IGF1; score=90) in their similarity to insulin B:9-25. If the number of identities is used as a statistic, RILVELLYLVCSEYL again yields a non-significant result when using either B:9-23 (score=9) or B:9-25 (score=9) as a query epitope. ARG, JAT and MLP designed and co-led the study. LT discussed the results. All authors contributed to the final version. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 14, 2022 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 14, 2022. ; https://doi.org/10.1101/2022.05.11.22274678 doi: medRxiv preprint NI SK LK MA KV II IA RI VT IR SH KN VP HN IT LV GM VS AV LP RV NV PL GR GL YG QE EG TE DT YS RD YY ES DP SD DS DQ DY QQ DN DG ED QD EY EE . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 14, 2022 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 14, 2022. ; https://doi.org/10.1101/2022.05.11.22274678 doi: medRxiv preprint . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 14, 2022. Log-transformed p-values denote the significance of similarity scores for each of the~1×10 7 predicted proteins in a reference gut metagenome assembly. Data points are labelled with false discovery rate (FDR) and predicted bacterial species. The top association signals are present in proteins with a transketolase (TKT) domain. Other protein superfamilies and domains include late competence operon (ComE), hydroxyacylglutathione hydrolase (GloB), glycosyl hydrolase family 25 (GH25), cyclases/histidine kinases associated sensory extracellular domain 3 (CH3) and yqeN DNA replication protein (YqeN). . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 14, 2022. ; https://doi.org/10.1101/2022.05.11.22274678 doi: medRxiv preprint 5KC T-hybridoma reporter cells expressing TCRs identified from CD4 + T cells in the islets of T1D organ donors were generated to test for the response to antigen stimulation, and activation of 5KC cells was assessed by evaluating expression of ZsGreen-1, which is driven via NFAT signaling upon stimulation. 5KC cells expressing GSE.6H9, GSE.20D11, or GSE.166H9 were cultured with the insulin B:9-23 peptide (a) and TKT peptides (b) at 100 mM in the presence of Epstein-Barr virus (EBV)-transformed autologous B cells. Cultures without peptide and with anti-CD3 antibody were included as negative and positive control, respectively. ZsGreen-1 expression was assessed by flow cytometry. (c) 5KC cells expressing the GSE.166H9 TCR and EBV-transformed autologous B cells were co-cultured with the insulin B:9-23 peptide and the TKT peptide 8 (100 mM) in the presence or absence of anti-DR, DQ, or DP antibodies, followed by evaluation of ZsGreen-1 expression by flow cytometry. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. , and (f) were repeated two (a), three (d), or four times (e and f), and mean values ± standard error of the mean are shown. ns: not significant, *P < 0.05 and **P < 0.01 (calculated by a two-tailed paired t-test). . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 14, 2022. ; https://doi.org/10.1101/2022.05.11.22274678 doi: medRxiv preprint Extended Data Fig. 1 : Single-cell subpopulations in the D-GAP cohort. Gene marker rank Gene markers for single CD4 + T cell clusters 0±11, depicted in UMAP projection. Cluster 7 corresponds to recent thymic emigrants (RTEs) and cluster 8 to regulatory T cells (Tregs). * indicates no standard gene name is currently assigned, and the identifier of the corresponding genomic region has been used as a placeholder. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 14, 2022. ; https://doi.org/10.1101/2022.05.11.22274678 doi: medRxiv preprint Extended Data Fig. 3 : Estimates of CDR3β k-mer fold changes across HLA class II risk extremes for RTE and Treg cells. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 14, 2022. ; https://doi.org/10.1101/2022.05.11.22274678 doi: medRxiv preprint Extended Data Fig. 4 : Estimates of CDR3α k-mer fold changes across HLA class II risk extremes for Tconv cells. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted May 14, 2022. ; https://doi.org/10.1101/2022.05.11.22274678 doi: medRxiv preprint Extended Data Fig. 7 : Differentially used TCR α chain joining genes. Results filtered at FDR < 5%. Error bars represent a 95% confidence interval for the mean frequency. This comparison was made using a simpler two-group design, with nine DQ6 (Protected) and 16 DQ2/DQ8 donors (Susceptible). . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. Results filtered at FDR < 5%. Error bars represent a 95% confidence interval for the mean frequency. This comparison was made using a simpler two-group design, with nine DQ6 (Protected) and 16 DQ2/DQ8 donors (Susceptible). . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted May 14, 2022. Table S1 : D-GAP sample selection and HLA class II diplotype interactions. T1D odds ratios (OR) for DR-DQ haplotype combinations correspond to log 10 -transformed estimates from UK Biobank. 10 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted May 14, 2022. ; https://doi.org/10.1101/2022.05.11.22274678 doi: medRxiv preprint Table S3 : D-GAP low genetic risk group. AA 1 and AA 2 encode amino acids at HLA DQB1 57, DRB1 13 and 71. 11 The DQA1*03:02 allele is unresolved to 03:02 or 03:03. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted May 14, 2022. ; . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted May 14, 2022 Table S10 : Eluted peptides from Blautia caecimuris transketolase that were measured to be bound by DQ molecules using mass spectrometry. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted May 14, 2022. ; https://doi.org/10.1101/2022.05.11.22274678 doi: medRxiv preprint Islet-derived CD4 T cells targeting proinsulin in human autoimmune diabetes Proinsulin-reactive CD4 T cells in the islets of type 1 diabetes organ donors Epitope specificity, cytokine production profile and diabetogenic activity of insulin-specific T cell clones isolated from NOD mice Structure of a human insulin peptide-HLA-DQ8 complex and susceptibility to type 1 diabetes How C-terminal additions to insulin B-chain fragments create superagonists for T cells in mouse and human type 1 diabetes Increased islet antigen-specific regulatory and effector CD4 + T cells in healthy individuals with the type 1 diabetes-protective haplotype T-cell epitopes and neoepitopes in type 1 diabetes: a comprehensive update and reappraisal Understanding autoimmune diabetes through the prism of the 22 Bacteroides dorei dominates gut microbiome prior to autoimmunity in Finnish children at high risk for type 1 diabetes Genetic risk for autoimmunity is associated with distinct changes in the human gut microbiome Current understanding of the role of gut dysbiosis in type 1 diabetes Microbiome and type 1 diabetes Prospects for primary prevention of type 1 diabetes by restoring a disappearing microbe Variation in microbiome LPS immunogenicity contributes to autoimmunity in humans A peptide binding motif for HLA-DQA1*0102/DQB1*0602, the class II MHC molecule associated with dominant protection in insulin-dependent diabetes mellitus The 6 year incidence of diabetes-associated autoantibodies in genetically at-risk children: the TEDDY study The CDR3 regions of an immunodominant T cell receptor dictate the 'energetic landscape' of peptide-MHC recognition MHC-II alleles shape the CDR3 repertoires of conventional and regulatory naïve CD4 + T cells Molecular constraints on CDR3 for thymic selection of MHC-restricted TCRs from a random pre-selection repertoire HLA autoimmune risk alleles restrict the hypervariable region of T cell receptors T cell receptor restriction of diabetogenic autoimmune NOD T cells Restricted islet-cell reactive T cell repertoire of early pancreatic islet infiltrates in NOD mice Common extracellular sensory domains in transmembrane receptors for diverse signal transduction pathways in Bacteria and Archaea Gut microbiota in T1DM-onset pediatric patients: machine-learning algorithms to classify microorganisms as disease linked The insulin gene is transcribed in the human thymus and transcription levels correlate with allelic variation at the INS VNTR-IDDM2 susceptibility locus for type 1 diabetes Insulin expression in human thymus is modulated by INS VNTR alleles at the IDDM2 locus Oral insulin immunotherapy in children at risk for type 1 diabetes in a randomised controlled trial Insulin gene VNTR genotype associates with frequency and phenotype of the autoimmune response to proinsulin Thymic development of gut-microbiota-specific T cells Bifidobacteria-mediated immune system imprinting early in life Key bacterial taxa and metabolic pathways affecting gut short-chain fatty acid profiles in early life Childhood body size directly increases type 1 diabetes risk based on a lifecourse Mendelian randomization approach Low thiamine levels in children with type 1 diabetes and diabetic ketoacidosis: a pilot study Thiamine pyrophosphate, a coenzyme of transketolase Integrated meta-omic analyses of the gastrointestinal tract microbiome in patients undergoing allogeneic hematopoietic stem cell transplantation HLA-DR15 Molecules jointly shape an autoreactive T cell repertoire in multiple sclerosis Autoimmunity to hypocretin and molecular mimicry to flu in type 1 narcolepsy Specific combinations of HLA-DR2 and DR3 class II haplotypes contribute graded risk for disease susceptibility and autoantibodies in human SLE A specific amino acid motif of HLA-DRB1 mediates risk and interacts with smoking history in Parkinson's disease The nasal and gut microbiome in Parkinson's disease and idiopathic rapid eye movement sleep behavior disorder Commensal orthologs of the human autoantigen Ro60 as triggers of autoimmunity in lupus HLA DR-DQ haplotypes and genotypes and type 1 diabetes risk: analysis of the type 1 diabetes genetics consortium families Widespread non-additive and interaction effects within HLA loci modulate the risk of autoimmune diseases Motifs of three HLA-DQ amino acid residues (α44, β57, β135) capture full association with the risk of type 1 diabetes in DQ2 and DQ8 children The first external domain of the nonobese diabetic mouse class II I-A beta chain is unique Blood and islet phenotypes indicate immunological heterogeneity in type 1 diabetes HIBAG-HLA genotype imputation with attribute bagging Why we (usually) don't have to worry about multiple comparisons Generating random correlation matrices based on vines and extended onion method The No-U-Turn sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo Stan : A probabilistic programming language Amino acid substitution matrices from protein blocks Adaptive barrier update strategies for nonlinear interior methods Controlling the false discovery rate: a practical and powerful approach to multiple testing On the adaptive control of the false discovery rate in multiple testing with independent statistics Basic local alignment search tool Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes The estimation of statistical parameters for local alignment score distributions Autoreactive T cells specific for insulin B:11-23 recognize a low-affinity peptide register in human subjects with autoimmune diabetes N} do 3: k i ∼ Binomial(n=n i , p=p) 4: end for Algorithm S2: Beta-binomial aspartic acid proportion model. 1: α ∼ Normal(µ=0, σ=100) 2: β ∼ Normal(µ=0, σ=1000) 3: for i ∈ {1