key: cord-0020835-77vnjfex
authors: Wang, Shiyu; Wang, Longlong; Liu, Yang; Zhu, Yonggang; Liu, Ya
title: Characteristics of T-cell receptor repertoire of stem cell-like memory CD4+ T cells
date: 2021-08-25
journal: PeerJ
DOI: 10.7717/peerj.11987
sha: 7466b3452fc85ee5866987e6568c7ea813fa90f5
doc_id: 20835
cord_uid: 77vnjfex

Stem cell-like memory T cells (Tscm) combine phenotypes of naïve and memory. However, it remains unclear how T cell receptor (TCR) characteristics contribute to heterogeneity in Tscm and other memory T cells. We compared the TCR-beta (TRB) repertoire characteristics of CD4+ Tscm with those of naïve and other CD4+ memory (Tm) in 16 human subjects. Compared with Tm, Tscm had an increased diversity across all stretches of TRB repertoire structure, a skewed gene usage, and a shorter length distribution of CDR3 region. These distinctions between Tscm and Tm were enlarged in top1000 abundant clonotypes. Furthermore, top1000 clonotypes in Tscm were more public than those in Tm and grouped in more clusters, implying more epitope types recognized by top1000 clonotypes in Tscm. Importantly, self-reactive clonotypes were public and enriched in Tscm rather than Tm, of type one diabetes patients. Therefore, this study highlights the unique features of Tscm different from those of other memory subsets and provides clues to understand the physiological and pathological functions of Tscm.

Memory T cells play the central role in coordinating innate and adaptive immune responses (MacLeod et al., 2009) . After encountering antigens, naïve T cells differentiate to memory subsets and terminally differentiated effector T cells. During the differentiation process, T cells acquire functions and lose self-renewal abilities (Seita & Weissman, 2010) . Following the theory of a hierarchical system of memory, naïve transits to central memory (Tcm) and effector memory (Tem) T cell subsets in turn (Fearon et al., 2006) . Tcm is relatively long-lived, and a notion considers the stem cell-like characteristics of Tcm. However, the finding of stem cell-like memory (Tscm) challenges this notion (Gattinoni et al., 2011; Lugli et al., 2013) . Tscm, a rare memory subset defined from naïve, is a long-term memory subset with self-renew and the plasticity to differentiate into other memory subsets and effector (Ahmed et al., 2016; Stemberger et al., 2009) .

Recently, observations in diseases and vaccines unveil physiological and pathogenies functions of Tscm. In mice models, deleting T cells and transplanting Tscm could rebuild the memory T cell population, including central and effector memory (Simons & Clevers, 2011) . In clinical studies, human Tscm cells from naïve precursors enrich early after hematopoietic stem cell transplantation (HSCT), and contribute to peripheral reconstitution by differentiating into effectors (Cieri et al., 2015; Roberto et al., 2015) . Stimulations with CMV, influenza vaccine and WT1 tumor antigen activate cytokine expression in part of these naïve-derived Tscm, suggesting that naïve specific to given antigens differentiate to Tscm (Roberto et al., 2015) . Furthermore, HIV (Ahmed et al., 2016) , smallpox, and yellow fever vaccines studies (Gattinoni et al., 2017) presented that antigen-specific CD8 + Tscm persisted in donors receiving vaccines after a long time, indicating that Tscm acts as a reservoir for maintaining these exogenous factors antigenspecific TCRs. In addition, CD4 + Tscm may involve in graft-versus-host disease (GVHD) and autoimmune disease. As shown by Jimbo et al. (2019) , the peripheral proportion of CD4+ Tscm increased in graft-versus-host disease (GVHD) patients compared with no GVHD patients after HSCT. An increased proportion of CD4 + or CD8 + Tscms has also been observed in patients with autoimmune disease (Jimbo et al., 2019) , such as systemic lupus erythematosus (SLE) (Lee et al., 2018) , type 1 diabetes (Vignali et al., 2018) , aplastic anaemia (Hosokawa et al., 2016) , immune thrombocytopenia (Cao et al., 2019) and rheumatoid arthritis (Cianciotti et al., 2020) . Recently, T-cell immunotherapies based on Tscm have been developed against HIV and cancer (Flynn & Gorry, 2014) . However, in addition to antigen-driven, cytokines also involve in T cell differentiation. Interleukin-7 and Interleukin-15 combined with stimulation via CD3 and CD28 facilitate the differentiation of naïve to Tscm in vitro (Cieri et al., 2013) . PD-L1 and TGF-b promote the differentiation from naïve to regular T cell (Batra et al., 2020) . IFN-β can regulate the expansion of CD4 + memory T and NK cells to facilitate the anti-tumor effects of a novel form of 4-1BBL (Barsoumian et al., 2019) . Therefore, these studies in infectious and autoimmune diseases raise interest in the specificity of TCR clonotypes enriched in Tscm and whether the TCR repertoire of Tscm is different from other memory subsets.

Individual has enormous diversity of TCR repertoire including over 10 6 clonotypes (Qi et al., 2014; Soto et al., 2020) . Complementary-determining region 3 (CDR3) is the most diverse part of TCR, and contribute to antigen recognition ability of TCR. The diversity of TCR repertoire is trimmed by inherent and exogenous factors. For naive, genetics (Gao et al., 2019; Posnett, 1995) , the rearrangements of V(D)J segments, and thymus selection (Khosravi-Maharlooei et al., 2019) shape its TCR repertoire. For memory, both genetics and environment factors trim its TCR repertoire composition (Hou et al., 2020; Krishna et al., 2020) . Previous studies suggest that antigen-specific clonotypes unevenly distributed among memory subsets. Acute infection-related clonotypes enriched in Tcm, while chronic infection-related clonotypes and autoimmunity related clonotypes maintained in Tem (Devarajan & Chen, 2013) . In COVID19 patients, CD4 + responding clonotypes were biased to be expanded in Tcm more than in Tem (Minervina et al., 2021) . Furthermore, a study using the transfer of genetically-modified virus-specific T cells showed that antigen-specific clonotypes only maintained in Tscm rather than other subsets after a long time (Roberto et al., 2015) , suggesting that composition of TCR repertoire may different among memory subsets. in addition, T-cell differentiation However, it is still difficult to conclude that memory subsets have different composition of TCR clonotypes, because of the limitations of methods for screening antigen-specific TCRs.

High-throughput sequencing of TCR repertoire (TCR-seq) has become an essential technique in immunology. Recently, this technique is used to unveil the process of TCRs' development in the thymus (Khosravi-Maharlooei et al., 2019) , to promote the understanding of positive and negative selections, and to define the disease biomarkers (Liu et al., 2019) . By TCR-seq, the differences of the TCR repertoire were shown among T cell subsets. The TCR beta chain (TRB) repertoire of CD4 + memory T cells has a shorter distribution of CDR3 length and a skewed V-gene usage, compared with that of Tn in peripheral blood. A study with three subjects shows that Tcm has a power law exponent higher than Tem (Oakes et al., 2017) . It suggests a lower clonal expansion in Tcm. A model by the power law distribution was employed to separate type one diabetes from heathy donors based on the TRB repertoire of Tscm, but not that of Tcm (Koch et al., 2018) . Furthermore, T cell receptor antigen specificity prediction methods based on the TRB CDR3 sequence have been developed recently (Zhang et al., 2020) , and clonotypes targeting the same antigens can be clustered by the TRB CDR3 sequences (Huang et al., 2020) . Therefore, analyzing TCR-seq data of Tscm and other memory subsets may provide novel perspectives for unveiling functions of Tscm.

We analyzed the repertoire features of the TRB repertoire in Tscm and Tm, including sequence composition (k-mer), gene segments, the TRB repertoire structure and CDR3 length distribution. We then unveil the differences of the antigen specificity between Tscm and Tm. We trained a SVM model with a large dataset (Emerson et al., 2017) to identify the public clonotypes in each sample, and showed that public clonotypes within top1000 abundant clonotypes in Tscm were more than those in Tm. The public clonotypes in Tscm have a different sequence composition comparing with public clonotypes in Tn. It confirms that the high abundant, public clonotypes in Tscm are antigen-experienced. We further used a sequence-based method to cluster clonotypes targeting same antigens, and showed that the public clonotypes in Tscm could recognize more antigens than those in other memory T cell subsets. Finally, we found more presence of similar clonotypes to those found in database and recognized autoreactive antigens in type one diabetes (T1D) patients.

In this study, we conducted analyses on high-throughput TCR repertoire datasets of CD4 + T cell subsets. Gomez-Tourino et al. (2017) used a stringent strategy to sort Tn (CD3 + CD4 + CD45RO − CD27 + CCR7 + CD95 − ), Tscm (CD3 + CD4 + CD45RO − CD27 + CCR7 + CD95 + ), and Tm (CD3 + CD4 + CD45RO + CD27 + ) from eight healthy subjects (HD) and eight T1D patients by fluorescence-activated cell sorting (FACS). Then RNA was extracted and sequenced in parallel. The sequence data is immuneACCESS format (https://clients.adaptivebiotech.com/pub/peakman-2017-naturecommunications). We examined the number of clonotypes in T1D and HD. As shown in Fig. S1 , no significant difference was presented between T1D and HD in any subset.

To generating a model to identify public clonotypes which can occur in more than two individuals, we used datasets from Emerson et al. (2017) for training and testing a support vector machine (SVM) model. The dataset includes data of two cohorts, and can be found at https://clients.adaptivebiotech.com/pub/emerson-2017-natgen.

Statistical analyses were performed with R. The paired Willcox-ranked test was used to examine the difference between two groups. The Kruskal-Wallis rank-sum test was used to examine the differences among multiple groups, and then Nemenyi test was used for multiple comparisons. The p values of multiple tests were corrected by false discovery rate (FDR) method. A test with a p value < 0.05 was considered as a significance. The Spearman correlation method was used to examine the correlation between samples of two groups. Graphics were generated with R package ggplot2. Principal component analysis (PCA) was conducted with R package forcats. R package readr, dplyr and tidyr were used for statistics.

A clonotype was defined as the amino acid sequence identity of the TRB CDR3 region.

Renyi entropy was used in our study to evaluate the diversity with alpha value from 0 to 20. When alpha increases, clonotypes with a higher frequency will have a greater influence on the entropy. When alpha equal 0, the Renyi entropy is the logarithm of the number of clonotypes; When alpha is 1, Renyi entropy tends to the Shannon entropy. When the alpha approaches infinity, the Renyi entropy is determined by the most frequent clonotype, where a lower frequency of the most frequent clonotype will generate a higher Renyi entropy index. Renyi entropy formula is

where H is the diversity index, α is the alpha value, n is the total number of clonotypes, and f i is the frequency of the ith clonotype.

We performed 'complete linkage' clustering algorithm on the correlation matrix, and visualized dendrograms using pheatmap, a R package [50] . The Euclidean distance was used as a distance metric. Pearson correlation method was used to measure the correlation of TRB repertoire structures.

SVM analysis was performed using kernel-based analysis of biological sequences with the R package KeBABS (Palme, Hochreiter & Bodenhofer, 2015) . Amino acid sequence of clonotypes was split into features with length k = 3. A cost parameter C = 100 was used for the misclassification of a sequence. A total of 320,000 public and 320,000 private clonotypes were randomly sampled from the total set, and then were split into training (80%) and test (20%) sets. SVM training was performed on the training set, and class prediction was performed on the test set. Prediction accuracy of classification was qualified by calculating BACC = 1 2 × spec + sens , where specificity was calculated as spec = TN TN +FP , and sensitivity was defined as sens = TP TP+FN , (where TN = true negative, FP = false negative, TP = true positive and FN = false negative). The area under the receiver operating characteristic curve (AUC) was calculated, where the AUC = 0.5 means a random classification (BACC = 50%), and AUC = 1 means a perfect classification (BACC = 100%).

The dataset (Emerson et al., 2017) includes two cohorts: cohort 1 includes 666 individuals, and cohort 2 includes 116 individuals. We termed clonotypes occurred in no less than two individuals as public clones and ones occurred in only one individual as private clonotypes. Data of cohort 1 was used as a training set. For this dataset of cohort 1, we randomly sampled 20,000, 40,000, 80,000, 160,000 and 320,000 public clonotypes and an equal number of private clonotypes, train a SVM model, and tested the model by cross validation (Fig. S2 ). When sampling more than 160,000 public clonotypes, the increase in the sample size had limited improvement in model accuracy. To save computing source, we trained a model on the 320,000 public clonotypes and 320,000 private clonotypes. The model was then validated with data of cohort2. In order to further increase prediction accuracy, we tested two thresholds for public clonotype definition: (1) definition of public clonotypes occurred in at least three subjects; (2) definition of public clonotypes occurred in at least two subjects. Comparing to the second threshold, the definition of public clonotypes occurred in at least three subjects elevated the prediction accuracy from 82% to 88%. We therefore defined public clonotypes occurred in at least three subjects.

To characterize immune repertoire clonal structure, we used Renyi entropy (Greiff et al., 2015) . The α-values represent weights, which means as α increases, higher frequency clonotypes are weighted more. For a given alpha value, a larger Renyi entropy means a more considerable diversity of the sample. Since each alpha value focuses on a different stretch of the immune repertoire, this method enabled the reliable capture of TRB repertoire clonal frequency distributions. Our results showed that as alpha value increased, the Renyi entropy of all memory subsets decreased. At all alpha values, the Renyi entropy of Tn was higher than that of both Tscm and Tm. At the alpha value from 0 to 2, the Renyi entropy of Tscm was less than that of Tm, while after alpha value 3, the Renyi entropy of Tscm was greater than that of Tm (Fig. 1A) , which reflects that Tscm had a greater diversity than Tm in the abundant clonotypes. Since the Renyi entropy profiling can recover a large amount of immunodiagnostic fingerprints from TRB repertoire data, we used the Renyi entropy to classify the cell subsets with hierarchical clustering approach based on Pearson correlation (Greiff et al., 2015) . The results showed that Tscm of 13 subjects were clustered together; Tm of 15 subjects were gathered; Tscm of only 3 subjects mixed in the cluster of Tm (Fig. 1B) . This hierarchical clustering result suggested that Tscm and Tm had different TRB repertoire structures.

The gene usage of the TRB repertoire of memory T cells is heavily affected by antigen experience. We analyzed the gene usage to unveil the effects of antigen-experience on Tscm and Tm, respectively. Since the frequent clonotypes of memory T cells are expanded by chronic antigen stimulations, we also analyzed the gene usage of the top1000 abundant clonotypes. The PCA on the gene usage of the entire repertoire separated Tn, Tscm, and Tm from each other ( Fig. 2A) . Specially, the PCA on the genes of the top1000 clonotypes could achieve a better performance of classification (Fig. 2B) . A further analysis on top1000 abundant clonotypes showed that Tscm and Tm differentially used 12 V-genes and 3 J-genes: the frequency of TRBV12-03, TRBV07-09, TRBV18-01, TRBV23-01 and TRBJ02-07 were less in Tm than in Tscm; the frequency of TRBV03-01, TRBV02-01, TRBV11-02, TRBV09-01, TRBV06-05, TRBV25-01, TRBV24-01, TRBV05-05, TRBJ01-02 and TRBJ02-02 was greater in Tm than in Tscm (Table S1 ). In further, we used Spearman correlation to quantify the similarity of the gene usage between Tscm and Tm. Because Tscm and Tm differentiate from Tn, we therefore used the correlation between Tn and Tscm as a contrast. For the entire TRB repertoire, the correlation between Tm and Tscm was similar to that between Tscm and Tn, but less than that between Tm and Tn (Fig. 2C) ; for the top 1000 abundant clonotypes, the correlation between Tscm and Tm was greater than that between Tscm and Tn (Fig. 2D ). It suggests that Tscm and Tm had a large difference in the gene usage of the TRB repertoire, especially in the range of top1000 abundant clonotypes. 

The antigen experience has a selection on clonotypes which may change the distribution of CDR3 length. For the CDR3 of the total clonotypes, Tscm was significantly longer than Tn, and Tm as well (Fig. 3A) . For the CDR3 of the top1000 abundant clonotypes, Tscm was obviously shorter than Tm, but longer than Tn. (Fig. 3A) . According to the Spearman correlation analysis, Tscm had a similar length distribution to Tm rather than to Tn in the top1000 abundant clonotypes (Fig. 3B ). Tscm and Tm also showed an increased correlation in gene usage, then we examined whether gene usage was related to the distribution of CDR3 length in Tscm and Tm. However, we did not find a high correlation (cor = 0.27) between the gene usage and distribution of CDR3 length in Tscm and Tm (Fig. 3C) , suggesting that, between Tscm and Tm, the increased correlation of gene usage and elevated correlation of length distribution might be independent. 

We examined the CDR3 sequence composition by decomposing kernels containing three amino acids. As showed by the PCA on the sequence composition of all clonotypes, Tn, Tm, and Tsm samples were partially separated (Fig. 4A) . Exhibited by the PCA based on the k-mer of the top1000 abundant clonotypes, the samples of Tn, Tscm, and Tm were completely separated (Fig. 4B) . These results suggested that Tscm and Tm had a great difference in the sequence composition of the top1000 clonotypes. We used Spearman correlation to quantify differences of sequence composition among subsets. For the total TRB repertoire, the correlation between Tm and Tscm was significantly weaker than that between Tn and Tm, and slightly weaker than that between Tscm and Tn (Fig. 4C) . For the top1000 abundant clonotypes, the correlation between Tm and Tscm was significantly weaker than that between Tn and Tscm, and between Tn and Tm as well (Fig. 4D) . Furthermore, to identify whether the correlation between Tscm and Tm reduced in the top1000 clonotypes, we randomly sampled 1,000 clonotypes from the entire repertoire as a contrast. Our results showed that the correlation coefficient of the top1000 clonotypes between Tscm and Tm was significantly lower than that of random subsamples. In contrast, the correlation of top1000 clonotypes sequence composition between Tn and Tscm, and that between Tn and Tm were significantly stronger than that of random samples between corresponding cell subsets (Fig. 4D ). It suggested that Tscm and Tm were different in the sequence composition of the entire TRB repertoire, especially the top1000 abundant clonotypes.

Public clonotypes which are shared among subjects can be induced by gene recombination and antigen stimulation (Venturi et al., 2008) . To evaluate the public clonotypes' distribution in each sample, we trained a SVM model to identify public clonotypes with a large dataset (Greiff et al., 2017) . This dataset includes two cohorts. The SVM model had a high prediction accuracy (BACC = 88% and AUC = 95%) in cohort 1 and presented The PCA of the sequence composition (k-mer) of the top1000 abundant clonotypes of Tn, Tscm, and Tm. Each dot represents one sample from a subject, each ellipse shows a 95% confidence ellipse, and the centroid presents the mean of PC1 as well as PC2 of samples in a cluster. (C) The Spearman correlation coefficient of sequence composition of the entire clonotypes between Tn and Tscm, between Tm and Tn, and between Tm and Tscm. The median, the first, and third quartiles were shown. (D) The Spearman correlation coefficient of sequence composition of both top1000 abundant clonotypes (blue) and randomly sampled 1000 clonotypes (red) between Tn and Tscm, between Tm and Tn, and between Tm and Tscm. The median, the first, and third quartiles were shown. For C and D, the paired Wilcox-paired ranked test was used, and then the p values were corrected by the FDR method. Full-size DOI: 10.7717/peerj.11987/ fig-4 a strong robustness when classifying public clonotypes on the data of cohort 2 (BACC = 82%) (Fig. S3) . Using the SVM model, we identified public clonotypes from samples of Tn, Tscm, Tcm, and Tem. Since the public clonotypes may be unevenly distributed in different frequency ranges of the TRB repertoire, we calculated the percentage of public clonotypes for each frequency range. We ranked clonotypes by their frequency. Our result showed that as the ranks increased, the percentage of public clonotypes decreased, which was inconsistent with findings in global T cells (Soto et al., 2020) . Notably, percentage of identified public top1000 clonotypes in Tscm were more than the percentage in Tm. (Fig. 5A) . We also used the data of top1000 clonotypes of effector memory T cells and central memory T cells presented by Jiang et al. (2020) , and found a similar trend that Tscm contained more public top1000 clonotypes than effector memory T cells (Tem) as well as central memory T cells (Tcm). (Fig. 5B) . We further trained a SVM model to classify public clonotypes of Tn and Tm, and showed a prediction accuracy of 66%, which suggested that public clonotypes of Tn and public clonotypes of Tscm contained predictive high-dimensional features.

To further identify the antigen specificity of clonotypes in memory subsets, we used GLIPH2 to cluster clonotypes potentially targeting same antigens. A stringent cutoff was used to avoid potential mistakes when performed GLIPH2. In GLIPH2, a clonotype can be grouped in over one cluster. For top1000 abundant clonotypes, 1095 clusters were exhibited in Tn, 176 exhibited in Tscm, and 71 exhibited in Tm. For public clonotypes within top1000 clonotypes, we defined 541 clusters in Tn, 117 in Tscm, 67 in Tm (Tables S2  and S3 ).

Since Tscm potentially play roles in infections and autoimmune diseases, we evaluated the pathogen and autoreactive clonotypes in Tscm and Tm, respectively. The clonotypes were annotated by a database including 1,885 pathogen-related clonotypes from VDJdb (Bagaev et al., 2020) , 1,409 autoreactive clonotypes (Gomez-Tourino et al., 2017) . Our results showed that autoreactive, private clonotypes could be found in Tscm of 11 subjects, and in Tm of 3 subjects; autoreactive, public clonotypes were shown in Tscm of 16 subjects, and in Tm of 16 subjects. The number of autoreactive clonotypes were similar in Tscm and Tm, however, there were more autoreactive public clonotypes than autoreactive private clonotypes in each sample (Fig. 6A) . Considering clonotypes' frequency in each subject, we showed that GAD-related clonotypes within Tscm became more frequent in T1D than in HD. In contrast, the frequency of GAD-related clonotypes in Tm was similar in T1D and HD (Fig. 6B) . Meanwhile, we showed more pathogen-related public clonotypes than pathogen-related private clonotypes for Tscm and Tm (Fig. 6C) . However, we did not find that pathogen-related clonotypes of either Tscm or Tm were more frequent in T1D (Fig. 6D) . In conclusion, self-reactivated clonotypes expanded in Tscm of T1D patients.

Heterogeneous phenotypic and functions make it controversial to conclude that Tscm represent a stable subset (Cieri et al., 2015; Sallusto & Lanzavecchia, 2009) . Studies in HSCT and on estimating Tscm turn-over rate indicated that Tscm were maintained by dynamic-influx from naïve. However, which clonotypes are selectively expanded during the differentiation from naïve is still unclear. Meanwhile, the dynamic of Tcm and Tem were also tracked in mice model, and about half of Tcm and Tem were long-lived and self-renewed, and the other part can be supplied by differentiation from naïve directly (Gossel et al., 2017) . Thus, a question is raised whether the TCR repertoire of Tscm and other memory subsets are similar. In this study, we analyzed TRB repertoire data of Tscm from a previous study. Tscm presented different features of TRB clonotype compared with other memory subsets. These features included TRB repertoire structure, gene usage, CDR3 length, and sequence composition. In previous studies, top1000 clonotypes were different from those of naïve, which was considering as a result of antigen-driven selection.

In this study, we found that top1000 clonotypes of Tscm have more overlap with naïve than other memory subsets. Furthermore, these differences between Tscm and Tm were greater than inter-individual differences within Tscm, implying that these TRB repertoire features could distinguish CD4 + Tscm and other CD4 + memory T subsets. Previously, Hou et al. (2020) reported that CD4+ memory T cells have a shortening distribution of CDR3 length and skewed gene usage compared with CD4 + naive cells. Hou et al. inferred that antigen experience leads the TRB repertoire of memory T cells skewed. Bilate et al. (2020) found that intestinal CD4 + intraepithelial lymphocytes differentiation was always accompanied by clonal restriction, suggesting that TCR and local antigen presentation is required by differentiation of this subset of T cells. Meyer-Olson et al. (2010) found that in HIV infections, TCR/peptide major histocompatibility complex interaction played a central role in the differentiation of TemRA cells (one type of effector memory T cells), but not in the differentiation of TemRO. Therefore, antigen experience affects T cell subsets' differentiation differently, which lets TCR repertoire features divergent among T-cell subsets. This mechanism might explain the differences in TRB repertoire between Tscm and other CD4 + memory T cells. Besides, differentiation sequential is another factor influencing the TRB repertoire and is supported by some previous studies (Minervina et al., 2021; Minervina et al., 2020; Snook, Kim & Williams, 2018) . Hogan et al. (2019) inferred that early exposure to self and environmental antigens establishes persistent memory populations at levels mainly determined by the dirtiness of the environment. After the first few weeks of life, new memory cells replace these populations at rates independent of the environment. Tcm showed a larger proportion replacement than Tem, which implied that Tcm was affected by naïve cells more than Tem. Our result exhibited that TRB repertoire features of Tscm, rather than Tm, are more like those of naïve, suggesting that subset with a lower differentiation would be affected by naïve subset more. This notion is consistent with the hierarchical model of human T cell differentiation (Gattinoni et al., 2011) .

Renyi entropy is a classical method for estimating all stretches of a community. Greiff et al. introduced this method to grab features of TCR repertoire and verified that TCR repertoire structure features captured by this method could separate individuals with different healthy statuses (Greiff et al., 2015) . In this study, our data suggested that Tscm were separated from Tm, whether individuals did or did not get T1D, suggesting that a large difference in TCR repertoire structure between Tscm and other memory subsets during cell differentiation. When alpha is 1, Renyi entropy tends to the Shannon entropy. Therefore, Shannon entropy, a index used in many studies, was contained in the estimation by Renyi entropy.

Public clonotypes are defined as ones that appeared in more than one individual. The traditional method for identifying public clonotypes needs a large cohort. Greiff et al. (2017) found that public clonotypes could be separated by the SVM method from private ones. Therefore, we trained an SVM model based on a large cohort and then used this model to identify public clonotypes in a small sample size. A limitation of the original study was that the model's accuracy for identifying public TCR clonotypes was about 76% and needed to be improved. We found that a large cohort (Emerson et al., 2017) and increased the threshold (termed public clonotypes as ones occurred in more than two individuals) for identifying public clonotypes could increase the model's accuracy (BBAC over 85%). Public clonotypes can be generated during gene recombination (Carey et al., 2017) and thymus selection (Khosravi-Maharlooei et al., 2019) . Public clonotypes enriched in naïve are likely promiscuous in their peptide-binding (Carey et al., 2017) , while ones in memory are more specific. Tscm had more public clonotypes than other memory T cell subsets shown in this study. This study showed that the sequence composition of public clonotypes is slightly different from those in naïve, with an SVM prediction accuracy of 66%. It suggested that public clonotypes in Tscm and Tn might have different sequence composition, and implied that public clonotypes in Tscm might undergo a selection driven by antigen stimulation. The results of the GLIPG2 analysis further verified this hypothesis. In this analysis, public clonotypes from top1000 clonotypes of Tn had 1095 clusters, while Tscm only had 176 clusters. It suggests that public clonotypes in Tscm had an increased specificity than those in Tn, which is a feature of antigen-experienced TRB repertoire.

Two Methods were involved in this study for estimating epitope specificity of clonotypes in subsets: GLIPH2 and estimation of enrichment of self-reactive clonotypes. GLIPH2 is based on a hypothesis that the sequence composition of TCR, especially the CDR3 region, determine the antigen specificity of clonotype. This model is trained on TCR clonotypes with known specificities. Huang et al. (2020) verified that GLIPH2 model could precisely cluster clonotypes recognizing the same epitope under a stringent threshold. Therefore, number of clusters predicted in our studies could reflect the number of potential epitopes recognized by clonotypes. As shown in our results, clusters of Tscm was less than naïve, but more than other memory subsets, suggesting that Tscm still maintained ability to recognize diverse antigens after antigen-specific stimulation. The other method was counting the frequency of self-reactive clonotypes in each individual. Because CDR3 primarily affected the antigen specificity of clonotype, accounting the frequency and percentage of self-reactive clonotypes identified in previous studies could reflect the functions of memory subsets. Infection and autoimmune related clonotypes in T1D and healthy groups were estimated respectively. The results confirmed the expansion of self-reactive clonotypes in Tscm rather than in other memory subsets in T1D patients.

The different TRB repertoire composition indicated a different function of Tscm compared with other memory T cell subsets. As shown by Greiff et al. (2017) , public clones are usually generated by specific inserts and/or deletions (indel) during somatic hypermutation. Shown by other studies as well as our results, during cell differentiation, public clonotypes reduced in memory and effector. Although the cellular mechanism underlying this phenomenon is still unclear, the selection on TCR clonotypes during differentiation were wildly observed. Our study showed that top1000 clonotypes were more public in Tscm than in other memory subsets, suggesting that a characteristic of Tscm is enriching public clonotypes. Furthermore, self-reactive clonotypes were defined as 'public' in our study, suggesting that abnormal indels happened in these self-reactive clonotypes make them 'public'. Therefore, the 'public' feature contributed to the enrichment of self-reactive clonotypes in Tscm. Roberto et al. (2015) introduced that self/tumor-antigen specific TCRs were enriched in Tscm after HSCT, and Interluekin-7 was essential for Tscm generation (Cieri et al., 2013; Cieri et al., 2015) . Thus, environment of Tscm generation might Tscm enrich 'public' clonotypes. The increased level of Tscm were also found in other studies focus on autoimmune diseases. For example, a study confirms that CD4 + Tscm recognizing citrullinated epitopes were expanded in rheumatoid arthritis (Cianciotti et al., 2020) . The frequency of CD8 + Tscm was increased in acquired aplastic anemia (Hosokawa et al., 2016) . Autoreactive CD8 + Tscm occurs in T1D (Vignali et al., 2018) . Our results indicated the expansion of autoreactive clonotypes in T1D. These observations suggested that Tscm might serve as a pool for autoreactive clonotypes. Although the factors to lead the enrichment of self-reactive clonotypes in Tscm is still unclear, our data showed that it might be possible to amiable T1D by targeting Tscm in T1D patients. As shown by Ahmed et al. (2016) , Tscm highly expressed Ki67, indicating an activated status of Tscm. Therefore, therapies suppressing immune activation may be possible to attenuate the pathology of T1D.

This work was supported by BGI-Shenzhen, China National GeneBank (CNGB), Science, Technology and Innovation Commission of Shenzhen Municipality under grant No. JCYJ20170817145845968 and JCYJ20170412152916724, and Shenzhen Key Laboratory of Single-Cell Omics (NO. ZDSYS20190902093613831). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

The following grant information was disclosed by the authors: BGI-Shenzhen, China National GeneBank (CNGB). Science, Technology and Innovation Commission of Shenzhen Municipality: JCYJ20170817145845968, JCYJ20170412152916724. Shenzhen Key Laboratory of Single-Cell Omics: ZDSYS20190902093613831.

Shiyu Wang, Longlong Wang, Yang Liu and Ya Liu are employed by BGI-Shenzhen.

• Shiyu Wang conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

• Longlong Wang performed the experiments, prepared figures and/or tables, and approved the final draft.

• Yang Liu analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

• Yonggang Zhu and Ya Liu conceived and designed the experiments, authored or reviewed drafts of the paper, and approved the final draft.

The following information was supplied regarding data availability: Data is available at immuneACCESS: B7C88S, B7001Z. https://clients.adaptivebiotech.com/pub/peakman-2017-naturecommunications. https://clients.adaptivebiotech.com/pub/emerson-2017-natgen.

Human stem cell-like memory T cells are maintained in a state of dynamic flux

VDJdb in 2019: database extension, new analysis infrastructure and a T-cell receptor motif compendium

T Role of naive-derived T memory stem cells in T-cell reconstitution following allogeneic transplantation

Heterogeneity of CD4+ memory T cells: functional modules for tailored immunity

Hematopoietic stem cell: self-renewal versus differentiation

Strategies for homeostatic stem cell self-renewal in adult tissues

TCR signal strength controls the differentiation of CD4 effector and memory T cells

High frequency of shared clonotypes in human T cell receptor repertoires

Stem cell-like plasticity of naïve and distinct memory CD8+ T cell subsets

The molecular basis for public T-cell responses?

Detection and characterization of CD8 autoreactive memory stem T Cells in patients with type 1 diabetes

Investigation of antigen-specific T-cell receptor clusters in human cancers

Supplemental information for this article can be found online at http://dx.doi.org/10.7717/ peerj.11987#supplemental-information.