key: cord-0784416-efb1ilv0 authors: Wang, Guangyu; Wang, Yongsi; Jiang, Shaofeng; Fan, Wentao; Mo, Chune; Gong, Weiwei; Chen, Hui; He, Dan; Huang, Jinqing; Ou, Minglin; Hou, Xianliang title: Comprehensive analysis of TCR repertoire of COVID-19 patients in different infected stage date: 2022-05-14 journal: Genes Genomics DOI: 10.1007/s13258-022-01261-w sha: 52d676c2aa1b468443c8ad45c1c678f05d687c26 doc_id: 784416 cord_uid: efb1ilv0 BACKGROUND: The current pandemic of coronavirus disease 2019 (COVID-19), transmitted person-to-person by the severe acute respiratory syndrome of coronavirus 2 (SARS-CoV-2), poses a threat to global public health. OBJECTIVE: In this study, we performed the comprehensive analysis of the T cell receptor (TCR) repertoire may contribute to a more in-depth understanding of the pathogenesis of COVID-19. METHODS: A comprehensive immunological analysis was performed to explore the features of the TCR repertoire and identified TCR sequences correlated with SARS-CoV-2 viral antigens. RESULTS: we analyzed the COVID-19 patients’ TCR repertoires in peripheral blood mononuclear cells (PBMC) which obtained before (baseline), during (acute), and after rehabilitation (convalescent) by ImmunoSEQ-technology, and found that repertoire features of TCRβ-chain (TCRβ) complementary-determining region 3 (CDR3) in COVID-19 patients were remarkable difference, including decreased TCR diversity, abnormal CDR3 length, difference of TRBV/J gene usage and higher TCR sequence overlap. Besides, we identified some COVID-19 disease-associated TCRβ clones, and the abundance of them changed with the progression of the disease. Importantly, these disease-associated TCRβ clones could be used to distinguish COVID-19 patients from healthy controls with high accuracy. CONCLUSIONS: We provide a clear understanding of the TCR repertoire of COVID-19 patients, which lays the foundation for better diagnosis and treatment of COVID-19 patients. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s13258-022-01261-w. COVID-19 is a potentially severe disease caused by SARS-CoV-2. As of March 30, 2021, SARS-CoV-2 has affected 128,339,981 people, resulting in 2,806,701 million people deaths worldwide. COVID-19 not only endangers people's health but also affects the economy (Bolay et al. 2020 ). Clinically, COVID-19 presents with influenza-like symptoms of headache, fever, dry cough, diarrhea, fatigue, pneumonia, conjunctivitis, and gastrointestinal symptoms such as diarrhoea (Abobaker et al. 2020; Berthelot et al. 2020; Burhan et al. 2020) . Although the efforts and resources spent by clinicians and scientists trying to explore antiviral drugs and vaccines (Lundstrom 2020; Sharma et al. 2020; Wiersinga et al. 2020) , there is still no effective clinical treatment for COVID-19 to date. Therefore, there is an urgent need to better understand the host immune response in SARS-CoV-2 virus infection that will offer fundamental insights into the treatment and design effective therapeutic interventions for the disease. In order to mount effective adaptive responses against pathogens, the human adaptive immune system harbors a vast array of T-cell receptors (Hou et al. 2019a, b, c) . Highthroughput sequencing (HTS) of T cell receptors is a useful tool for analyses of T lymphocytes' clonality and diversity (Hou et al. 2019a, b, c) . In this study, we performed a comprehensive immunological analysis to explore the extent to which SARS-CoV-2 influences the TCR repertoire, including diversity of TCR, CDR3 frequency distribution, and CDR3 length distribution, V/J usage, rearrangement sites, and overlap indices. Meanwhile, we identified a list of TCR sequences that may be correlated with SARS-CoV-2 viral antigens in COVID-19 patients, which could be used to distinguish COVID-19 patients from healthy controls accurately. These results provided a link between COVID-19 and TCR repertoires and helped prevent the occurrence or progression of COVID-19 disease with timely diagnosis and treatment. All TCR sequences used here evaluation was downloaded from immuneaccess (https:// clien ts. adapt ivebi otech. com/ pub/ covid-2020) (Nolan et al. 2020) . As described previously, this is an exploratory study of unselected eligible participants baseline to, infected with, or recovering from COVID-19. Whole blood samples were collected under the INCOVE project at Providence St. Joseph Health (Seattle, WA), and patients were enrolled during the active phase and monitored through disease. This study includes T-cell repertoire data from 139 samples, including 60, 65, 14 samples collected from baseline, acute, and convalescent stages, respectively. COVID-19 patients aged 23 to 89 years (mean ± SD, 60.30 ± 17.54) were consented and enrolled via a virtual study design. This study was approved by the Western Institutional Review Board (WIRB reference number 1-1,281,891-1, Protocol ADAP-006). All the participating individuals provided their written informed consent. In addition, the TCR sequences for 21 healthy subjects were obtained from Adaptive Biotechnologies immuneACCESS siIte (https:// doi. org/ 10. 21417/ B7SG6T). These healthy subjects were tested negative for anti-HIV antibodies, anti-hepatitis B surface antigen (anti-HBsAg) antibodies and exhibited no laboratory or clinical signs of other immunological disorders or infectious diseases. Among these 21 healthy donors, ten were males and eleven were females, and had a mean age of 56.61 ± 16.34 years. Whole blood samples blood were collected in EDTA tubes and sent to isolated peripheral blood mononuclear cells (PBMC) and DNA extracted for TCRβ analysis via the immunoSEQ (Nolan et al. 2020) . In briefly, extracted genomic DNA was amplified using a bias-controlled multiplex-PCR system, followed by high-throughput sequencing. Raw data processing and analysis were performed through immunoSEQ. Then, demultiplexed reads were processed to reduce amplification and sequencing bias. Subsequently, we used the international ImMunoGeneTics information system (IMGT; http:// imgt. cines. fr) database for the alignment. According to the criteria of the international ImMunoGe-neTics Collaboration, the TCRβ CDR3 region starts at the second conserved cysteine encoded by the 3' position of the Vβ gene segment and ends with the conserved phenylanine encoded by the 5' position of the Jβ gene segment. The number of amino acid between these codons determines the CDR3 length. According to the identity of each sequence after alignment, the relative abundance of each TCRβ CDR3 sequence was clear and calculated. The percent frequency of each distinct DNA sequence, amino acid sequence and V-J combination was also identified. Additionally, we used batch correction to eliminate the batch effect of different datasets. Subsequently, multiple TCR data statistics were performed included TCR repertoire diversity, V/J usage, V-J pairing, CDR3 length distribution, and the length distribution of n1Insertion, vDeletion, d5Deletion, d3Deletion, jDeletion, n2Insertion were identified as previously described (Gomez-Tourino et al. 2017; Hou et al. 2019a, b, c; Huang et al. 2019 ). The immune repertoire was characterized by examining the diversity and clonality. Analysis of the immune repertoire involves several diversity indexes, such as the D50 index, Shannon index, Simpson index and Gini index (Hou et al. 2020; Marcon et al. 2014; Venturi et al. 2007 ). For each of the 6 rearrangement sites (Vdel, D5del, D3del, Jdel, n1ins, and n2ins) the frequency of clonotypes bearing a particular number of indels was calculated for cells of each samples. The indel patterns are shown classified into germline, short, medium and long. Indels were classifified into length of the relevant rearrangement event. A cumulative frequency was calculated for each of classes and rearrangement sites. Interindividual sharing TCR repertoires were quantifified by calculating the number of common amino acid sequences in each sharing category (overlap (X, Y) =|X and Y|/min (|X|, |Y|) for amino acid sequences). Receiver operating curve (ROC) analysis was performed to evaluate the value of disease-associated TCRβ clones to differentiate between cases and controls. Disease associated clones were defined as those TCRβ presented in at least four COVID-19 patients and less than three healthy individuals. Exhausted leave-one-out cross-validation was used to assess the identifier's performance during model training. More specifically, given that there were N samples in each group, 4/5 N samples were used as training data and processed using the above classification model. The remaining 1/5 N samples were used as testing data to perform the classification. The cross-validations were repeated 5 N times until every sample had been used as testing data five times (Huang et al. 2019) . The area under the ROC curve (AUC) values were calculated using the predicted probability of a given TCRβ chain belonging to the COVID-19 population. The parameter in which the highest AUC value was determined, and the points where the sum of sensitivity and specificity levels reached the maximum were accepted as the best cut-of values (Huang et al. 2019 ). For comparisons of two groups, statistical analyses were conducted using the Mann-Whitney U-test or unpaired t-test. Data were presented as the mean ± SD values or as percentages (%). Statistical analyses were performed using SPSS20. The TCRβ repertoires of different groups (baseline, acute, and convalescent, and healthy controls) were examined. Diversity indices were calculated, consisting of the D50 index, Shannon index, Simpson index, and inversesimpson index (Fig. 1) . For the Simpson index, the smaller the index was, the higher diversity had. Besides, for inversesimpson index, D50 index, and Shannon index, an increased value indicates greater diversity. According to the Fig. 1 , there was a lower TCR diversity in COVID-19 patients. It is worth noting that the diversity of the convalescent group was lower than that of the baseline group and the acute group. The reason for this phenomenon may be that the TCR diversity of convalescent group had not convalescent during this period. TCR gene rearrangement generates a diversity of T lymphocytes by V(D)J recombination. We identified disease-related changes of TRBV and TRBVJ gene usage in COVID-19 patients (Fig. 2) . The results showed that there were more differentially expressed TRBV and TRBJ gene subfamilies in baseline group, and acute group, and less in the convalescent group compared with the healthy control group. The results showed that TRBV10, and TRBV2 exhibited significantly lower expression in COVID-19 patients, while TRBV12, TRBV20, TRBV24, TRBV3, and TRBV9 showed higher expression in COVID-19 patients compared with that in healthy controls, whether in the baseline, acute, and convalescent group (Fig. 2a-c) . Besides, TRBV6 was lower expressed, and TRBV15, TRBV23, and TRBV29 were higher expressed in the baseline and acute group, no difference in the convalescent group. Regarding the degree of TRBJ usage, TRBJ2-5 showed higher usage, while TRBJ1-3 and TRBJ1-4 showed significantly lower usage in COVID-19 patients when compared to healthy control, whether in the baseline, acute, and convalescent group (Fig. 2d-f ). Moreover, TRBJ1-2 and TRBJ1-6 showed significantly higher usage, while TRBJ1-1 and TRBJ2-6 showed significantly lower usage in the baseline and acute group, no difference in the convalescent group. In addition, COVID-19 patients could be distinguished from healthy controls by a principal coordinate analysis (PCA) based on TRBV segment usage (Fig. 2g) , indicating that TRBV gene expression changes were involved in the pathogenesis of COVID-19. Moreover, we further analyzed the differential expression TRBJ gene subfamilies among the three disease groups (baseline, acute, and convalescent). TRBJ1-3 and TRBJ1-4 were significantly higher expression, TRBJ2-2 were significantly lower expression in baseline and acute group, compared with a convalescent group (Fig. 3a, b) . In addition, three-dimensional landscapes of TRBV gene usage in conjunction with TRBJ usage were plotted. We found the top three V-J combinations of covid-19 patients (including baseline, acute, and convalescent group) were TRBV7-TRBJ2-7, TRBV5-TRBJ2-7, TRBV6-TRBJ2-7 (Fig. 3c-e) . However, the top three V-J combinations in healthy controls were TRBV6-TRBJ1-1, TRBV5-TRBJ1-1,TRBV6-TRBJ2-7 (Fig. S1 ). The analysis of CDR3 length variation might reveal the structure-function relationships of different TCRs, recognizing specific antigens (Hou et al. 2016) . Importantly, we found there were significant differences of TCRβ CDR3 length distribution in COVID-19 patients compared with that in HCs (Fig. 4a-c) , whether in the baseline, acute or convalescent group. Moreover, there were significant differences among the three different infection stages of COVID-19 patients (Fig. S2 ). CDR3 diversity is generated by the extensive genomic rearrangement that takes place among TRBV, TRBD, and TRBJ gene segments and by the nucleotide insertions and deletions (InDels) within the junction of Fig. 1 The diversity of TCRβ repertoire in COVID-19 patients from baseline, acute, and convalescent group, and healthy controls (HC) are measured by the Shannon index (a), D50 index (b), inverses-impson index (c), and Simpson index (d). Data are presented as the mean ± SD values and compared using the t-test. *P < 0.05, **P < 0.01, ***P < 0.001(two-tailed) . Therefore, to understand the molecular basis for these features, we analyzed the Indel nucleotides in the 6 rearrangement sites (Videos, Jdels, D3dels, D3dels, n1ins, and n2ins), and found that length distribution of the six recombination events (indels) was significantly different between COVID-19 patients and healthy control, whether in the baseline (Fig. S3) , acute ( Fig. 4d-f, Fig. S4 ) or convalescent group (Fig. S5 ). The differences were mainly concentrated in germline (no deletions/insertions) and long d5dels and Jdels. Moreover, there were significant differences among the three different infection stages of COVID-19 patients (Figs. 4g-i, S6 ). TCRβ clonotypes that are shared between individuals are thought to play a vital role in the efficacy of pathogenspecific responses and infection control (Zhao et al. 2016) . We found the overlap indices of TCRβ CDR3 amino acid sequences were substantially higher in COVID-19 patients than that in HCs (Fig. 5a-c) . However, the overlap index was similar among the three different infection stages of COVID-19 patients (Fig. S7) . Notably, we found 522, 777, 1022 amino acid sequences shared by 80% of the COVID-19 patients from the baseline group, acute group, and convalescent group, respectively. Besides, sharing TCR repertoires between any two samples in each group was also quantified ( Fig. 5d-f ). Finally, we intended to identify the COVID-19 diseaserelated specific clones and used them to construct the ROC curve to distinguish COVID-19 patients from healthy controls. We screened out 15, 15, 30 TCRβ CDR3 clones that were abundantly and widely presented in the COVID-19 patients from the baseline, acute, or convalescent group, respectively, but rarely appear in the healthy controls. We defined these clones as COVID-19 disease-associated TCRβ CDR3 clones, and found that the unique and total diseaseassociated TCRβ clones frequencies were significantly higher in COVID-19 patients than in healthy controls (Figs. Fig. 3 Usage frequency of TRBV/J segments was different among the three different infected stages of COVID-19 disease. a, b Significant differences of TRBJ gene usage between baseline and convalescent group (a), Significant differences of TRBJ gene usage between acute and convalescent group (b). Data were statistically analyzed by t-test and FDR correction. Three-dimensional landscapes depicting the mean utilization frequency of specific Vβ-Jβ gene segment combinations in the TCRβ chains expressed in patients with COVID-19 from the baseline (c), acute (d), and convalescent (e) group S8, 6a-c). In the baseline group, these 15 disease-related TCRβ clones could separate COVID-19 patients from healthy control with a sensitivity of 95.58%, a specificity of 89.32%, and an accuracy of 96.67% (AUC (the area under the ROC curve) = 63.2%) (Fig. 6d) . The same analysis was carried out in the acute and convalescent group, with a sensitivity of 96.28%, a specificity of 92.26% and an accuracy of 96.83% (AUC = 63.34%) in the acute group (Fig. 6e) , and a sensitivity of 78%, a specificity of 95.98% and an accuracy of 92.0% (AUC = 72.84%) in convalescent group (Fig. 6f ). In addition, we identified some TCRβ CDR3 clones which changed with different infection stage. Among these, the abundance of 14 clones increased with the disease's progression (baseline-acute-convalescent) (Fig. 7a) . The abundance of 13 clones increased to a peak from the baseline stage to the acute stage, and then decreased from the acute stage to the convalescent stage (Fig. 7b) . Altered CDR3 length distribution and InDel patterns in COVID-19 patients. a-c Altered CDR3 length in COVID-19 patients from the baseline group (a), acute group (b), and convalescent group (c), compared with healthy controls. d-f TCRβ CDR3 sequences from acute COVID-19 patients and healthy controls were analyzed for the number of insertions and deletions (InDels) at each of the three rearrangement sites, D3del (d), D5del (e), Jdel (f). The deletions were classified as short, medium or long depending on their frequency in the overall distribution for each of the 3 rearrangement sites. G: germline (no deletions/insertions). Short: 1-11 (Jdel), 1-5 (D5del, D3del). Medium: 6-9 (D5del), 6-8 (D3del), 12-18 (Jdel). Long: 10-11 (D5del), 9-10 (D3del), 19-25 (Jdel). g-i TCRβ CDR3 sequences of COVID-19 patients from baseline and acute group were analyzed for the number of InDels at each of the three rearrangement sites, D3del (g), D5del (h), Jdel (i) COVID-19 became a severe pandemic, widespread in the world. As of March 30, 2021, the number of global infections is estimated to be over 128 million, with over 2.8 million deaths. The adaptive immune responses are essential for SARS-CoV-2 virus clearance. Understanding humoral responses to SARS-CoV-2 is critical for improving diagnostics, therapeutics, and vaccines (Du and Yuan 2020; Shrock et al. 2020) . In this study, we used ImmunoSEQ-technology to characterize the TCR repertoire features in PBMCs of COVID-19. Results indicated that the patients with COVID-19 showed lower diversity than healthy controls, although the difference was not statistically significant. Moreover, our analysis results TRBV/J gene exhibited significantly different expression in COVID-19 patients. The results may be associated with immune-related response to virus antigen. In response to disease-related antigen stimulation, T cells become activated and undergo clonal expansion. However, the total number of immune cells generally remains within a relatively stable range to maintain its normal physiological functions. The increased number of dominant T cells leads to a decreased number of other T cells (Sui et al. 2015) . As a result, COVID-19 patients had relative increases in the proportion of some TRBV(J) gene segments and decreased others. TCR CDR3 loops can vary in both length and sequence, and this gives them the ability to recognize a diverse set of antigens (Afik et al. 2017; Pickman et al. 2013) . Indeed, we found that COVID-19 patients tend to select longer clonotypes compared with healthy controls. Recombination events generate diversity in antigen recognition in the CDR3 region of the TCR. We also analyzed the deletions or insertions of 6 rearrangement sites, and results showed that the Indel lengths distribution was different between COVID-19 patients and healthy controls. Different rearrangements may result in longer CDR3 lengths in COVID-19 patients. More generally, the higher proportion of shared clones among abundant clonotypes could result from peripheral selection and expansion of antigen-specific clonotypes driven by persistent pathogens (Chandran et al. 2015) . We found that the overlap degree of the TCR repertoire was significantly higher in COVID-19 patients. It may be that SARS-CoV-2 viruses foster disease-associated public repertoire formation. If so, specific public sequences may prove useful for the longitudinal monitoring of immune responses during COVID-19 diseases. In simian immunodeficiency virus (SIV)-infected rhesus macaques, the emergence of public Gag-specific clonotypes correlates with protection, and the absolute number of shared clonotypes inversely correlates with viral load (Price et al. 2009 ). So, disease-associated public TCRs may explain certain infections' susceptibility, correlations of pathogenesis, and clinical manifestations of the disease. Besides, some specific public sequences can be applied toward developing novel molecular biomarkers potentially useful for accurate disease classification. Huang et al. identified a set of TCRβ and IGH clones, which can be used to distinguish IgAN from NIgAN and healthy controls with high accuracy (Huang et al. 2019) . In this study, we also identified a set of SARS-CoV-2 associated TCRβ clones which can distinguish patients with COVID-19 from healthy control with an accuracy rate of more than 92%. However, the main limitation of our study is the relatively low sample number. Hence, to estimate whether the biomarker performs well, validation studies Fig. 6 Disease-associated TCRβ clones can distinguish COVID-19 patients with healthy control. a-c The total frequency of diseaseassociated TCRβ clones are significantly higher in PBMC of COVID-19 patients from baseline group (a), acute group (b), and convalescent group (c) than that in healthy controls. d-f The ROC curve of classification of healthy controls and COVID-19 patients from baseline group (d), acute group (e), and convalescent group (f) according to disease-associated TCRβ clones. Data are presented as the mean ± SD values and compared using the t-test. ***indicate P values less than 0.001 Fig. 7 The heat map showed the changes in the expression level of disease-associated TCRβ CDR3 clones during different stages of infection. The abundance of 14 clones was increased in COVID-19 patients from baseline to acute stage to convalescent stage (a). The abundance of 13 clones was increased in COVID-19 patients from baseline stage to acute stage, and then decrease from acute stage to convalescent stage (b) with large numbers of cases in other ethnic populations are required. In conclusion, we present a comprehensive overview of TCRβ CDR3 repertoire in COVID-19 patients. These analyses provide information to distinguish COVID-19 patients and healthy controls. More importantly, this provided a basis for using it as a new biomarker in the future. The online version contains supplementary material available at https:// doi. org/ 10. 1007/ s13258-022-01261-w. Extrapulmonary and atypical clinical presentations of COVID-19 Targeted reconstruction of T cell receptor sequence from single cell RNA-seq links CDR3 length to T cell differentiation state Lymphocyte Changes in Severe COVID-19: Delayed Over-Activation of STING? COVID-19 is a Real Headache Dharmawan I (2020) Clinical progression of COVID-19 patient with extended incubation period, delayed RT-PCR time-to-positivity, and potential role of chest CT-scan Persistence of CTL clones targeting melanocyte differentiation antigens was insufficient to mediate significant melanoma regression in humans Mathematical modeling of interaction between innate and adaptive immune responses in COVID-19 and implications for viral pathogenesis T cell receptor beta-chains display abnormal shortening and repertoire sharing in type 1 diabetes Current status and recent advances of next generation sequencing techniques in immunological repertoire TCRbeta repertoire of memory T cell reveals potential role for Escherichia coli in the pathogenesis of primary biliary cholangitis No difference in TCRbeta repertoire of CD4+ naive T cell between patients with primary biliary cholangitis and healthy control subjects Shorter TCR beta-chains are highly enriched during thymic selection and antigen-driven selection Preselection TCR repertoire predicts CD4(+) and CD8(+) T-cell differentiation state The landscape and diagnostic potential of T and B cell repertoire in Immunoglobulin A Nephropathy Coronavirus pandemic-therapy and vaccines Generalization of the partitioning of shannon diversity A large-scale database of T-cell receptor beta (TCRbeta) sequences and binding associations from natural and synthetic exposure to SARS-CoV-2 BCR CDR3 length distributions differ between blood and spleen and between old and young patients, and TCR distributions can be used to detect myelodysplastic syndrome Public clonotype usage identifies protective Gag-specific CD8+ T cell responses in SIV infection A review of the progress and challenges of developing a vaccine for COVID-19 Viral epitope profiling of COVID-19 patients reveals cross-reactivity and correlates of severity Composition and variation analysis of the TCR beta-chain CDR3 repertoire in systemic lupus erythematosus using high-throughput sequencing Methods for comparing the diversity of samples of the T cell receptor repertoire Pathophysiology, transmission, diagnosis, and treatment of coronavirus disease 2019 (COVID-19): a review Preferential use of public TCR during autoimmune encephalomyelitis Data availability All sequencing data have been deposited and made public in the Adaptive Biotechnologies immuneACCESS siIte (https:// clien ts. adapt ivebi otech. com/ pub/ covid-2020). All other data are available from the authors upon reasonable requests. Conflict of interest All authors declare that they have no conflict of interest.