key: cord-0319661-ulfnaijj authors: Greissl, J.; Pesesky, M.; Dalai, S. C.; Rebman, A. W.; Soloski, M. J.; Horn, E. J.; Dines, J. N.; Gittelman, R. M.; Snyder, T. M.; Emerson, R. O.; Meeds, E.; Manley, T.; Kaplan, I. M.; Baldo, L.; Carlson, J. M.; Robins, H. S.; Aucott, J. N. title: Immunosequencing of the T-cell receptor repertoire reveals signatures specific for diagnosis and characterization of early Lyme disease date: 2021-08-02 journal: nan DOI: 10.1101/2021.07.30.21261353 sha: a865b8dae8ec23981df8f2ef425f5050a1f0e908 doc_id: 319661 cord_uid: ulfnaijj Lyme disease, the most common tick-borne illness in the United States, is most frequently caused by infection with Borrelia burgdorferi. Although early antibiotic treatment can prevent development of severe illness and late manifestations, diagnosis is challenging in patients who do not present with a typical erythema migrans rash. To support a diagnosis of Lyme disease in such cases, guidelines recommend 2-tiered serologic testing. However, 2-tiered testing has numerous limitations, including ambiguity in interpretation and lower sensitivity in early disease. We developed a diagnostic approach for Lyme disease based on the T-cell response to B. burgdorferi infection by immunosequencing T-cell receptor (TCR) repertoires in blood samples from 3 independent cohorts of patients with laboratory-confirmed or clinically diagnosed early Lyme disease, as well as endemic and non-endemic controls. We identified 251 public, Lyme-associated TCRs that were used to train a classifier for detection of early Lyme disease with 99% specificity. In a validation cohort of individuals with early Lyme disease, TCR testing demonstrated a 1.9-fold increase in sensitivity compared to standard 2-tiered testing (STTT; 56% versus 30%), with a 3.1-fold increase <=4 days from the onset of symptoms (44% versus 14%). 
TCR positivity predicted subsequent seroconversion in 37% of initially STTT-negative patients, suggesting that the T-cell response is detectable before the humoral response. While positivity for both tests declined after treatment, greater declines in posttreatment sensitivity were observed for STTT compared to TCR testing. Higher TCR scores were associated with clinical measures of disease severity, including abnormal liver function test results, disseminated rash, and number of symptoms. A subset of Lyme-associated TCRs mapped to B. burgdorferi antigens, demonstrating high specificity of a TCR immunosequencing approach. These results support the clinical utility of T-cell-based testing as a sensitive and specific diagnostic for early Lyme disease, particularly in the initial days of illness.

Lyme disease, the most common tick-borne illness in the United States (U.S.), has an estimated incidence of >450,000 new cases annually (1-3). In the U.S., Lyme disease is caused by infection with the spirochetal bacterium Borrelia burgdorferi (or rarely B. mayonii) transmitted from infected Ixodes ticks (4-6). Lyme disease is also among the most widespread tick-borne diseases worldwide, although in Eurasia, B. afzelii and B. garinii (in addition to B. burgdorferi) commonly cause infection (1, 6). In the days to weeks after the initial tick bite, early symptoms may include a characteristic erythema migrans (EM) rash and nonspecific flu-like symptoms. Individuals presenting at later stages may exhibit signs and symptoms of disseminated infection affecting the joints, nervous system, or heart (7). The clinical manifestations of disseminated infection vary based on the infecting Borrelia species and region, with joint-related symptoms being more common in North America and severe neurological and chronic skin manifestations occurring more frequently in Eurasia (1, 8-10).
Potentially debilitating late manifestations include arthritis, encephalopathy, encephalomyelitis, peripheral neuropathy, or acrodermatitis chronica atrophicans. Early diagnosis and treatment of Lyme disease has been shown to prevent severe illness and the development of late objective manifestations of disease (4, 11). Given the high clinical index of suspicion of EM rash for Lyme disease and the poor sensitivity and specificity of currently available diagnostic assays, patients with EM rash can be treated immediately without further testing (4). However, while the CDC reports that the majority of patients develop an EM rash, Lyme-associated rashes can be mistaken for other conditions or . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) have Lyme disease with early seronegativity (21, 23, 26-31). Diagnostic interpretation is also complicated by the predictive value of serology testing, which is much lower in non-endemic regions where the pre-test probability of Lyme disease is low (20, 32). False-negative results have been reported to occur in patients with early antibiotic treatment or concurrent bacterial infections that can decrease antibody responses to Borrelia or block IgG seroconversion (32, 33). A small percentage of patients with Lyme disease may also test seronegative due to development of a cellular response in the absence of a humoral response (32). Furthermore, because serology assays cannot distinguish between active and past infections, a subset of the 5% to 10% of symptomatic individuals in endemic areas who test positive by STTT may in fact have an unrelated illness (24, 32, 34). These data underscore a number of unmet clinical needs for novel methods that can facilitate more sensitive and specific diagnosis of Lyme disease, especially in the early stages of infection. 
Diagnostic tests based on the cellular immune response can address some of the limitations of serology-based testing, as infection with B. burgdorferi has been shown to elicit a T-cell response that may exhibit different kinetics than the humoral response (35, 36). Evaluation of cytokine/chemokine profiles suggests that an active T-cell response is induced during the acute phase of infection, even in the absence of seroconversion, and returns to normal levels after treatment and symptom resolution (37). In individuals with persistent symptoms, Th1 responses in affected tissues have been linked to pathogenic inflammation (38-40). In contrast, humoral responses vary widely, with some cases demonstrating attenuated responses and lack of IgM to IgG seroconversion, and other cases demonstrating antibody persistence for decades (32, 33, 41). These data suggest that assays interrogating the T-cell response may have utility for aiding in the diagnosis of Lyme disease during early illness, as well as late manifestations. High-throughput sequencing of the T-cell receptor (TCR) repertoire can be used to identify disease-specific TCR sequences expressed by T-cell clones that have undergone antigen-driven expansion and persist in the memory compartment. While the diversity of TCR recombination means that most TCR responses are "private" and infrequently observed in other individuals, part of the T-cell response to a disease is "public," with identical amino acid sequences observed across multiple individuals, particularly those with shared HLA backgrounds (42).
Such disease-associated TCRs can be identified using a case/control design, as previously described for cytomegalovirus (CMV) (43) and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) (44) viral infections, and matched to specific antigens through multiplex identification of antigen-specific T-cell receptors (MIRA) (43). Because these public clones are antigen- and HLA-specific, they serve as a signature of infection in a given HLA context (43, 44). We have previously shown that classifiers based on quantification of disease-associated public TCR sequences can be used for sensitive identification of past infection with CMV (43) or SARS-CoV-2 (44). Such classifiers leverage the relative frequency of disease-associated sequences within the repertoire, measures that have been shown to be associated with disease severity in the setting of SARS-CoV-2 (45, 46). Clinical validation of a T-cell assay for SARS-CoV-2 utilizing a similar methodology demonstrated high positive percent agreement (>94.5%) and negative percent agreement (~100%) with reverse transcriptase PCR (RT-PCR) for detection of past SARS-CoV-2 infection (46, 47). However, this approach has not previously been applied to bacterial disease.

In the present study, we describe an approach for measuring the T-cell adaptive immune response in early Lyme disease using TCRβ sequencing from blood samples. We analyzed blood samples from three independent cohorts of patients with laboratory-confirmed and/or clinically [...]
To identify public TCRs associated with early Lyme disease, we designed a case/control training dataset consisting of patients identified from the LDB and Boca Biolistics cohorts who presented with STTT-positive early Lyme disease prior to 2019 (n = 72) and control repertoires (n = 2981) from a database of healthy individuals from endemic and non-endemic regions recruited for other studies and presumed to be negative for early Lyme disease (Fig. 1A). Table S1 summarizes the cohorts who provided samples used in this study. Public, Lyme disease-associated TCRs, referred to as "enhanced sequences," were identified primarily based on statistical enrichment in cases, as described in the Methods. Overall, we identified 251 enhanced sequences associated with early Lyme disease.

Enhanced TCR sequences are highly specific for identifying early Lyme disease

As previously observed in viral infections (43, 44), comparison of the numbers of enhanced sequences and total unique productive TCR rearrangements among case and control samples suggests that the total number of disease-associated enhanced sequences in a repertoire is a highly specific biomarker for Lyme disease (Fig. 1A). To leverage this biomarker as a diagnostic classifier, we modeled the number of enhanced sequences as a logistic-growth function of the number of unique productive TCRs sampled from a repertoire and fit this model to the 2981 control repertoires in the training data (Fig. 1A; see black line representing the model fit in Fig. 1A).
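As a rough illustration of this kind of model, the sketch below fits a logistic-growth curve to control repertoires and expresses a repertoire's enhanced-sequence count as a number of standard deviations from the fitted expectation. It is a minimal sketch under stated assumptions, not the authors' implementation: the three-parameter curve (`ceiling`, `rate`, `midpoint`), the coarse grid-search fit, the use of a single residual standard deviation, and all toy numbers are hypothetical choices made only for illustration.

```python
import itertools
import math
import statistics


def expected_count(n_unique, ceiling, rate, midpoint):
    """Logistic-growth expectation of enhanced sequences for n_unique TCRs."""
    return ceiling / (1.0 + math.exp(-rate * (n_unique - midpoint)))


def fit_controls(controls, grid):
    """Least-squares grid search over candidate (ceiling, rate, midpoint) triples.

    controls: list of (n_unique, n_enhanced) pairs from control repertoires.
    Returns the best parameters and the standard deviation of the residuals.
    """
    def sse(params):
        return sum((obs - expected_count(n, *params)) ** 2 for n, obs in controls)

    best = min(grid, key=sse)
    residuals = [obs - expected_count(n, *best) for n, obs in controls]
    sd = statistics.pstdev(residuals) or 1.0  # avoid dividing by zero on a perfect fit
    return best, sd


def model_score(n_unique, n_enhanced, params, sd):
    """Observed minus expected enhanced sequences, in residual standard deviations."""
    return (n_enhanced - expected_count(n_unique, *params)) / sd


# Toy control data (hypothetical): integer counts generated from a known curve.
true_params = (4.0, 2e-5, 150_000)
controls = [(n, round(expected_count(n, *true_params)))
            for n in range(50_000, 400_001, 25_000)]

# Hypothetical candidate grid; a real fit would use a proper optimizer.
grid = list(itertools.product([2.0, 4.0, 6.0],
                              [1e-5, 2e-5, 4e-5],
                              [100_000, 150_000, 200_000]))

params, sd = fit_controls(controls, grid)
case_score = model_score(200_000, 20, params, sd)     # many enhanced sequences
control_score = model_score(200_000, 3, params, sd)   # control-like repertoire
```

Fitting only on controls means the score measures departure from what an uninfected repertoire of the same depth would be expected to show, which is why repertoire size enters the model at all.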
The resulting model compares the number of observed versus expected enhanced sequences in a repertoire, given the number of observed unique TCRs, quantified as the number of standard deviations from the expected value (red dashed lines, Fig. 1A). This approach carefully controls specificity by considering thousands of control repertoires. The final positive/negative call threshold was set to a specificity of 99% on an independent set of endemic control samples (n = 2627) (Fig. 1B; Table S1). To confirm the specificity and generalizability of the classifier and call threshold, we applied the resulting model to a holdout set of samples from the LDB cohort collected in 2019 that included both laboratory-confirmed positive (by STTT, PCR, and/or culture) cases of early Lyme and laboratory-confirmed negative (by STTT) endemic controls with no history of Lyme or tick-borne infection (Fig. 1C). Overall [...]

TCR repertoire analysis is more sensitive than STTT for identifying early Lyme disease and frequently precedes STTT seroconversion

To further evaluate the performance of the TCR assay, we validated the TCR classifier using samples from STTT-positive and STTT-negative patients with clinically diagnosed early Lyme disease enrolled in the JHU cohort. Application of the TCR classifier revealed that median TCR model scores for patients with early Lyme disease were higher than those of individuals defined as endemic controls based on cohort-specific criteria (Fig. 2A; Table S1). Overall, 118 of 211 (56%) patients diagnosed with early Lyme disease were classified as TCR-positive (Table 1). By comparison, 64 of 211 (30%) patients were STTT-positive, indicating that use of the TCR assay nearly doubled the number of clinical Lyme disease cases identified as positive (1.9-fold increase in sensitivity). Only 32 of 2631 (1.2%) endemic control samples tested TCR-positive (0 of 115 in LDB; 1 of 45 in JHU; and 31 of 2471 repertoires from our database from individuals with unknown Lyme disease status living in Lyme-endemic regions in the US and Europe; Fig. 2A). Of note, PCR testing results for blood and/or skin biopsy were available for a subgroup of 57 individuals in the JHU cohort; of those, 12 individuals tested negative by both STTT and PCR, raising the possibility that these individuals had a non-Lyme tick-borne illness, such as Southern tick-associated rash illness (STARI) (48). The majority (11/12) of these individuals were TCR-negative; excluding them from the analysis did not appreciably alter performance characteristics (sensitivities of 59% versus 32% for TCR assay and STTT, respectively). The sensitivity of both TCR testing and STTT was lower in early illness and increased with days since symptom onset (Fig. 2B). While the sensitivity of TCR testing was greater than STTT at all periods of illness evaluated, the greatest performance advantage of TCR testing was observed within the first week after symptom onset, when STTT sensitivity was below 30% (≤4 days: 44% versus 14%; 5 to 8 days: 57% versus 29%; >8 days: 68% versus 51%, for TCR and STTT testing, respectively). These data indicate that TCR sequencing can identify early Lyme disease with significantly greater sensitivity than standard antibody-based testing, particularly in the initial days of acute illness, while also maintaining high specificity. We next compared the agreement of TCR and STTT results, showing that TCR testing was positive in 59 of 64 (92%) STTT-positive cases (58 of 61 [95%] STTT-positive by IgM), as well as 59 of 147 (40%) STTT-negative cases (Table 1).
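The call threshold and the headline figures in this passage can be reproduced with simple arithmetic. The sketch below is illustrative only: `threshold_for_specificity` is a hypothetical helper showing one way to pick a cut-off from control scores (the exact procedure is in the Methods, which are not reproduced here), and the toy control scores are invented. The counts themselves are taken from the text.

```python
import math


def threshold_for_specificity(control_scores, specificity=0.99):
    """Cut-off such that at most (1 - specificity) of control scores exceed it."""
    ranked = sorted(control_scores)
    index = math.ceil(specificity * len(ranked)) - 1
    return ranked[index]


def percent(numerator, denominator):
    return 100.0 * numerator / denominator


# Hypothetical control scores 0..99, used only to exercise the helper.
toy_threshold = threshold_for_specificity(list(range(100)))

# Counts reported for the JHU validation cohort and the endemic controls.
tcr_sensitivity = percent(118, 211)     # reported as 56%
sttt_sensitivity = percent(64, 211)     # reported as 30%

# The reported 1.9-fold figure matches the ratio of the rounded percentages
# (56 / 30); the ratio of the raw counts, 118 / 64, is about 1.84.
fold_overall = round(tcr_sensitivity) / round(sttt_sensitivity)
fold_early = 44 / 14                    # <=4 days since symptom onset

endemic_specificity = 100.0 - percent(32, 2631)
tcr_pos_among_sttt_pos = percent(59, 64)
tcr_pos_among_sttt_neg = percent(59, 147)
```

A sample would be called positive when its model score exceeds the threshold, so the share of controls above the cut-off is the false-positive rate by construction.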
Of the 59 TCR-positive/STTT-negative individuals, 22 (37%) subsequently seroconverted between study enrollment and the first posttreatment follow-up visit (~3 weeks after enrollment), while only 16 of 88 (18%) individuals who were TCR-negative at baseline seroconverted over the same time period (P=0.01, Fisher's exact test). Stratification of the JHU cohort by initially STTT-positive, posttreatment seroconverter, or persistent STTT-negative demonstrated that median TCR model scores (Fig. 2C) and classifier sensitivity (Fig. 2D, Table 1) were highest among individuals who were STTT-positive at enrollment, intermediate among those who seroconverted posttreatment, and lowest among individuals who remained persistently STTT-negative. Taken together, these data indicate that while the presence of a detectable T-cell response is strongly correlated with a detectable antibody response, earlier maturation of the T-cell response may allow for enhanced sensitivity of a TCR-based diagnostic during early phases of B. burgdorferi infection.

Previous data suggest that the dynamics of T-cell and humoral immune responses differ in B. burgdorferi infection (36). To better understand the dynamics of the T-cell response posttreatment, we evaluated TCR repertoires in longitudinal samples from individuals enrolled in the JHU cohort. Patients initiated 3 weeks of oral doxycycline treatment within ±72 hours of enrollment, with samples collected at enrollment, immediately after treatment (~3 weeks after enrollment), and 6 months posttreatment.
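The seroconversion comparison reported above (22 of 59 TCR-positive versus 16 of 88 TCR-negative individuals) is a 2×2 table, and a two-sided Fisher's exact test on it can be sketched with the standard library alone. The function below is a minimal sketch: it assumes the common two-sided convention of summing the probabilities of all tables no more likely than the observed one, which may or may not match the software the authors used.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums hypergeometric probabilities of every table (with the same margins)
    that is no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


# Rows: TCR-positive, TCR-negative (all STTT-negative at baseline).
# Columns: seroconverted, did not seroconvert.
p_value = fisher_exact_two_sided(22, 59 - 22, 16, 88 - 16)
```

The text reports P=0.01 for this comparison; the small relative tolerance in the filter guards against floating-point ties between equally probable tables.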
Immunosequencing of samples collected during these timepoints (n = 161 patients with available samples at all timepoints) revealed that TCR responses waned significantly in the 6 months following treatment (Fig. 3), differing from our previous observations characterizing the T-cell response in patients infected with SARS-CoV-2 (46). Median model scores decreased from 6.1 to 2.5, and model sensitivity decreased from 56% (91/161) at enrollment to 32% (51/161) 6 months posttreatment. Notably, the sensitivity of STTT also declined over the same time period, from 33% at enrollment to 12% at 6 months posttreatment (14 of 115 patients with available STTT results), consistent with previous reports indicating that IgG seroconversion is often absent among IgM-positive individuals treated early in infection (33). Similar to the results shown in Fig. 2C, we observed that TCR model scores were higher across all timepoints among individuals who at baseline were STTT-positive compared with those who were STTT-negative (Fig. 3). The strong correlation observed between antibody and T-cell responses highlights the interconnectedness of the immune response in early Lyme disease, and may also reflect underlying pathogen burden, disease severity, or other clinical measures that drive the immune response. We therefore explored potential associations between clinical parameters previously reported in the JHU study (49) and the strength of the T-cell response as measured by the TCR model score at diagnosis. In both univariate analyses (Fig. 
4) and a multiple regression model (Table S2) that adjusted for sex, age, and serostatus, higher TCR scores were associated with markers of disease severity, including elevated liver function tests, disseminated rash, and the number of Lyme disease-associated symptoms. Sex, age, size of rash, and lymphocyte count were not associated with a difference in TCR model scores in this cohort (Table S2).

To evaluate the potential breadth of antigens detected by TCRs included in our Lyme classifier, we clustered enhanced sequences by sequence similarity, as described in the Methods. We identified 6 clusters of at least 5 sequences each, which together accounted for 105 of the 251 (42%) enhanced sequences (Table 2). Notably, in 5 of 6 clusters, statistical assignment of individual TCRs to HLA subtypes resulted in a consistent HLA assignment for the cluster, supporting the conclusion that clustered TCRs react to the same antigen and providing a putative HLA restriction for that antigen (Table 2). All assigned HLA subtypes were class II heterodimers, consistent with the prediction that T-cell responses to bacterial antigens will be predominantly HLA-II-restricted CD4+ T cells. These analyses suggest that >40% of the enhanced TCR sequences included in our Lyme classifier recognize one of 6 specific HLA-restricted peptides. To further characterize the antigen specificity of Lyme-associated TCRs, we used MIRA to identify target TCR epitopes. We first synthesized 777 query peptides derived from 26 B. 
burgdorferi proteins and assigned either individual peptides or groups of related peptides to one of 426 unique MIRA pools, or "addresses," as described in the Methods. MIRA was then performed on T cells derived from peripheral blood mononuclear cells (PBMCs) collected from 395 healthy individuals using a version of the assay that selects for HLA-II-restricted CD4+ T cells. One cluster (Table 2, cluster 6) contained 6 TCR sequences that all mapped to the same antigen by MIRA (MIINHNTSAINASRNNG from the B. burgdorferi flagellin B (FlaB) protein, WP_002661938.1). Each of these 6 sequences was found in at least 3 individuals (range, 3 to 25) in the MIRA analysis, and at least 1 of these sequences was found in 26 individuals. Of these 26 individuals, 25 were HLA-typed, and all 25 expressed HLA-DRB3*02:02, the HLA associated with cluster 6. By comparison, the background expression frequency of HLA-DRB3*02:02 was 14% among 394 HLA-typed individuals assessed by MIRA. Selective recognition of a B. burgdorferi antigen by these TCRs is also consistent with immunosequencing results showing that they were observed in 21% of JHU cases, but only 5% of holdout endemic controls (Fig. 5, FlaB (A)). Sensitive detection of an immune response targeting FlaB differentiates TCR testing from STTT, as antibodies to the FlaB protein used for immunoblotting in STTT are known to have low specificity (50). By applying MIRA-based antigen assignment to enhanced sequences that were not clustered (but that were present in ≥2 individuals assessed by MIRA), we were able to assign antigens for 3 additional TCRs (Table S3).
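The similarity clustering referred to in this section can be pictured with a small sketch. The version below groups TCRs that share V gene, J gene, and CDR3 length and links any two whose CDR3 sequences differ at no more than one position, merging linked sequences with a union-find structure. These criteria are assumptions made for illustration: the actual clustering rule is defined in the Methods, which are not reproduced here, and the gene names and CDR3 strings in the example are hypothetical.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def cluster_tcrs(tcrs, max_mismatches=1):
    """Single-linkage clustering of (v_gene, j_gene, cdr3) tuples.

    Returns clusters as sorted lists of indices into `tcrs`.
    """
    parent = list(range(len(tcrs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Only sequences with the same V gene, J gene and CDR3 length are compared.
    groups = defaultdict(list)
    for idx, (v_gene, j_gene, cdr3) in enumerate(tcrs):
        groups[(v_gene, j_gene, len(cdr3))].append(idx)

    for members in groups.values():
        for pos, i in enumerate(members):
            for k in members[pos + 1:]:
                if hamming(tcrs[i][2], tcrs[k][2]) <= max_mismatches:
                    parent[find(i)] = find(k)

    clusters = defaultdict(list)
    for idx in range(len(tcrs)):
        clusters[find(idx)].append(idx)
    return sorted(clusters.values())


# Hypothetical sequences, for illustration only.
example = [
    ("TRBV5-1", "TRBJ2-7", "CASSLGQGYEQYF"),
    ("TRBV5-1", "TRBJ2-7", "CASSLGQGAEQYF"),  # 1 mismatch from sequence 0
    ("TRBV5-1", "TRBJ2-7", "CASSLGTGAEQYF"),  # 1 mismatch from sequence 1
    ("TRBV5-1", "TRBJ2-7", "CASSPGTAAEQYF"),  # 2 or more mismatches from all others
    ("TRBV7-9", "TRBJ2-7", "CASSLGQGYEQYF"),  # different V gene
]
clusters = cluster_tcrs(example)
```

With single linkage, sequences 0 and 2 end up in the same cluster through sequence 1 even though they differ from each other at two positions; a stricter rule (for example, requiring every pair in a cluster to be within the threshold) would split them.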
One TCR mapped to the same FlaB antigen as cluster 6 and was observed in MIRA experiments from 15 individuals. This sequence also had the same V gene, J gene, CDR3 length, and HLA association as the members of cluster 6, but did not meet our conservative clustering threshold due to a difference of 2 amino acids in the CDR3 region. Another enhanced sequence mapped to a different antigen from FlaB (SSGYRINRASDDAAGMG) and was found in 3 individuals by MIRA. This enhanced sequence was also associated with HLA-DRB3*02:02 in the JHU cohort, and all 3 individuals from the MIRA analysis expressed this HLA. The third MIRA-assigned enhanced sequence mapped to DbpA in 4 individuals and could not be assigned an HLA based on our data. Each of these enhanced sequences was present in a higher proportion of JHU cases than endemic controls (Fig. 5).

Finally, to evaluate the specificity of our approach for mapping Lyme disease-specific TCRs to specific B. burgdorferi antigens, we compared the Lyme-associated enhanced sequences to a set of TCRs from 507 individuals that were previously mapped to 325 SARS-CoV-2 antigen pools by MIRA (44, 51), once again limiting the analysis to TCRs identified in ≥2 distinct individuals. We identified no matches between SARS-CoV-2-associated TCRs and Lyme-associated enhanced sequences. Collectively, these results demonstrate the functional relevance of the TCRs included in the Lyme classifier and confirm the specificity of our approach for identifying Lyme disease.

We describe an approach for diagnosis of early Lyme disease from blood samples based on high-throughput TCR sequencing.
Identification of 251 Lyme-associated enhanced TCR sequences served as the basis for training a classifier capable of sensitive and specific detection of Lyme disease across 3 independent cohorts of patients with laboratory-confirmed and/or clinically diagnosed early Lyme disease (LDB, Boca, and JHU). Validation of the classifier demonstrated that the T-cell assay identifies patients with early Lyme disease with a 1.9-fold improvement in sensitivity compared to STTT (56% versus 30%), while maintaining a specificity of 99%. Enhanced sensitivity was most apparent in early illness (44% versus 14%, or 3.1-fold increase in sensitivity ≤4 days since symptom onset), and TCR positivity was predictive of subsequent STTT seroconversion in 37% of initially STTT-negative individuals. T-cell testing was also more sensitive than STTT for identification of Lyme disease posttreatment (32% versus 12% at 6 months posttreatment), including in patients who did not undergo IgG seroconversion between acute and convalescent samples. Higher TCR scores were associated with clinical measures of disease, including elevated liver function tests, disseminated rash, and number of disease-associated symptoms. Finally, we demonstrate that a subset of the identified Lyme-associated TCRs map to known B. burgdorferi antigens, supporting the high biologic specificity of a TCR immunosequencing approach. Results from this study highlight the potential utility of a T-cell-based diagnostic for identification of Lyme disease during early stages of infection.
A recent review comparing the sensitivity of 2-tiered testing algorithms in patients with early Lyme disease reported ranges of 25% to 50% for STTT (21), which, together with our data, suggests that the sensitivity of a TCR-based diagnostic is greater than that of STTT in early-stage disease. In addition, the ability of the TCR assay to identify Lyme disease in a large proportion of STTT-negative individuals prior to seroconversion indicates that the TCR response may be detectable before the humoral response by any serologic testing modality. These data imply that T-cell activation precedes and may be required for some aspects of the humoral response, although both T-cell-dependent and independent responses have been implicated in clearance of Borrelia infection (52, 53).

In addition to establishing an unambiguous and easily interpretable cutoff for positivity, the TCR assay score may also serve as a semi-quantitative proxy for disease activity. Correlation of TCR scores with clinical measures of disease, such as number of reported symptoms and disseminated rash, suggests that the magnitude of the T-cell response is associated with the degree of symptomatology. Furthermore, longitudinal analyses show that the TCR score decreases with time posttreatment, consistent with diminishment of the T-cell response with resolution of disease. However, while TCR positivity was associated with STTT positivity at enrollment (92%), the T-cell response did not decline as rapidly as serologic responses following treatment.
This observation, as well as the increased sensitivity of the assay over STTT, suggests that TCR testing may be able to identify Lyme disease even in the absence of acute (IgM) to convalescent (IgG) seroconversion, a common occurrence among individuals treated early in the course of disease (33), and supports the role of T-cell-based testing as both an alternative and complementary method for diagnosis. Importantly, 11 of 12 individuals who presented with EM rash, but were both STTT-negative and B. burgdorferi PCR-negative, were also TCR-negative, calling into question the diagnosis of Lyme disease in these individuals and suggesting that T-cell-based testing may aid in the differential diagnosis of Lyme disease and similar tick-borne illnesses with overlapping manifestations, such as STARI (48). Further studies are needed to understand the utility of TCR testing in patients with later stages of Lyme disease and long-term sequelae. Approximately 10% to 20% of patients treated for Lyme disease experience long-term symptoms lasting ≥6 months after treatment, known as posttreatment Lyme disease syndrome (PTLDS) (4, 54). Elevated IL-23 and CCL19 levels in individuals with PTLDS compared to those with symptom resolution suggest a role for [...] proposed as a potential mechanism underlying late manifestations of Lyme disease (38), characterization of immunodominant epitopes during early infection may also provide insight into late sequelae and persistent symptoms. By combining MIRA and sequence-based analysis, we were able to identify antigens recognized by a subset of public Lyme-associated TCRs identified in this study.
Future studies comparing whether enhanced TCR sequences associated with specific epitopes are preferentially associated with early versus late Lyme disease may provide insights into the pathophysiology of disease, as well as a means to predict which patients are most likely to develop late manifestations. Application of the present TCR classifier as a diagnostic assay (T-Detect™ Lyme) will be further evaluated in order to support its clinical utility relative to 2-tiered serologic testing. While the present analysis is limited to samples previously collected from well-defined prospective cohorts of clinically confirmed and/or laboratory-confirmed early Lyme disease, additional prospective clinical validation studies are needed to further characterize the advantages of TCR testing relative to serology in scenarios where the spectrum of presenting illness may vary. In addition, evaluation of potential assay cross-reactivity against other pathogens is also needed, though antigen mapping indicates that a subset of the identified TCRs are highly specific for known B. burgdorferi proteins. Results of this study demonstrate that TCR testing can have high clinical utility as a sensitive and specific diagnostic for Lyme disease. Diagnostic validation of our TCR classifier indicates that analysis of the T-cell response may facilitate diagnosis of early Lyme disease prior to detection of the humoral response, allowing for earlier recognition of disease and initiation of antimicrobial treatment to prevent the development of more severe illness in patients who lack definitive clinical signs/symptoms.
Although this is the first study to evaluate the utility of TCR immunosequencing to identify acute bacterial infection, the observed diagnostic performance is consistent with that of similar TCR-based classifiers for identification of past CMV or SARS-CoV-2 infection (43, 44, 46, 47). These studies all leveraged a standardized approach to immunosequencing of the TCR repertoire from blood samples, followed by application of an algorithm designed to yield clear positivity thresholds for identification of disease cases. In addition, because the algorithms are based on statistical association of TCRs with disease, diagnostic sensitivity is expected to improve as the size of training data increases, while collection of large numbers of case/control samples across a range of environmental contexts will allow for detailed characterization and further generalization of the classifiers. Collectively, these studies indicate that characterization of the adaptive immune response through sequence-based identification of public, disease-specific TCRs is a powerful and generalizable approach to aid in diagnosing disease.

in the JHU cohort, stratified by symptom duration (days) at time of enrollment. Participants were stratified based on self-reported symptom duration. Error bars represent mean ± 95% CI.
Figure S1: Immunosequencing input DNA distributions by cohort. Boxes indicate median ± interquartile ranges (IQR), and whiskers denote 1.5 times the IQR above the high quartile and below the low quartile.

The repertoires used in this study were sampled from 8,590 donors enrolled in multiple studies, described in detail below. As the aim of this study was to develop and validate a model for identifying early Lyme disease with high sensitivity and specificity, allocation of samples to sets used for training, setting the classification threshold, and validation was prespecified (see Table S1 for detailed cohort information).

phagocytophilum was performed at ARUP Laboratories. For the present study, immunosequencing was performed on the first available sample from 18 donors who were seropositive for B.
burgdorferi by enzyme-linked immunosorbent assay (ELISA) and immunoblot (either IgG or IgM), all of whom were classified as STTT-positive by 2-tiered testing criteria. Study of Lyme disease Immunology and Clinical Events (SLICE) was a longitudinal, prospective cohort study that enrolled adult patients (≥18 years of age) with early Lyme disease who were self-referred or recruited from primary or urgent care settings from 2008 to 2020. Eligible participants were primarily enrolled at study sites in Maryland, with a small number enrolled at a satellite site in southeastern Pennsylvania. At enrollment, participants were required to have a visible EM ≥5 cm in diameter diagnosed by a health care provider and either multiple skin lesions or at least one new-onset concurrent symptom. All patients had received ≤72 hours of appropriate antibiotic treatment for early Lyme disease at enrollment. Additional details and baseline clinical characteristics of this sample have been previously published (58). Participants without a clinical or serologic history of Lyme disease were recruited from similar primary care settings or through the community using flyers and online advertising to serve as endemic controls. This cohort was required to be STTT-negative at the time of enrollment and at all subsequent visits, as well as be free of any history of prior clinical Lyme disease.
All participants in both groups were excluded for a range of self-reported prior medical conditions paralleling those listed in the proposed case definition for posttreatment Lyme disease syndrome (PTLDS) (59), specifically chronic fatigue syndrome, fibromyalgia, unexplained chronic pain, sleep apnea or narcolepsy, autoimmune disease, chronic neurologic disease, liver disease, hepatitis, HIV, cancer or malignancy in the past 2 years, major psychiatric illness, or drug or alcohol abuse. All patients were treated with 3 weeks of oral doxycycline in accordance with IDSA Guidelines (16). Lyme patients were seen regularly over the course of 2 years for a total of 5 study visits (before and immediately after treatment, and 1 month, 6 months, and 2 years posttreatment). Samples collected before and immediately after treatment and 6 months posttreatment were used for the present study. Control samples from healthy individuals were collected at an initial visit and 6 months and 1 year later; samples from the initial visit were used in the present study. Disseminated EM rash was defined as having more than one visible rash site, while local rash was defined as a single EM rash site. High-resolution HLA class I and class II typing for the JHU cohort (cases only) was performed by Scisco Genetics, Inc., (Seattle, WA, USA) using the ScisGo HLA v6 typing kit, as previously described (60, 61). A total of 7,964 repertoires that were sampled as part of previous studies were selected from our database.
Inclusion was determined at the cohort level and based on the size of the cohort, geographic region (US and Lyme-endemic regions of Europe), and sequencing date (2019 or later, to ensure consistent lab sequencing protocols). These repertoires were from individuals defined as being either from endemic regions (cohorts from Germany, Italy, or upper Midwest or Northeast regions of US) or non-endemic regions (other regions of the US). All individuals in these cohorts were presumed to be Lyme negative but were not tested for Lyme disease. All training cases were drawn from the LDB and Boca Biolistics cohorts. To enrich for cases with a likely immune response in order to maximize our ability to detect Lyme disease-associated enhanced sequences, the training set was limited to 72 STTT-positive cases (54 from LDB, 18 from Boca Biolistics). Training controls included 2,981 repertoires from individuals from non-endemic regions of the US and Europe that were previously collected as a part of other studies and presumed to have never been exposed to B. burgdorferi infection. The positive-call threshold was set based on 2,507 presumed Lyme-negative samples collected from endemic regions that were available in our database, along with 120 confirmed STTT-negative endemic controls randomly selected from the LDB cohort. Additional LDB case (n = 15) and control (n = 48) samples collected during the 2019 tick season were sequenced after model training and used as an initial check of model specificity and generalizability. The primary endpoint of the study was evaluation of sensitivity in the JHU cohort, which was selected based on the conservative enrollment criteria for that cohort.
Repertoires sampled from 211 participants at time of enrollment passed quality control (QC) thresholds established after model training described below. A subset of patients in the JHU cohort (n = 161) had sequenced repertoires that passed QC from samples collected before and after treatment and 6 months posttreatment. Specificity of the final model was estimated based on 3 endemic control cohorts: 1) all endemic controls from JHU (n = 45); 2) 50% of endemic controls from tick seasons prior to 2019 in the LDB cohort (selected by random sampling; n = 115 passed QC); and 3) 50% of presumed endemic controls from our database (selected by random sampling; n = 2,471 passed QC).

Immunosequencing of complementarity-determining region 3 (CDR3) of human TCRβ chains was performed using the immunoSEQ® Assay (Adaptive Biotechnologies, Seattle, WA). Extracted genomic DNA was amplified in a bias-controlled multiplex PCR, followed by high-throughput sequencing. Sequences were collapsed and filtered in order to identify and quantitate the absolute abundance of each unique TCRβ CDR3 region for further analysis as […]

Public TCR amino acid sequences associated with early Lyme disease were identified as described previously (43). Briefly, one-tailed Fisher's exact tests (FETs) were performed on all unique TCR sequences to compare frequencies in early Lyme samples with those in presumed-negative controls. Unique sequences were defined based on the V gene, J gene, and CDR3 amino acid sequence. The P-value threshold for including a TCR in the enhanced sequence list was treated as a hyperparameter and was selected to maximize model performance as […]
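The enhanced-sequence selection described above can be illustrated with a minimal Python sketch. It assumes each repertoire is represented as a collection of (V gene, J gene, CDR3 amino acid sequence) tuples and that presence is counted per donor; the function names, the made-up toy sequences, and the P-value cutoff of 0.01 are hypothetical and are not taken from the study, which tuned the cutoff as a hyperparameter.

```python
from collections import Counter
from math import comb


def fisher_exact_greater(case_pos, case_total, ctrl_pos, ctrl_total):
    """One-tailed Fisher's exact P-value for enrichment of a TCR among cases."""
    total = case_total + ctrl_total
    pos = case_pos + ctrl_pos
    # Upper tail of the hypergeometric distribution: probability of seeing
    # at least `case_pos` carriers among the cases by chance alone.
    tail = sum(
        comb(pos, x) * comb(total - pos, case_total - x)
        for x in range(case_pos, min(pos, case_total) + 1)
    )
    return tail / comb(total, case_total)


def find_enhanced_sequences(case_reps, ctrl_reps, p_cutoff):
    """Return {tcr: p_value} for TCRs enriched in cases at the given cutoff."""
    # Count each TCR at most once per donor.
    case_counts = Counter(tcr for rep in case_reps for tcr in set(rep))
    ctrl_counts = Counter(tcr for rep in ctrl_reps for tcr in set(rep))
    enhanced = {}
    for tcr, n_case in case_counts.items():
        p = fisher_exact_greater(
            n_case, len(case_reps), ctrl_counts[tcr], len(ctrl_reps)
        )
        if p <= p_cutoff:
            enhanced[tcr] = p
    return enhanced


# Toy data (invented sequences): tcr_a is seen only in cases, tcr_b in both.
tcr_a = ("TCRBV05-01", "TCRBJ02-07", "CASSLGQAYEQYF")
tcr_b = ("TCRBV07-09", "TCRBJ01-01", "CASSPTGNTEAFF")
cases = [{tcr_a, tcr_b}, {tcr_a, tcr_b}, {tcr_a}, {tcr_a}]
controls = [{tcr_b}, {tcr_b}, {tcr_b}, set(), set(), set()]
enhanced = find_enhanced_sequences(cases, controls, p_cutoff=0.01)
```

In this toy example only `tcr_a` is retained, because it occurs in every case and in no control, whereas `tcr_b` is about equally common in both groups. The tail sum uses exact integer arithmetic, so it stays accurate for cohort sizes in the thousands.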
[…] average and maximum productive frequency for cases and controls; and 3) the number of sequences in that are similar to the TCR, as defined above.

In practice, a larger CMV-labeled cohort was unavailable. However, as >50% of North American […]

Given a set of enhanced sequences $S$, the pair $(n, k)$ can then be defined for each repertoire, where $n$ is the total number of unique productive DNA TCR rearrangements in the sampled repertoire, and $k < n$ is the number of those rearrangements that encode any of the enhanced sequences in $S$. If $k$ is treated as sampled from a random variable $K$, the expected value of $K$ given $n$ can be considered. By the way enhanced sequences are defined, the distribution of $K \mid n$ is expected to vary substantially between cases and controls. While this could be treated as a classification problem to maximize the separation between cases and controls (as in [43, 44] […] for model parameters […]. For a given $(n, k)$, the number of standard deviations $k$ is from the expected mean given $n$ is then used as the model score:

$$\text{score}(n, k) = \frac{k - \mathrm{E}[K \mid n]}{\sqrt{\mathrm{Var}[K \mid n]}}$$
The model parameters were chosen by minimizing the sum of squared residuals over the set of training control samples. The observed data are moderately overdispersed with respect to the estimated variance (Fig 1A). As such, the final call threshold was chosen to fix the prespecified false-positive rate of 1% on a set of 2,627 presumed Lyme-negative control samples, as described above. The two key parameters of the classifier are the number of unique productive rearrangements, $n$, and the number of unique productive rearrangements encoding an enhanced sequence, $k$. For a given blood sample, the value of $n$ is determined by the quantity of DNA, the fraction of cells that are T-cells, and the diversity of T-cells. In rare cases, $n$ is too small to yield meaningful information, or significantly larger than observed in our training data, making extrapolation of $K \mid n$ problematic. Therefore, acceptance criteria were predefined for the number of unique productive rearrangements based on the observed distribution of $n$ in the training data. The information contained in enhanced sequences is asymmetric: for small $n$, large $k$ is considered to be evidence of Lyme disease, while small $k$ may simply reflect a lack of sequenced T cells. Thus, QC criteria were treated asymmetrically. Specifically, $n_{\min}$ and $n_{\max}$ were defined as the lower and upper QC thresholds, which were prespecified to be equal to the 1st and 99th percentiles, respectively, of $n$ observed in the training data. A sample then failed QC if $n > n_{\max}$, or if $n < n_{\min}$ and the model score for $(n, k)$ was below the call threshold.
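The scoring, thresholding, and asymmetric QC logic described above can be sketched as follows. The functional forms of the expected count and its variance are not specified here, so `expected_k` and `var_k` below are hypothetical placeholders (a count proportional to the number of rearrangements, with Poisson-like variance), as are all function names and toy values. This is an illustration under those assumptions, not the study's fitted model.

```python
import math


def model_score(n, k, expected_k, var_k):
    """Number of standard deviations k lies from its expected value at depth n."""
    return (k - expected_k(n)) / math.sqrt(var_k(n))


def call_threshold(control_scores, false_positive_rate=0.01):
    """Pick a cutoff so that at most the target fraction of controls exceed it."""
    ordered = sorted(control_scores)
    allowed = math.floor(false_positive_rate * len(ordered))
    # Scores strictly greater than the returned value are called positive.
    return ordered[len(ordered) - allowed - 1]


def passes_qc(n, score, threshold, n_min, n_max):
    """Asymmetric QC: deep samples always fail; shallow samples fail only if negative."""
    if n > n_max:
        return False  # beyond the range where the control model was fitted
    if n < n_min and score < threshold:
        return False  # a negative call from too few sequenced T cells is uninformative
    return True


# Placeholder model (assumption): mean proportional to n, variance equal to mean.
def expected_k(n):
    return 1e-4 * n


def var_k(n):
    return 1e-4 * n


example_score = model_score(n=100_000, k=30, expected_k=expected_k, var_k=var_k)
example_cutoff = call_threshold(list(range(100)), false_positive_rate=0.01)
```

Because the observed counts are overdispersed relative to the estimated variance, the cutoff is taken empirically from control scores rather than from a theoretical tail probability, which is what `call_threshold` mimics on toy data.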
The multiplex identification of antigen-specific T-cell receptors (MIRA) assay was set up, performed, and analyzed as described previously (67). The peptides were pooled in a combinatorial fashion as described previously (67); peptides that were overlapping or in close proximity in the proteome were grouped together into antigen sets. Each antigen set was then placed in a subset of 5 unique pools out of 11 total pools in the Lyme-MIRA1 panel, or 6 pools out of 12 in the Lyme-MIRA2 panel, referred to as its occupancy.

RNA was isolated using AllPrep DNA/RNA mini and/or micro kits, according to the manufacturer's instructions (Qiagen). RNA was then reverse transcribed to cDNA using Vilo kits (Life Technologies, Carlsbad, CA, USA), and TCRβ amplification was performed using the immunoSEQ Assay described above. After immunosequencing, the behavior of T-cell clonotypes was examined by tracking read counts across each sorted pool. True antigen-specific clones should be specifically enriched in a unique occupancy pattern corresponding to the presence of one of the query antigens in 5 or 6 pools in the Lyme-MIRA1 and Lyme-MIRA2 panels, respectively. Methods used to assign antigen specificity to TCR clonotypes have been reported previously (67).
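The combinatorial pooling idea can be illustrated with a short Python sketch: each antigen set receives a distinct subset of pools (its occupancy), and a clone enriched in exactly that subset is mapped back to the antigen. The function names are hypothetical, assigning patterns in lexicographic order is an assumption made for simplicity, and exact pattern matching stands in for the statistical assignment used in the study.

```python
from itertools import combinations
from math import comb


def assign_occupancies(antigens, n_pools, pools_per_antigen):
    """Give every antigen set a distinct subset of pools (its occupancy)."""
    if len(antigens) > comb(n_pools, pools_per_antigen):
        raise ValueError("not enough distinct pool subsets for all antigens")
    patterns = combinations(range(n_pools), pools_per_antigen)
    return {antigen: frozenset(p) for antigen, p in zip(antigens, patterns)}


def decode_clone(enriched_pools, layout):
    """Return the antigen whose occupancy equals the clone's enriched pools."""
    observed = frozenset(enriched_pools)
    for antigen, pattern in layout.items():
        if pattern == observed:
            return antigen
    return None  # no valid address, e.g. a clone activated in every pool


# 5-of-11 design, as in the Lyme-MIRA1 panel; antigen names are invented.
layout = assign_occupancies(["ag1", "ag2", "ag3"], n_pools=11, pools_per_antigen=5)
hit = decode_clone([5, 0, 1, 2, 3], layout)
miss = decode_clone(range(11), layout)
```

A 5-of-11 design offers 462 distinct occupancy patterns and a 6-of-12 design offers 924, so many antigen sets can be queried with only 11 or 12 sorted pools.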
In addition to these methods, a nonparametric Bayesian model was developed to compute the posterior probability that a given clonotype was antigen specific. This model uses the available read counts of TCRs to estimate a mean-variance relationship within a given experiment, as well as the probability that a clone will have zero read counts due to incomplete sampling of low frequency clones. Together, this model considers the observed read counts of a clonotype across all pools and estimates the posterior probability of a clone responding to all valid addresses and an additional hypothesis that a clone is activated in all pools (truly activated, but not specific to any of our query antigens). To define antigen-specific clones, we identified TCR clonotypes assigned to a query antigen from this model with a posterior probability ≥0.7. TCR sequences from MIRA were compared to the enhanced sequence list on the basis of V-gene, J-gene, and CDR3 amino acid sequences. Any exact matches between the two lists, where the MIRA TCR sequence was found in at least 2 separate individuals, were considered sufficient to map the enhanced sequence to the MIRA antigen. Clustering of enhanced sequences was based on TCR amino acid similarity. Specifically, two TCRs were assigned to the same cluster if they shared V-gene family (and so have similar complementarity-determining regions [CDRs] 1 and 2), had identical length, and differed by at most 1 amino acid in the CDR3 region. Clusters with at least 5 enhanced sequences were reported (Table 2). A sequence motif representing the CDR3 amino acid sequences assigned to each cluster was generated using WebLogo (68, 69).
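The clustering rule above can be sketched in Python as follows. The sketch assumes that similarity is applied transitively (clusters are connected components of the pairwise similarity graph) and that the V-gene family is the part of the gene name before the first hyphen; both are assumptions, and the function names and toy sequences are hypothetical.

```python
def v_family(v_gene):
    """Family portion of a V-gene name, e.g. 'TCRBV05-01' -> 'TCRBV05' (assumed format)."""
    return v_gene.split("-")[0]


def similar(tcr_a, tcr_b):
    """Same V family, same CDR3 length, and at most one amino acid mismatch."""
    (v_a, cdr3_a), (v_b, cdr3_b) = tcr_a, tcr_b
    if v_family(v_a) != v_family(v_b) or len(cdr3_a) != len(cdr3_b):
        return False
    return sum(x != y for x, y in zip(cdr3_a, cdr3_b)) <= 1


def cluster_tcrs(tcrs):
    """Group TCRs into connected components of the similarity graph (union-find)."""
    parent = list(range(len(tcrs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(tcrs)):
        for j in range(i + 1, len(tcrs)):
            if similar(tcrs[i], tcrs[j]):
                parent[find(i)] = find(j)

    groups = {}
    for i, tcr in enumerate(tcrs):
        groups.setdefault(find(i), []).append(tcr)
    return list(groups.values())


# Toy input (invented sequences) as (V gene, CDR3 amino acid sequence) pairs.
toy_tcrs = [
    ("TCRBV05-01", "CASSLGQF"),
    ("TCRBV05-04", "CASSLGQY"),   # one mismatch from the first
    ("TCRBV05-01", "CASSLAQY"),   # one mismatch from the second
    ("TCRBV07-02", "CASSLGQF"),   # different V family
    ("TCRBV05-01", "CASSLGQFF"),  # different CDR3 length
]
clusters = cluster_tcrs(toy_tcrs)
```

Under the transitive reading, the first three toy sequences form one cluster even though the first and third differ at two positions. The pairwise loop is quadratic, which is adequate for a list of a few hundred enhanced sequences.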
To assign an enhanced sequence to a single HLA subtype, a 1-tailed FET was performed between that enhanced sequence and every HLA subtype. The enhanced sequence was assigned to the HLA subtype with the lowest P-value; if the lowest P-value was >0.001, no assignment was made. Contingency tables counted the number of individuals with/without the enhanced sequence and with/without a given HLA subtype. For HLA-DQ and HLA-DP, α/β heterodimers were treated as distinct HLA subtypes; for example, individuals with 2 α subtypes and 2 β subtypes were treated as expressing all 4 possible heterodimers. An HLA subtype was assigned to an enhanced sequence cluster if a majority (>50%) of the cluster members with an assigned HLA subtype were assigned to the same subtype.

Lyme Disease. 2021
Use of commercial claims data for evaluating trends in Lyme disease diagnoses
Estimating the frequency of Lyme disease diagnoses