key: cord-0026510-4ma0qj90 authors: Thakur, Chandrani; Tripathi, Ashutosh; Ravichandran, Sathyabaarathi; Shivananjaiah, Akshatha; Chakraborty, Anushree; Varadappa, Sreekala; Chikkavenkatappa, Nagaraj; Nagarajan, Deepesh; Lakshminarasimhaiah, Sharada; Singh, Amit; Chandra, Nagasuma title: A new blood-based RNA signature (R(9)), for monitoring effectiveness of tuberculosis treatment in a South Indian longitudinal cohort date: 2022-01-10 journal: iScience DOI: 10.1016/j.isci.2022.103745 sha: 32fb1893c9591e4999dfa487def1ebb9b565e44a doc_id: 26510 cord_uid: 4ma0qj90 Tuberculosis (TB) treatment involves a multidrug regimen for six months, and until two months, it is unclear if treatment is effective. This delay can lead to the evolution of drug resistance, lung damage, disease spread, and transmission. We identify a blood-based 9-gene signature using a computational pipeline that constructs and interrogates a genome-wide transcriptome-integrated protein-interaction network. The identified signature is able to determine treatment response at week 1–2 in three independent public datasets. Signature-based R(9)-score correctly detected treatment response at individual timepoints (204 samples) from a newly developed South Indian longitudinal cohort involving 32 patients with pulmonary TB. These results are consistent with conventional clinical metrics and can discriminate good from poor treatment responders at week 2 (AUC 0.93(0.81–1.00)). In this work, we provide proof of concept that the R(9)-score can determine treatment effectiveness, making a case for designing a larger clinical study. 
Identified blood-based 9-gene signature for TB treatment prognosis
Validated the signature in a newly developed longitudinal cohort
Devised a treatment response score (R9-score) based on the identified 9-gene panel
R9-score classified treatment responders from multiple cohorts
INTRODUCTION
Tuberculosis (TB), caused by the deadly pathogen Mycobacterium tuberculosis, remains the largest killer among all infectious diseases. According to WHO, about 10 million TB cases are reported globally each year, and about 1.45 million TB-related deaths were reported in 2018 alone (WHO, 2019a). India, given its large population, carries one of the highest burdens of TB and accounts for a quarter of the world's cases, with 2.69 million cases and about 449,700 deaths (WHO, 2019a). The standard treatment recommended by WHO for uncomplicated TB is a four-drug combination regimen of isoniazid (INH), rifampicin (RIF), pyrazinamide (PZA), and ethambutol (ETH), typically given for two months (the intensive phase), followed by INH and RIF for four months (the continuation phase); this is generally referred to as Category-I treatment. WHO recommended a directly observed treatment, short-course (DOTS) control strategy to ensure that patients have access to medications throughout the treatment period and to enhance patient adherence, and this strategy is widely implemented in high burden areas (Karumbi and Garner, 2015; Central TB Division et al., 2012, 2017). With this, the treatment success rate in uncomplicated cases has increased significantly (Central TB Division et al., 2017; Kurz et al., 2016). However, thousands of cases of treatment failure due to drug resistance or other complications continue to occur (WHO, 2019a).
The emergence and spread of multidrug-resistant (MDR) and extensively drug-resistant (XDR) TB is alarming, leading to a large number of cases with treatment failure, which is clearly reflected in a large number of deaths (Dheda et al., 2017). Second-line therapies are available for such cases and are given for longer periods, ranging from 9 to 20 months depending on the treatment regimen (Lange et al., 2019; WHO, 2019b; Migliori et al., 2017). In some cases, MDR TB is detected with GeneXpert, in which case one of the second-line therapies is started (Theron et al., 2014; Dorman et al., 2018; Di Tanna et al., 2019). However, in all other cases, where no clear information is available, treatment is typically started with Category-I, evaluated periodically, and switched to a different treatment category if found necessary 2 to 4 months after initiating therapy. To determine the effectiveness of therapy, clinicians currently rely on overall clinical presentation, sputum conversion, and periodic chest X-rays, where an improvement in clinical scores is typically taken as effectiveness (Chakraborthy et al., 2018; Central TB Division et al., 2016, 2020). However, these tests have several limitations. Sputum smear microscopy has low sensitivity (34%-80%), is operator-dependent, and cannot reliably differentiate tubercle bacilli from non-tuberculous mycobacteria or live from dead bacilli (Davies and Pai, 2008; Steingart et al., 2006). Typically, 2-month sputum culture conversion is used to check treatment effectiveness and is associated with a relapse-free cure (Johnson et al., 2009; British Thoracic Association, 1981). However, this test takes a long time, is prone to contamination, and has poor sensitivity and specificity for predicting treatment failure and relapse in individual patients (Phillips et al., 2013, 2016; Gillespie et al., 2014; Merle et al., 2014; Shenai et al., 2016).
A delay in initiating the appropriate therapy carries the risk of promoting the evolution of resistance, extensive lung damage, the spread of disease to other organs, and transmission of the causative bacilli to other people (Virenfeldt et al., 2014; Cox et al., 2008; Yang et al., 2017). A non-sputum-based biomarker that is capable of detecting response will therefore be very valuable (Walzl et al., 2011). A lack of improvement upon therapy indicates either a drug-resistant infection or the development of some complications in the patient (Nahid et al., 2019; Law et al., 2017). Identification of cases with ineffective treatment is necessary to make the required clinical decisions, such as further investigation and treating complications, lengthening the TB treatment duration, switching to second-line or alternate treatment regimens, isolating the patient, and monitoring the patient more closely, which are required to reduce morbidity, limit disease transmission, and prevent drug resistance (Holden et al., 2019; Bradford et al., 1996; Cegielski et al., 2014). Host molecular markers are being increasingly explored as knowledge accumulates of the transcriptome variations in active disease and upon antitubercular treatment (Cliff et al., 2015; Bloom et al., 2013; Berry et al., 2010; Sambarey et al., 2017a). As blood is an easily accessible tissue, it would be advantageous to have a sensitive blood test that can detect response to TB treatment and track its progress. Blood transcriptomes of TB subjects, before and during treatment, have been reported for multiple cohorts (Cliff et al., 2013; Bloom et al., 2012; Tientcheu et al., 2015; Ottenhoff et al., 2012), which have shown that there is symmetry in the gene expression variation pattern in disease and its resolution upon treatment. Recently, two gene panels evaluating the success of treatment have also been reported (Warsinske et al., 2018; Thompson et al., 2017).
Of these, the 3-gene signature, which consists of the GBP5, DUSP3, and KLF2 genes, was found to be characteristic of active disease and was subsequently found to correlate with treatment response as well (Sweeney et al., 2016; Warsinske et al., 2018). The RESPONSE5 signature, consisting of four coding genes (SMARCD3, UCP2, MAP7D3, and STT3A) and one non-coding gene (RP11-295G20.2), was discovered by an analysis of gene expression data and selection of feature-pairs that best correlated with treatment response (Thompson et al., 2017). The signatures' performance in differentiating treated (TR) versus active disease/treatment-naive week-0 TB subjects (TB0) suggests the promise of finding a clinically useful biomarker. Given that transcriptomes are large datasets and exhibit complex trends of variation upon treatment in different individuals, it is essential to rigorously interrogate the data in diverse populations with multiple genetic and geographical backgrounds, especially where the disease is endemic. Earlier studies on treatment biomarkers were done in countries other than India (Sweeney et al., 2016; Warsinske et al., 2018; Thompson et al., 2017). However, it is imperative to do these studies in other high burden countries such as India and compare the outcome with other studies to identify a robust set of biomarkers that is independent of geographical location and population genetics. In this work, we seek to identify a non-sputum-based biomarker to determine the effectiveness of therapy at six months of treatment in a South Indian population and further evaluate if the effect can be identified at a much earlier time point. Using blood transcriptomes and a computational pipeline involving protein interaction networks, we identify a subnetwork that captures the molecular perturbations associated with response to treatment and identify a 9-gene RNA signature.
We show that it performs well in two public cohorts and in another new cohort from a South Indian hospital that we followed for a year. Our signature shows agreement with clinical scores and can detect response as early as 1-2 weeks after initiation of treatment. We formulated a score (R9-score) to capture the combined effect of the signature, which blindly detected cases of treatment complications and showed that they correlated with clinical findings of successful treatment. Our cohort also included a few cases of poor response to TB treatment, which our score was able to detect well.
Whole blood transcriptome data of four different cohorts of subjects with TB that were publicly available are used in this study (Table 1 the course of treatment, and a new Healthy cohort from the same geographic location was also studied (Figure 1 and Tables 1 and S1A). For discovery, two datasets were used (Thompson et al., 2017; Sambarey et al., 2017b). The first, GSE89403 (GSE89403-Thompson), contains RNA-Seq transcriptomes of 99 pulmonary TB subjects at week 0 and followed up over the course of standard TB treatment at week 1, week 4, and week 24 (only week 0 and week 24 samples from "definite cured" individuals were used for biomarker discovery). The second, GSE122485 (GSE122485-Sambarey), is a whole blood RNA-Seq transcriptome dataset from South Indian pulmonary tuberculosis subjects containing data from four treatment-naive week 0 subjects, three age-matched healthy controls, and three subjects after standard TB treatment at month 6. We used only week 0 and month 6 data for discovery and all the data from in-between time points (GSE89403-Thompson) for validation. Two other public datasets belonging to different cohorts, GSE31348 (GSE31348-Cliff) and GSE40553 (GSE40553-Bloom) (Cliff et al., 2013; Bloom et al., 2012), were used for validating the signature (Table 1).
GSE31348-Cliff is an mRNA microarray dataset for a longitudinal cohort of 27 subjects with pulmonary TB from South Africa who were followed up over the course of standard TB treatment for six months. It contains data for five time points over the treatment course: week 0, week 1, week 2, week 4, and week 26. GSE40553-Bloom contains data for pulmonary TB subjects, eight from the UK and 29 from South Africa, at week 0 and followed up over standard TB treatment at week 2, month 2, month 6, and month 12. In addition, to test the validity of the signature in the Indian population, we built a South Indian TB cohort (as described in the STAR Methods), which we term the Bangalore longitudinal TB cohort (BLTB), comprising 32 patients who were followed up for six months to a year. Blood samples of patients from the BLTB cohort were used for testing the validity of the signature and for retrospective comparison with their clinical records (Figure 1A). Whole blood samples of 22 individuals who were Interferon-gamma release assay (IGRA) negative, HIV negative, and had normal chest X-rays were used as healthy controls (Figure 1B).
An unbiased multi-step screen identifies a 9-gene signature that is characteristic of response to treatment
We configured a computational pipeline to shortlist genes reflective of the progress of TB treatment. The pipeline is an unbiased screen that starts with all the known coding genes in the whole genome captured by transcriptomics and progressively shortlists genes at each of its steps based on different criteria (Figure 2). A key step in the pipeline is network analysis that performs an unbiased screen to identify genes associated (Sambarey et al., 2017b; Metri et al., 2017; Ravichandran et al., 2021).
Figure 1 (caption fragment): ... with normal chest X-ray and hemogram profile. *Demographic profile for subjects enrolled in the study as BLTB and Healthy cohort is given in Tables 1, S1C, and S1D.
Briefly, our method uses a knowledge-based comprehensive human protein-protein interaction network (hPPiN) previously constructed by us (Table S2A), renders it specific to each given condition by integrating gene expression data into it, and sensitively mines the most perturbed subnetworks and their most influential epicentric nodes (Sambaturu et al., 2016, 2021; Ravichandran and Chandra, 2019). The network analysis carried out for the discovery datasets for TB0 (week 0) versus TR (month 6) samples yielded subnetworks of 2,457 nodes and 4,459 edges for GSE89403-Thompson and 2,710 nodes and 4,649 edges for GSE122485-Sambarey, with 1,454 genes common to both (Table S2B). To understand if this common subnetwork contained genes known to be biologically relevant to the condition being studied, we carried out a functional enrichment analysis. We observed that the subnetwork was significantly enriched (q-value < 0.05) in functional categories that belong to the IFN-γ signaling, Toll-like receptor (TLR) signaling, NF-κB signaling, MAPK signaling, PI3/AKT signaling, TNF-α signaling, and JAK-STAT signaling pathways, all involved in the innate immune response to TB (Figure 3). In fact, most genes present in these subnetworks are known to be directly or indirectly associated with the pathobiology of TB. For example, FCGR1A and BATF2 are involved in pro-inflammatory responses (Roy et al., 2015; Sutherland et al., 2013; Roe et al., 2016), SOCS3 is involved in an anti-inflammatory pathway (Zanin-Zhorov et al., 2005; Mistry et al., 2007) and is known to have a role in controlling Mtb infection along with STAT3 (Rottenberg and Carow, 2014), CD274 is involved in Treg expansion (Trinath et al., 2012), MMP9 and ANXA3 have a role in granuloma formation (Ramakrishnan, 2012; Riou et al., 2012; Park et al., 2005), and GBP1 is reported to solicit host defense proteins, including the phagocyte oxidase, antimicrobial peptides, and autophagy effector, to kill intracellular bacteria (Kim et al., 2011). This indicates that our network analysis has correctly identified known components of the host response to TB infection.
Figure 2. The biomarker discovery pipeline for blood-based TB treatment prognosis markers. The various filtration steps of the pipeline (blue) along with the number of genes selected at each step (black) are represented. Datasets used for the study are colored in red. Two public datasets, GSE89403-Thompson and GSE122485-Sambarey, were used for biomarker discovery. For validation and performance evaluation of the identified biomarkers, two independent public datasets, GSE40553-Bloom and GSE31348-Cliff, along with the newly developed BLTB cohort were used. TB: active-TB/treatment-naive TB, TR(6M): 6-month TB treatment, HC: Healthy control. *Demographic profile for subjects enrolled in the study as BLTB and Healthy cohort is given in Tables 1, S1C, and S1D.
iScience 25, 103745, February 18, 2022
The network analysis resulted in a shortlist of 1,454 genes. The next steps in the pipeline apply various filters and eliminate genes that do not satisfy each step's criteria. Briefly, the filters pertain to (a) retaining only those genes that were present as a functional module and clustered together (1,042 genes were retained), (b) retaining only the differentially expressed genes (DEGs) from the subnetworks (fold change (FC) ≥ 2 and q-value < 0.05), and (c) retaining only those that show symmetric variation in TB0, TR, and HC (Figure 2 and Table S2C). Healthy samples (HC) were studied to retain only those genes that exhibit disease-associated changes. Symmetry in gene expression variation (upregulation in disease and downregulation upon treatment) is known to exist in TB, hence its inclusion as a filter. This criterion adds further specificity to the shortlisted genes, as it eliminates all those that vary merely due to the drug (Cliff et al., 2013).
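The DEG and symmetry filters described above can be illustrated with a minimal sketch. It is not the authors' pipeline (their scripts are referenced in Data S1 to S3). The gene names, expression values, and the `GeneStats` container are hypothetical, and two points are assumptions made only for illustration: FC ≥ 2 is read as |log2 FC| ≥ 1 so that downregulated genes are treated symmetrically, and "symmetric variation" is read as a departure from the healthy level at week 0 that moves back after treatment.

```python
from dataclasses import dataclass
from math import log2


@dataclass
class GeneStats:
    """Hypothetical per-gene summary; not a structure from the paper."""
    name: str
    hc: float       # mean expression in healthy controls (HC)
    tb0: float      # mean expression at week 0, before treatment (TB0)
    tr: float       # mean expression after treatment (TR)
    q_value: float  # adjusted p-value for the TB0 vs TR comparison


def is_deg(g: GeneStats, min_abs_log2fc: float = 1.0, max_q: float = 0.05) -> bool:
    # Assumption: FC >= 2 in either direction, i.e. |log2(TB0/TR)| >= 1.
    return abs(log2(g.tb0 / g.tr)) >= min_abs_log2fc and g.q_value < max_q


def is_symmetric(g: GeneStats) -> bool:
    # Assumption: disease moves the gene away from HC, treatment moves it back.
    up_then_down = g.tb0 > g.hc and g.tr < g.tb0
    down_then_up = g.tb0 < g.hc and g.tr > g.tb0
    return up_then_down or down_then_up


def shortlist(genes: list[GeneStats]) -> list[str]:
    return [g.name for g in genes if is_deg(g) and is_symmetric(g)]


# Made-up values chosen to exercise each branch of the filters.
genes = [
    GeneStats("GENE_A", hc=10.0, tb0=40.0, tr=12.0, q_value=0.001),  # up in TB0, reverts
    GeneStats("GENE_B", hc=10.0, tb0=10.0, tr=30.0, q_value=0.001),  # changes only on drug
    GeneStats("GENE_C", hc=10.0, tb0=40.0, tr=30.0, q_value=0.2),    # small, non-significant
    GeneStats("GENE_D", hc=20.0, tb0=5.0, tr=18.0, q_value=0.01),    # down in TB0, reverts
]

print(shortlist(genes))  # ['GENE_A', 'GENE_D']
```

GENE_B passes the DEG test but is dropped by the symmetry test, which mirrors the stated purpose of that filter: removing genes that vary merely due to the drug.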
This resulted in a shortlist of nine genes (BCL6, FCGR1A, GBP1, SERPING1, BATF2, AIM2, SMARCD3, ANXA3, and SOCS3) as a discriminatory panel between TB0 and TR samples (Table S2C). We found that all nine genes showed a reversal in gene expression within week 1 of treatment, and the trend was more pronounced at week 24 for the same patients in the GSE89403-Thompson dataset (Figure S1). A similar trend was observed for the GSE31348-Cliff and GSE40553-Bloom datasets (Figures S2 and S3). To test the dependency of the discovered markers on the choice of the discovery datasets, we used each available dataset individually for computing the response networks (one of the first steps in our discovery pipeline) and performed an overlap analysis for the nine signature genes (Figure S4). We observed common KEGG pathway terms to be significantly enriched for all the response networks (Figure S4D). We also observed that eight of the nine signature genes (all except SMARCD3) overlap among all datasets (Figures S4A-S4C), indicating the robustness of the network approach used to shortlist genes as biomarkers.
Figure 3 (caption fragment): ... Thompson and GSE122485-Sambarey). Genes belonging to the same functional category are clustered into uniquely color-coded modules. Significantly enriched (q-value < 0.05) pathway information from the Reactome database for these modules is also indicated. The network is provided in Table S2B. *Steps required to carry out network analysis along with the scripts are detailed in Data S1, S2, and S3.
Next, we evaluated the performance of the 9-gene panel in classifying TB0 and TR samples, in the discovery and the validation datasets, by using logistic regression analysis.
The 9-gene panel was found to correctly classify TB0 and TR samples at six months with AUC (area under the curve) values ≥ 0.92 in the discovery cohort (GSE89403-Thompson: 0.95 (95% CI 0.93-0.97)) as well as in the validation cohorts (GSE40553-Bloom: 0.99 (95% CI 0.99-0.99), GSE31348-Cliff: 0.92 (95% CI 0.85-0.99)), as judged by 5-fold cross-validation (Figures 4A and 4B and Table S3A). We then tested how well our panel discriminates TB0 from TR at earlier time points (only data at the 6-month time point from GSE89403-Thompson was used for discovery). We find that at week 1-2, the panel was capable of classifying the samples with an AUC of 0.79 (95% CI 0.76-0.81), 0.78 (95% CI 0.64-0.91), and 0.91 (95% CI 0.83-0.98) for GSE89403-Thompson, GSE40553-Bloom, and GSE31348-Cliff, respectively. Overall, the AUC value increased over the course of the treatment (Figures 4A and 4B). The performance of the 9-gene panel in the discovery and the validation cohorts clearly indicates its potential to serve as a biomarker signature to detect response to TB treatment.
The R9-signature is consistent with clinical correlates of response to treatment in the BLTB cohort
To test the clinical significance of the 9-gene panel in the South Indian population, we developed a longitudinal cohort, BLTB, as described in the STAR Methods. We tested the expression levels of the nine genes in each subject at weeks 0 and 2 and months 1, 2, 3, 4, 5, and 6 of TB treatment, and in some subjects also at week 3 and months 8, 10, and 12. Overall, the dataset consisted of 204 samples from 32 subjects that spanned untreated and post-treatment samples at different time points (Tables S1A and S1C).
For each patient, six different clinical scores were recorded along with a clinical examination (Table S4): the chest X-ray-based Timika score (Chakraborthy et al., 2018), ESR level, TB scores I and II (Rudolf, 2014; Rudolf et al., 2013), Karnofsky performance score (Schag et al., 1984; Péus et al., 2013), and sputum AFB smear test. Among the clinical scores, TB scores I and II were used to gauge disease severity at diagnosis (week 0), whereas the Timika score based on the chest X-ray, the ESR, and sputum smear conversion were used for monitoring treatment outcome through the treatment period. All subjects were sputum positive at week 0. The Karnofsky performance score did not vary much in most subjects, and because it is highly subjective, it was not considered for further analysis. Based on the clinical scores and independent evaluation of the patients' overall well-being by the clinical team (a panel of physicians and pulmonologists who routinely treat TB cases in a tertiary care hospital), the subjects were grouped into two broad categories representing cases of (i) good response to treatment (good responders) and (ii) lack of adequate response or cases of complications (intermediate/poor responders). Twenty-two subjects were seen to be in the category of good responders with an uneventful response to the standard DOTS therapy (Figure 1A). Ten subjects appeared to be intermediate/poor responders (referred to as poor responders hereafter) at more than one time point (as the chest X-ray scores and ESR were both on the rise) and developed complications such as pneumothorax or hemoptysis or had other co-morbidities and showed symptoms of increased infection burden at intermediate time points. Some of them exhibited good response at the month-6 time point (Tables 2 and S4). The whole blood gene expression of the identified 9-gene signature was tested using qRT-PCR (gene primers used for qRT-PCR analysis are listed in Table S5).
All genes showed the expected trend in gene expression variation (Figure S5). The signature genes were able to discriminate TB0 from TR samples at month 6 with AUC 0.98 (95% CI 0.94-1.00) in good responders (Figure S6). We capture the combined effect of the 9-gene panel by computing a geometric mean of the fold changes of the individual genes (Equation 9, R9-score) and tested it on samples from each patient at different time points of treatment. The R9-score values were significantly higher in the good responders compared to poor responders, clearly demonstrating the validity of the 9-gene signature in the BLTB cohort (Figure 5A). Figure 5B shows a heatmap of the R9-scores for 25 subjects (BLTB cohort) for whom data were available for four or more time points. In addition to the 25 subjects shown in Figure 5B, samples for seven more subjects at three to four early time points were available. These subjects were followed up for six months and responded to treatment, but further blood samples were not available.
Table 2 (footnote fragment): ... in terms of sputum and chest X-ray. R9(2W) is the R9-score value for week 2, and R9(6M) is the R9-score value for month 6. If a data point is not present for week 2 or month 6, then the nearest available data point is reported. W: Week, M: Month.
We compared the R9-scores with the clinical scores and the observed treatment outcomes (listed in Tables S4 and 2) to test the performance of the R9-score. The ESR and Timika scores, and where available, the sputum conversion, were used for checking agreement with our R9-scores at each time point (Figures 5B and 5C). We considered R9-scores to be in agreement with clinical scores if both scores qualitatively followed the same trend, such as an increase in the R9-score together with a reduction in Timika scores and ESR values.
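The combined score described above, a geometric mean of per-gene fold changes, can be sketched as follows. This is an illustrative sketch only: the exact R9-score formula is given in the STAR Methods, which are not reproduced here, so the direction in which each fold change is taken (for example, relative to week 0 or to healthy controls) is an assumption left to the caller. The function name `r9_like_score` and all fold-change values are hypothetical.

```python
from statistics import geometric_mean

# The nine signature genes named in the text.
SIGNATURE_GENES = [
    "BCL6", "FCGR1A", "GBP1", "SERPING1", "BATF2",
    "AIM2", "SMARCD3", "ANXA3", "SOCS3",
]


def r9_like_score(fold_changes: dict[str, float]) -> float:
    """Geometric mean of the fold changes of the nine signature genes.

    `fold_changes` maps gene symbol -> fold change (must be positive).
    How each fold change is defined is an assumption of the caller.
    """
    missing = [g for g in SIGNATURE_GENES if g not in fold_changes]
    if missing:
        raise KeyError(f"missing fold changes for: {missing}")
    return geometric_mean(fold_changes[g] for g in SIGNATURE_GENES)


# Made-up fold changes for one sample at one time point.
example = {gene: 2.0 for gene in SIGNATURE_GENES}
example["SMARCD3"] = 0.5  # one gene moving the other way
score = r9_like_score(example)
```

A geometric mean is a natural choice for ratios because a two-fold increase in one gene and a two-fold decrease in another cancel out, which an arithmetic mean would not do.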
For the 22 good responders, we observed 93.55% (95% CI 0.89-0.99) agreement between the R9-score and the clinical evaluation (Figure 5D). Similarly, for the 10 cases of poor response/complications, we observed 83.63% (95% CI 0.74-0.93) agreement. Among the poor responders predicted by the R9-score and also by the clinical parameters, one patient (P_3) tested positive for MDR TB using the GeneXpert MTB/RIF assay at month 4, consistent with the R9-score of 0.26 for month 3 and 1.15 for month 4. This patient was switched to Category-II treatment due to refusal of Category-IV treatment. Subsequently, at month 12, we observed an R9-score of 0.48, indicative of poor response. The hospital records indeed indicate that P_3 did not respond to the Category-II treatment either and was eventually switched to Category-IV. The other poor responders had delayed response clinically and radiologically. P_30, P_17, and P_28 had a high bacterial burden (sputum smear test 3+ at week 0, before TB treatment), leading to delayed sputum conversion and pneumothorax development. Treatment was extended from 6 to 9 months for P_30 due to pneumothorax complications. In addition, subjects with complications (e.g., pneumothorax (P_16) and hemoptysis (P_32)) did not have a smooth progression of treatment.
Figure 5. R9-scores for treatment response groups of subjects with active TB in the BLTB cohort
(A) A boxplot showing significantly higher R9-scores (t-test p < 10^-5) in good responders compared to poor responders across all the time points of patients in the BLTB cohort. The horizontal line in the middle of the box shows the median; box limits indicate the 25th and 75th percentiles; whiskers extend 1.5 times the IQR from the 25th and 75th percentiles. The x axis represents the response category for patients in the BLTB cohort, and the y axis indicates the R9-score. The black dots represent the R9-score for individual subjects at an individual time point. The median R9-score value (red dot) is connected by a red line to indicate the score trend between patient groups. A zoomed-in portion of the R9-score comparison between the patient categories is shown (left panel).
(B) A heatmap representing the R9-score for individual subjects (columns) at each time point (rows). R9-score > 1.5 is green, 1 ± 0.5 is yellow, and < 0.5 is red. Gray cells indicate missing data points. An agreement (qualitative match) between R9-score and clinical evaluation is represented by a tick, and a disagreement is represented by a cross.
(C) The R9-score matrix for an additional seven good responders with data for fewer than four treatment time points (see Figure 1A).
(D) A stacked bar plot representing the agreement between the R9-score and the clinical data for each treatment response category in the BLTB cohort. The x axis represents the patient response group, and the y axis represents all available R9-score data points for that response group's subjects. A qualitative match (agreement) between the clinical evaluation and the R9-score trend is colored in cyan, and the total percentage of matches is shown. A mismatch (disagreement) between the clinical evaluation and the R9-score trend is colored in red, and the total percentage is shown above the bar. The 95% confidence interval (CI) is also indicated for each patient category.
(E) Receiver operating characteristic (ROC) curves for distinguishing good and poor responders at week 2 of TB treatment using the R9-score. The area under the curve (AUC) with its 95% CI is indicated for the ROC curve. The optimal threshold for the curve is 0.64, where the TPR is high and the FPR is low.
*qRT-PCR data for each patient with calculated RQ and RCN values are provided in Data S4.
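The ROC analysis of Figure 5E can be illustrated with a small standard-library sketch that computes an AUC from per-patient scores and picks a cut-off. The scores and labels below are made up and are not BLTB data. The text says only that the optimal threshold is where the TPR is high and the FPR is low, so the use of Youden's J (TPR minus FPR) to choose the cut-off is an assumption. The function names `roc_auc` and `youden_threshold` are hypothetical.

```python
def roc_auc(scores: list[float], labels: list[int]) -> float:
    """AUC as the probability that a positive outscores a negative.

    labels: 1 = good responder (expected higher score), 0 = poor responder.
    Ties count as half a win (Mann-Whitney formulation).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_threshold(scores: list[float], labels: list[int]) -> float:
    """Cut-off t maximizing TPR - FPR when predicting 'good' for score >= t."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_t, best_j = scores[0], -1.0
    for t in sorted(set(scores)):
        tpr = sum(s >= t and y == 1 for s, y in zip(scores, labels)) / n_pos
        fpr = sum(s >= t and y == 0 for s, y in zip(scores, labels)) / n_neg
        if tpr - fpr > best_j:
            best_t, best_j = t, tpr - fpr
    return best_t


# Made-up week-2 scores: four good responders, three poor responders.
scores = [1.8, 1.2, 0.9, 0.7, 0.8, 0.4, 0.3]
labels = [1, 1, 1, 1, 0, 0, 0]

auc = roc_auc(scores, labels)                  # 11 of 12 pairs ordered correctly
threshold = youden_threshold(scores, labels)   # 0.9 for this toy data
```

With cohorts of this size, the confidence interval around the AUC is wide, which is why the text reports a 95% CI alongside each value; a bootstrap over patients is one common way to obtain it.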
Sputum conversion was observed late in the treatment (around the fifth month for P_21 and P_16, and the fourth month for P_24), which is captured by our score. Table 2 describes the case histories for all the subjects. We then tested the performance of the R9-score in discriminating between good and poor responders at earlier time points. The R9-score was able to discriminate the two categories with AUC 0.93 (95% CI 0.81-1.00) at week 2 of standard TB treatment (Figure 5E and Table S3B). Overall, the R9-score is seen to have a high potential to be used in the clinic.
R9-score, TB-score, and RESPONSE5 score comparison in validation datasets and BLTB cohort
Recently, two other gene signatures have been reported to monitor TB treatment response. The first is a 5-gene signature (RESPONSE5) that contains one non-coding RNA (RP11-295G20.2) and four coding genes (SMARCD3, UCP2, MAP7D3, and STT3A) (Thompson et al., 2017). Of these, SMARCD3 is also a part of our signature. RESPONSE5 consists of a score based on six pairs of the five genes. It was reported that the RESPONSE5 signature predicted week-24 PET-CT status at baseline, week 1, and week 4 (AUC = 0.72/0.74). The second reported biomarker is a 3-gene signature (GBP5, DUSP3, and KLF2) (Sweeney et al., 2016), which was used to compute a TB-score that was reported to be significantly associated with the 6-month radiological outcome and to identify treatment failure cases at the end of treatment with AUC 0.93 (Warsinske et al., 2018). We first checked the expression values of individual genes of the TB-score and RESPONSE5 score in a few samples from good (N = 11) and poor responders (N = 4) from our BLTB cohort (Figure S7). We found that DUSP3 and GBP5 showed upregulation in week 0 (vs. HC) samples, consistent with that seen in other cohorts. KLF2, which was reported to be downregulated in the TB-score signature, is seen to be upregulated in our BLTB cohort in good responders (week 0 vs.
HC), showing an opposite trend. It is, however, downregulated marginally (0.62-fold) in poor responders. SMARCD3 and RP11-295G20.2 showed trends consistent with the previous cohorts (upregulation at week 0 compared to HC). MAP7D3, STT3A, and UCP2 show an opposite trend in gene expression (upregulation compared to HC) in the BLTB cohort's good responders, as compared to the downregulation reported in the RESPONSE5 study (Thompson et al., 2017). STT3A and UCP2 are downregulated marginally (0.35 and 0.33-fold, respectively) in poor responders. In all, significant differences are seen in the gene expression variation patterns in different cohorts (Figure S7). Next, we computed the TB-score and RESPONSE5 score and compared them with the R9-score for different datasets, including our BLTB cohort. The TB-score was computed for all three GSE IDs as described by Sweeney et al. (2016). For the BLTB cohort, we used the relative copy number (RCN) values instead of the delta cycle threshold (dCt) values, as Ct values are inversely related to gene expression. A TB-score computed from dCt values would show the opposite trend compared to microarray or RNA-Seq data, where gene expression is directly related to signal intensity and gene count (Figure S8). The RESPONSE5 score was computed for the BLTB cohort as described by Thompson et al. (2017). We could not compute the RESPONSE5 score for the three public datasets, as the score formulation is based upon qRT-PCR-based Ct values, and its applicability to microarray or RNA-Seq-based transcriptome data has not been demonstrated. R9-scores for the three public datasets (GSE31348-Cliff, GSE89403-Thompson, and GSE40553-Bloom) were computed as described in the STAR Methods section (Equation 6). We note that a "not cured" pool of samples is, in general, not available in most datasets and was present only in the GSE89403-Thompson dataset.
This dataset (also called the CTRC cohort) is also the largest among all available datasets and contains data at multiple time points over the course of treatment. For biomarker discovery, we included this dataset but considered only week 0 and month 6 samples of "definite cured" individuals. For the present analysis, all other time points of this dataset and the pool of "not cured" samples available were used for validation purposes. Comparison between the three scores (Figures 6 and S9 and Table S6) indicates the following: (i) In good responders of the BLTB cohort and all three public datasets, the R9-score was observed to increase as the treatment progressed, showing successful response to TB treatment (Figures 6A-6C and 6J); (ii) In poor responders of the BLTB cohort and the "not cured" patients of GSE89403-Thompson, no significant difference in the R9-score was observed upon treatment, thus capturing unsuccessful treatment in those individuals (Figures 6D and 6M); (iii) In good responders of the BLTB cohort and all three public datasets, both TB-score and RESPONSE5 score decreased significantly as treatment progressed, thus capturing cases of successful treatment response (Figures 6E-6G, 6I, and 6K), and in "not cured" cases of the GSE89403 dataset, no significant difference in TB-score was observed after treatment (Figure 6H); (iv) In the BLTB cohort, TB-score and RESPONSE5 score failed to capture the poor responders (Figures 6L and 6N), whereas the R9-score successfully captured the same (Figure 6M). From the above, we conclude that the R9-score shows good performance in all the tested cohorts, whereas TB-score and RESPONSE5 score were not able to capture the poor responders through the course of TB treatment in the BLTB cohort.
Figure 6 (caption fragment): ... box limits indicate the 25th and 75th percentiles; whiskers extend 1.5 times the IQR from the 25th and 75th percentiles, and dots represent outliers. The y axis represents the score, and the x axis represents the treatment time. For the BLTB dataset, the individual patient score is represented by a black dot. The medians (red dots) at different time points are connected by a red trace. The R9-score was observed to increase upon TB treatment, indicative of response to therapy (A-C and J). After month 6 of TB treatment, the R9-score showed no change/decrease in poor responders of BLTB (M) and "not cured" or failure cases of the GSE89403-Thompson (D) dataset, depicting poor or no response to TB treatment. TB-score (I and L) and RESPONSE5 score (K and N) were observed to decrease upon TB treatment in patients in the BLTB cohort (both good and poor responders), thus failing to capture poor responders. RESPONSE5 score could not be computed for the three public cohorts, as the score formulation is based upon the qRT-PCR-based Ct values of the genes, which were not available. Multigroup comparison was performed with ANOVA for each cohort, and two-group comparisons (for selected time points) were performed using the Wilcoxon test; the p value is shown. *p < 0.05, **p < 0.01, ***p < 0.001. *Individual patient scores at different treatment time points are provided in Table S6.
At the individual patient level too, a rise in the R9-score for patients who responded to treatment (Figures S9A-S9C and S9J) and a drop or no change in the score for the poor responders (Figures S9D and S9M) were observed. The week 2 R9-score was higher for good responders as compared to poor responders in the BLTB cohort (Figures S9J and S9M). In addition, for the BLTB cohort, higher score values were observed in the good responders as compared to poor responders (Figure 5B) across different treatment time points. However, the extent of rise or fall in the R9-score was heterogeneous (Figures 5B and S9 and Table S6). Further, Warsinske et al.
and Thompson et al. report that both TB-score and RESPONSE5 score can stratify TB subjects at week 0 into good and poor responders (or cured and failure categories). However, for the samples tested in the BLTB cohort, we did not observe any significant difference in TB-score or RESPONSE5 score at week 0 between the good and poor responders (Figure S10). We explored whether the 9-gene biomarker panel can predict response and hence serve as a prognostic marker at week 0 itself. For this, we considered the two datasets (GSE89403-Thompson and the BLTB cohort) that have samples from both the ''cured/good responder'' and ''not-cured/poor responder'' categories. We computed a modified score (Week-0 score) using the same panel (see STAR Methods) and observed a significant difference (p value < 0.05) in the score between the two categories (Figure S11). Overall, the score is significantly lower for ''not-cured/poor responders'' than for ''cured/good responders'' and is thus able to segregate the two categories at week 0 itself, with an AUC of 0.72 (95% CI 0.49-0.94) for GSE89403-Thompson and 0.73 (95% CI 0.52-0.94) for the BLTB cohort. We note that a further trial with a larger sample size is necessary to assess the clinical utility of the Week-0 score. Overall, our signature performs well in the three datasets pertaining to cohorts from different geographical locations.

Despite decades of research and interventions, tuberculosis remains a critical public health problem, especially in the developing world. New diagnostic strategies are needed, including point-of-care solutions to cure patients with TB and minimize the risk of disease transmission to other individuals (Pai and Schito, 2015; Pai and Furin, 2017; Walzl et al., 2011, 2018). There is an acute need for robust biomarkers to detect TB and to assess therapy failure. In this study, we present a much-needed, non-sputum-based biomarker for detection of response to treatment.
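For readers less familiar with the AUC values quoted for the Week-0 score, the following is a minimal sketch of how an AUC and a 95% interval can be obtained from two groups of scores. It is an illustration only: the function names and the score values are hypothetical, and the percentile bootstrap is an assumed choice, since the text does not state how the reported confidence intervals were computed.

```python
import random

def roc_auc(pos_scores, neg_scores):
    """AUC as the probability that a positive-class score exceeds a
    negative-class score (ties count as one half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def bootstrap_auc_ci(pos_scores, neg_scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap interval for the AUC (an assumed method)."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping group sizes fixed.
        p = [rng.choice(pos_scores) for _ in pos_scores]
        n = [rng.choice(neg_scores) for _ in neg_scores]
        aucs.append(roc_auc(p, n))
    aucs.sort()
    lo = aucs[int((alpha / 2) * n_boot)]
    hi = aucs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Hypothetical week-0 scores, NOT data from the study.
good_responders = [1.8, 2.1, 1.2, 2.6]
poor_responders = [0.9, 1.3, 1.0]

auc = roc_auc(good_responders, poor_responders)
ci_low, ci_high = bootstrap_auc_ci(good_responders, poor_responders)
```

With groups this small the bootstrap interval is wide, which mirrors the wide intervals reported for the Week-0 score and the stated need for a larger trial.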
Using both computational and experimental approaches, we have identified a host-based blood transcriptomic signature (R9) that efficiently identifies response to therapy and detects poor responders within a week of TB treatment, which is months faster than any conventional test. Our results suggest that the R9-score is likely to be associated with the therapy response and clinical progression of the disease. The scores were computed blindly, without using any knowledge of patient case histories. It is encouraging that all good responders were correctly identified. In some subjects, the response was seen at early time points, suggesting that it may be useful to carry out a systematic study to test the ability of the R9-score to predict treatment prognosis at an early time point. Given the formulation, the higher the R9-score, the greater the response to treatment. Theoretically, we reasoned that an R9-score of approximately 1 implies no change in gene expression between TB0 and TR, and hence no response to treatment. A score of < 1 implies an increase in the severity of the disease and a poor response to treatment, and an R9-score > 1.5 indicates a decrease in disease severity and hence a successful response to treatment. A larger sample size will be necessary to clearly establish these thresholds.

Given the complexity of the host response to TB, a single gene marker is insufficient to cater to a large section of the population, especially one with multiple genetic backgrounds. The use of a 9-gene panel overcomes this limitation, since at least a few of the nine genes can be expected to show the required trend in a given patient. Our panel performs well in large datasets from different geographic locations, indicating its robustness across diverse populations and possibly also across different M. tuberculosis strains.
The signature genes have appreciable roles in TB pathogenesis, making the panel biologically relevant, in contrast to panels derived from mechanism-blind, data-driven approaches. During the course of our work, two other signatures, TB-score and RESPONSE5 score, were published (Sweeney et al., 2016; Warsinske et al., 2018; Thompson et al., 2017). We tested these in our BLTB cohort and found that they could not discriminate between good responders and poor responders at week 0 and hence cannot be readily used for triaging in the BLTB cohort. Our results clearly indicate that there are significant differences in the pattern of gene expression variation for some genes in different cohorts, perhaps due to differences in genetic backgrounds, mycobacterial strain differences, or variations in the environment. This emphasizes the strong need to discover more robust and more encompassing biomarker candidates and to test them on multiple ethnicities, especially in regions with high disease burden.

iScience 25, 103745, February 18, 2022

The point of a biomarker signature and a score is to generate a clinical decision support tool. The basic questions a clinician is faced with while treating a patient with TB are: (a) Is the diagnosis of TB confirmed? (b) What is the disease severity? Is the individual in the high-risk category (severity, risk of spreading, complications, MDR, or XDR TB) and in need of closer monitoring? (c) Is the individual likely to respond to first-line therapy? (d) Should a second-line treatment be given? (e) Is the treatment progressing in the right direction? (f) When should treatment be stopped? Although an ideal biomarker may provide answers to all these questions, such a biomarker may not even exist, especially one that applies to different populations. Realistically, a given biomarker may have a high capacity to answer one question but not the others, which is why it is important to explore different biomarkers.
Both TB-score and RESPONSE5 score answer questions (a) and (c) and, to an extent, also (b), whereas our signature answers questions (a) and (e), and even questions (b) and (d), within about a week after the start of treatment, and also has the potential to be tested for (f) through a separate trial. Moreover, our signature monitors individual subjects through the course of treatment and determines whether the individual is responding to the given treatment at any given time point. This has a considerable advantage in terms of having the most appropriate reference, which is a subject's own week 0 sample, and this removes any inherent bias. A drawback is that a sample has to be taken from the same patient at multiple time points. However, this does not pose a major problem because patients are regularly monitored under the DOTS scheme and return to the clinicians periodically for assessment and renewal of their prescriptions. The sample required is a few milliliters of blood, which is easily obtained from the patient. It is feasible to collect blood samples from TB subjects even in remote places and send the samples for testing gene expression values of the nine genes using qRT-PCR where such facilities are available. The R9-score can also help evaluate the success of new drugs in clinical trials. Monitoring the progress of these biomarkers in individual subjects will shed light on the individual's response to therapy and help delineate early and late responders, paving the way toward precision medicine or personalized treatment regimens. A marker that can determine accurately and early whether a patient is responding to therapy is expected to facilitate quicker access to effective treatment and hence minimize the risk of lung damage and spread to other organs. Our signature shows considerable promise in this direction and is a strong candidate for the next phase of testing involving multiple centers and larger cohorts.
A limitation of the study is the small sample size for poor responders, which is also the case with the previous two studies reported in the literature. For multiple reasons, obtaining samples from poor responders, especially over a follow-up period, has been a major challenge. The difficulty in identifying and correctly labeling a poor responder further emphasizes the urgent need for a sensitive molecular test such as R9, described in this study. A second limitation is that the threshold for the score is platform-dependent. It works for qRT-PCR-based data and needs further assessment for microarray and RNA-Seq-based gene expression quantification. A further limitation is that, for the BLTB cohort, while sputum smears and chest X-rays were available, sputum cultures were not, as sputum samples are not routinely cultured in local clinical practice. However, we monitored the patients at multiple time points with different clinical scores and were able to establish the response to treatment. Some challenges in developing the signature further into a blood test that can be used routinely are the cost and infrastructure needed to run qRT-PCR assays for a signature of this size. The infrastructure for such testing has become more accessible in resource-limited settings during the current pandemic. We foresee the possibility of an R9 test being carried out on blood samples of patients with TB, similar to the nasal and throat swab tests that are currently carried out for COVID-19 diagnosis. However, further work is required for this to develop into a cost-effective clinical test. Lastly, this is an exploratory study to establish proof of concept. This study included adult cases with a first episode of TB who underwent first-line treatment and excluded children.
The next stage would be to also include patients with TB who have HIV, diabetes, alcohol misuse, or other co-morbidities, those on second-line treatment, those with a previous history of TB, and those with extrapulmonary TB, as they are at a higher risk of treatment failure.

Detailed methods are provided in the online version of this paper.

Detectable changes in the blood transcriptome are present after two weeks of antituberculosis therapy. PLoS ONE 7, e46191.
Bradford, W.Z., Martin, J.N., Reingold, A.L., Schecter, G.F., Hopewell, P.C., and Small, P.M. (1996)

This study did not generate new unique reagents.

Data and code availability
- This paper does not report original code.
- Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.

Reagent or resource | Source | Identifier
Table S1C | This paper | Table S1C
Demographic details of the individuals included as healthy controls, provided as Table S1D | This paper | Table S1D
Oligonucleotides
qRT-PCR primer details, provided as Table S5 | This paper | Table S5
Software and algorithms

The detailed information related to the subjects included in the study is described in this section and in Tables 1 and S1.

Building a TB longitudinal cohort (BLTB: Bangalore longitudinal TB cohort)
Ethical approval to collect whole-blood samples from pulmonary TB subjects was obtained from the Institutional Ethics Committee at Rajiv Gandhi University of Health Sciences, Bangalore (SDS/IEC/01/2016-17), and the Institutional Human Ethics Committee (IHEC) at the Indian Institute of Science (11-15032017), Bangalore, India. In total, 44 subjects with newly diagnosed pulmonary TB, as confirmed by sputum AFB smear test (Ziehl-Neelsen staining) and chest X-ray, were recruited after informed consent between January 2017 and October 2018 at the SDS Tuberculosis Research Center and Rajiv Gandhi Institute of Chest Diseases, Bangalore.
All subjects were HIV negative with no previous TB disease or any other co-morbidity (inclusion and exclusion criteria are described in Table S1B). Sputum smear grades were used to determine bacterial burdens and were classified as 1+, 2+, and 3+. All subjects were treated with standard DOTS therapy (2HRZE/4HR). All except four subjects were declared sputum smear-negative post-treatment. Treatment was continued for six months in clear cases and longer in cases with complications. One patient was diagnosed with lung cancer in addition to TB after 4 months and died during the study. Twelve subjects were lost to follow-up after 6 months. The remaining subjects were followed up for 1 year from the start of treatment. Whole blood samples were collected at the time of enrollment (week 0) and at week 2, week 3, month 1, month 2, month 3, month 4, month 5, month 6, month 8, month 10, and month 12 (the number of samples collected at each time point is listed in Table S1A). Attempts were made to obtain the AFB sputum smear test, chest X-ray score (Timika score) (Chakraborthy et al., 2018), TB Score I and II (Rudolf, 2014; Rudolf et al., 2013), Karnofsky performance score (Schag et al., 1984; Péus et al., 2013), and ESR (erythrocyte sedimentation rate) at each time point of blood collection (details in Table S4). Only 32 of the recruited active TB subjects completed follow-up over the whole 6-month DOTS treatment; we refer to these as the Bangalore Longitudinal TB Cohort (BLTB). Of these, we had samples for most time points for only 25 subjects (Figure 1A; demographic details of the BLTB individuals are listed in Table S1C). Patients with early sputum conversion who maintained negative sputum throughout the course of treatment, and those who showed a decrease in Timika score (chest X-ray score), were considered good responders (N = 22). In good responders, there was a smooth progression from the TB state to the treated state over the 6-month TB treatment.
Patients who (a) had late sputum conversion, (b) showed no decrease in Timika score, or (c) developed TB-related complications during the course of treatment were considered poor responders (N = 10, Table S1C).

Ethical clearance was obtained from the Indian Institute of Science Human Ethics Committee (11-15032017) to take blood samples from healthy volunteers for the purpose of establishing reference ranges of gene expression values. Informed consent was obtained from each volunteer. The inclusion criteria were: age 18-65, healthy, both men and women. The exclusion criteria were: diabetes, hypertension, HIV, consumption of alcohol in the 24 h prior to sample collection, pregnancy, lactation, any medication, and chronic liver or kidney disease (Table S1B). Samples were collected at the Health Center, Indian Institute of Science, Bangalore, India. Volunteers were subjected to a routine medical check-up, hemogram, IGRA, HIV test, and chest X-ray (Figure 1B and Table S1D). Three individuals were found to be IGRA positive with the QuantiFERON-TB Gold assay, reflective of possible latent tuberculosis, and were hence excluded from the cohort. The remaining 22 individuals were IGRA negative and HIV negative, had normal chest X-rays and hemogram profiles, and were considered healthy controls (HC). Tables S1C and S1D describe the details of the individual classes of subjects and healthy controls that were recruited in the study.

We searched public gene expression repositories (NCBI-GEO and ArrayExpress) and retained datasets that had whole blood gene expression data for active pulmonary tuberculosis patients before and after TB treatment.
The following publicly available whole blood transcriptomes from active TB subjects at different TB treatment time points are used in this study: GSE89403 (99 TB patients followed through week 1, week 4, and week 24 of TB treatment), GSE31348 (27 TB patients followed through week 1, week 2, week 4, and week 26 of TB treatment), GSE40553 (29 TB patients followed through week 2, month 2, month 6, and month 12 of TB treatment), and GSE122485 (4 TB patients followed at month 6 and month 12 of TB treatment). The other cohorts used in this study are a South Indian TB0 and TR cohort (BLTB, with 32 patients followed up for a year with 204 samples) and a healthy cohort (22 samples), which are described earlier (Tables 1 and S1A).

Whole blood was collected from subjects in potassium EDTA tubes and, after mixing, transferred to RNAlater tubes, which were stored at −20 °C. RNA was extracted from blood using the RiboPure-Blood kit (Thermo Fisher Scientific) following the manufacturer's protocol. Briefly, 1.8 mL of each RNAlater-mixed blood sample (0.5 mL blood + 1.3 mL of RNAlater) was transferred to a 2 mL microcentrifuge tube and centrifuged at 16,000 × g for 1 min. The supernatant was discarded, and the pellet was lysed by adding 800 µL of lysis solution and 50 µL of sodium acetate solution and vortexing vigorously. Then, 500 µL of acid-phenol chloroform was added, and the sample was vortexed for 30 s and kept at room temperature for 5 min. The sample was then centrifuged at 16,000 × g for 1 min to separate the aqueous and organic phases. The aqueous phase was transferred to a fresh tube, and 600 µL of ethanol was added and mixed by inversion. The ethanol-added aqueous phase was loaded on a filter cartridge assembly, centrifuged for 30 s, and washed, first with Wash Solution 1 and subsequently twice with Wash Solution 2/3. Filters were allowed to dry, and the RNA was eluted in 50 µL of preheated elution buffer.
RNA was DNase treated and quantified on a NanoDrop Light UV-Vis Spectrophotometer (Thermo Fisher Scientific). First-strand cDNA synthesis was performed using 600 ng of total RNA with the iScript Select cDNA synthesis kit (Bio-Rad) using random hexamer oligonucleotides as well as oligo dT primers (Table S5). Gene expression was analyzed with real-time PCR using iQ SYBR Green Supermix (Bio-Rad) on a StepOnePlus PCR system (Applied Biosystems). 18S rRNA was used as the internal housekeeping control gene because its level of expression was relatively high and consistent in all the treated blood samples. The delta cycle threshold of the gene (ΔCt(g1)) is calculated by subtracting the Ct value of the internal housekeeping control gene (Ct_18S) from the Ct value of the gene (Ct_g1), as described in Equation 1:

$$\Delta Ct_{(g_1)} = Ct_{g_1} - Ct_{18S} \quad \text{(Equation 1)}$$

Median ΔCt(g1) values were then used to calculate the relative copy number for each gene (RCN(g1)), as described in Equation 2:

$$RCN_{(g_1)} = 2^{-\Delta Ct_{(g_1)}} \quad \text{(Equation 2)}$$

Integrating a knowledge-based human interactome with transcriptomes in TB0 and TR conditions
A comprehensive, knowledge-based human protein-protein interaction network (hPPiN, Table S2A), available in the laboratory, was used for the construction of condition-specific networks (Sambarey et al., 2017a). This hPPiN is manually curated and includes high-confidence protein-protein interactions reported in the primary literature and in five large databases: STRING v10, SignaLink v2.0, Cancer Cell Map, BioGRID, and Multinet. The hPPiN thus constructed contained both physical complexes of proteins and functional or genetic interactions. A majority of edges had directional information. The rest were considered bidirectional. In the network, proteins are represented as nodes, and the interactions among them are represented as edges. The network consisted of 17,062 nodes and 208,760 edges.
Normalised gene expression values from the RNA-Seq data of the discovery datasets (GSE122485-Sambarey and GSE89403-Thompson) (Sambarey et al., 2017b; Thompson et al., 2017) were integrated onto the hPPiN in the form of node weights to generate condition-specific networks for the TB0 and TR conditions. The node weight (NW_i) for gene i is calculated as in Equation 3, where Exp_a/b is the fold change in the gene expression value of gene i in condition a (TB0) as compared to condition b (TR). The edge weight (EW_ij) between interacting genes i and j is calculated as in Equation 4.

An interferon-inducible neutrophil-driven blood transcriptional signature in human tuberculosis
A controlled trial of six months chemotherapy in pulmonary tuberculosis-first report; results during chemotherapy
Extensive drug resistance acquired during treatment of multidrug-resistant tuberculosis
RNTCP National Strategic Plan for TB Elimination in India
RNTCP National Strategic Plan for TB Elimination in India
TOG-Chapter 4-Treatment of TB (Government of India)
Training Modules (1-4) for Programme Managers and Medical Officers (Government of India)
Chest X ray score (Timika score): an useful adjunct to predict treatment outcome in tuberculosis
The human immune response to tuberculosis and its treatment: a view from the blood
Emergence of extensive drug resistance during treatment for multidrug-resistant tuberculosis
The diagnosis and misdiagnosis of tuberculosis [State of the art series
The epidemiology, pathogenesis, transmission, diagnosis, and management of multidrug-resistant, extensively drug-resistant, and incurable tuberculosis
Effect of Xpert MTB/RIF on clinical outcomes in routine care settings: individual patient data meta-analysis
Xpert MTB/RIF ultra for detection of Mycobacterium tuberculosis and rifampicin resistance: a prospective multicentre diagnostic accuracy study
The reactome pathway knowledgebase
Four-month moxifloxacin-based regimens for drug-sensitive tuberculosis
Predictors for pulmonary tuberculosis treatment outcome in Denmark
Applied Logistic Regression
Shortening treatment in adults with noncavitary tuberculosis and 2-month culture conversion
KEGG: integrating viruses and cellular organisms
Directly observed therapy for treating tuberculosis
A family of IFN-γ-inducible 65-kD GTPases protects against bacterial infection
Drug-resistant tuberculosis: challenges and progress
Management of drug-resistant tuberculosis
Emergence of drug resistance in patients with tuberculosis cared for by the Indian health-care system: a dynamic modelling study
A four-month gatifloxacin-containing regimen for treating tuberculosis
Identification of a gene signature for discriminating metastatic from primary melanoma using a molecular interaction network approach
Combined use of delamanid and bedaquiline to treat multidrug-resistant and extensively drug-resistant tuberculosis: a systematic review
Gene-expression patterns in whole blood identify subjects at risk for recurrent tuberculosis
Treatment of drug-resistant tuberculosis. An official ATS/CDC/ERS/IDSA clinical practice guideline
Genome-wide expression profiling identifies type 1 interferon response pathways in active tuberculosis
Point of view: tuberculosis innovations mean little if they cannot save lives
Tuberculosis diagnostics in 2015: landscape, priorities, needs, and prospects
Annexin A3 is a potential angiogenic mediator
Scikit-learn: machine learning in Python
Appraisal of the Karnofsky performance status and proposal of a simple algorithmic system for its evaluation
TBscore II: refining and validating a simple clinical score for treatment monitoring of patients with pulmonary tuberculosis. Scand
Meta-analysis of host response networks identifies a common core in tuberculosis
Unbiased identification of blood-based biomarkers for pulmonary tuberculosis by modeling and mining molecular interaction networks
Mining large-scale response networks reveals 'topmost activities' in Mycobacterium tuberculosis infection
EpiTracer-an algorithm for identifying epicenters in condition-specific biological networks
PathExt: a general framework for path-based mining of omics-integrated biological networks
Karnofsky performance status revisited: reliability, validity, and guidelines
Cytoscape: a software environment for integrated models of biomolecular interaction networks
Bacterial loads measured by the Xpert MTB/RIF assay as markers of culture conversion and bacteriological cure in pulmonary TB
Sputum processing methods to improve the sensitivity of smear microscopy for tuberculosis: a systematic review
Differential gene expression of activating Fcγ receptor classifies active tuberculosis regardless of human immunodeficiency virus status or ethnicity
Genome-wide expression for diagnosis of pulmonary tuberculosis: a multicohort analysis
R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing
Feasibility, accuracy, and clinical effect of point-of-care Xpert MTB/RIF testing for tuberculosis in primary-care settings in Africa: a multicentre, randomised, controlled trial
Host blood RNA signatures predict the outcome of tuberculosis treatment
Differential transcriptomic and metabolic profiles of M. africanum- and M. tuberculosis-infected patients after
Mycobacterium tuberculosis promotes regulatory T-cell expansion via induction of programmed death-1 ligand 1 (PD-L1, CD274) on dendritic cells
Treatment delay affects clinical severity of tuberculosis: a longitudinal cohort study
Tuberculosis: advances and challenges in development of new diagnostics and biomarkers
Immunological biomarkers of tuberculosis
Assessment of validity of a blood-based 3-gene signature score for progression and diagnosis of tuberculosis, disease severity, and treatment response
WHO Consolidated Guidelines on Drug-Resistant Tuberculosis Treatment (World Health Organization)
Functional interaction network construction and analysis for disease discovery
Gene set knowledge discovery with Enrichr
Transmission of multidrug-resistant Mycobacterium tuberculosis in Shanghai, China: a retrospective observational study using whole-genome sequencing and epidemiological investigation
Heat shock protein 60 activates cytokine-associated negative regulator suppressor of cytokine signaling 3 in T cells: effects on signaling, chemotaxis, and inflammation

The authors thank Rajiv Gandhi University of Health Sciences, Karnataka; Department of Biotechnology (DBT), Govt. of India; and Wellcome Trust/DBT India Alliance for financial support. CT is a DST-INSPIRE fellow. We also thank Narmada Sambaturu from NC lab for critically reading the manuscript. We thank Dr. Kushi Anand (AS lab) and Mr. Rajesh (Health Center, IISc, Bangalore) for their help with blood sample collection.

The authors declare that they have no conflict of interest.

We worked to ensure that the study questionnaires were prepared in an inclusive way.

Received: May 17, 2020
Revised: March 31, 2021
Accepted: January 6, 2022
Published: February 18, 2022

In Equation 4, NW_i and NW_j are the node weights of the two interacting nodes that form the edge. The edge weight represents the strength of the interaction between the nodes.
The lower the edge weight, the stronger the interaction between the nodes and the higher the expression of the participating nodes. A sensitive network mining algorithm previously developed in the laboratory was used for constructing a response network, which in essence captures the response of the system (TB0 vs. TR) in the form of top-ranked perturbed paths (Sambarey et al., 2013, 2017b). To identify these, all-source to all-destination shortest paths in each condition-specific network were computed using Dijkstra's algorithm as implemented in the Zen library in Python 2.7 (http://www.networkdynamics.org/static/zen/html/api/algorithms/shortest_path.html). A path refers to a set of serially connected edges in the network that are traversed to reach a sink node from a source node. The path score is the sum of the edge weights in the given path, which is normalized by the path length to give a normalized path score. Paths were computed for all pairs of nodes. Sub-paths are not explicitly eliminated for two reasons: (a) the paths are scored and only the top-ranked paths are taken for final analysis, which largely removes the need to eliminate sub-paths explicitly, and (b) unique edges from the top-ranked paths are taken as the input for response network construction, which in any case removes redundancy in nodes and edges. Given the weighting scheme and the formulation of the shortest path algorithm, paths with the lowest scores are ranked the highest and represent the highest perturbations. Approximately 1.5 billion paths were generated by running Dijkstra's algorithm, from which the top 0.005 percentile paths were taken as the top-ranked perturbed paths and pooled into a response network. All networks were visualised and analyzed in Cytoscape v3 (Shannon et al., 2003). Module identification and Reactome pathway enrichment for each module were performed using ReactomeFIViz, a Cytoscape plugin (Wu and Haw, 2017; Fabregat et al., 2017).
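The path-mining procedure described above can be sketched on a toy network as follows. This is a minimal illustration and not the laboratory's implementation or the Zen library: all function names are hypothetical, the toy node weights are invented, and the edge-weight formula (inverse of the geometric mean of the two node weights) is an assumption, chosen only because it makes a lower edge weight correspond to higher node weights as the text states.

```python
import heapq
from math import sqrt

def edge_weights(edges, node_weight):
    # ASSUMED form: 1 / sqrt(NW_i * NW_j). Edges are directed; a bidirectional
    # interaction would be listed once in each direction.
    return {(u, v): 1.0 / sqrt(node_weight[u] * node_weight[v]) for u, v in edges}

def dijkstra(adj, source):
    """Single-source shortest paths; returns distances and predecessors."""
    dist, prev = {source: 0.0}, {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return dist, prev

def ranked_paths(ew):
    """All-pairs shortest paths, scored by (sum of edge weights) / (edge count)
    and sorted so that the lowest normalized score ranks first."""
    adj = {}
    for (u, v), w in ew.items():
        adj.setdefault(u, []).append((v, w))
    nodes = sorted({n for edge in ew for n in edge})
    scored = []
    for s in nodes:
        dist, prev = dijkstra(adj, s)
        for t in dist:
            if t == s:
                continue
            path = [t]
            while path[-1] != s:
                path.append(prev[path[-1]])
            path.reverse()
            scored.append((dist[t] / (len(path) - 1), path))
    return sorted(scored)

def response_network(scored, top_fraction):
    """Pool the unique edges of the top-ranked paths."""
    k = max(1, int(len(scored) * top_fraction))
    pooled = set()
    for _, path in scored[:k]:
        pooled.update(zip(path, path[1:]))
    return pooled

# Toy data (hypothetical): node weights stand in for fold changes.
node_weight = {"A": 4.0, "B": 4.0, "C": 1.0, "D": 0.25}
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
ew = edge_weights(edges, node_weight)
paths = ranked_paths(ew)
top_edges = response_network(paths, top_fraction=0.5)
```

In this toy case the highly weighted pair A and B yields the top-ranked path, and the weakly expressed node D drops out of the pooled network. The study applies the same ranking idea at a much stricter cut-off over roughly 1.5 billion paths.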
Reactome pathways with q-value < 0.05 were considered significantly enriched. Gene set enrichment analysis for genes present in the response networks was carried out using EnrichR, and KEGG pathways with adjusted p value < 0.05 were considered significantly enriched (Xie et al., 2021; Kanehisa et al., 2021).

The gene panel identified in the study was evaluated in three independent cohorts using the Logistic Regression (LR) classifier from the scikit-learn package v0.20.3 in Python 2.7 (Hosmer et al., 2013; Pedregosa et al., 2011). Model evaluation was performed with 5-fold cross-validation to check how well the model predicts the target variable on different subsets of the data and to minimise bias and variance. Sensitivity, specificity, and prediction accuracies were computed from the generated confusion matrices. A receiver operating characteristic (ROC) curve was generated, which summarizes the model's performance by evaluating the trade-offs between the true positive rate (TPR/sensitivity) and the false positive rate (FPR/1-specificity).

R9-score formulation
A response score was formulated to determine response to TB treatment as follows. The antilog value of the fold change in gene expression of each gene in the panel is calculated, as described in Equation 5. Here, FC(g1) is the fold change in expression of gene g1, I_t is the gene expression value of gene g1 in TR at a given time point t of treatment, and I_0 is the gene expression value for the same gene g1 in TB0 (week 0). The R9-score at time t is the inverse of the geometric mean of the fold change values (Equation 5) of all nine genes in the panel:

$$R_9\text{-score}_t = \frac{1}{\left(\prod_{i=1}^{9} FC_{(g_i)}\right)^{1/9}} \quad \text{(Equation 6)}$$

For qRT-PCR data, ΔCt_t is the normalized Ct value of the gene obtained after subtracting the Ct value of the internal reference gene (18S rRNA) in TR. ΔCt_0 is the normalized Ct value of the gene in the active TB condition (week-0 TB, or TB0). ΔΔCt(g1) is the relative change in expression of gene g1 in TR_t vs. TB0. RQ(g1) is the fold change value of gene g1. The R9-score is then calculated by taking the inverse of the geometric mean of the RQ values of all nine genes:

$$R_9\text{-score}_t = \frac{1}{\left(\prod_{i=1}^{9} RQ_{(g_i)}\right)^{1/9}} \quad \text{(Equation 9)}$$

For the Week-0 score calculation, 18S rRNA (qRT-PCR data) and SDHA (GSE89403 data) were used as internal reference genes, as these genes showed the least change in expression across patients. For the GSE89403 dataset, the Week-0 score (W0-score) was calculated by taking the inverse of the geometric mean of the normalized antilog values of the nine genes, obtained after subtracting the same patient's reference gene expression (E(SDHA)) from the log expression value of the gene (E(g1)), as described in Equations 10 and 11. For qRT-PCR data, the Week-0 score (W0-score) was calculated from the week-0 ΔCt_0 values of the nine genes, as described in Equation 12.

All statistical analyses were performed using R version 3.6.3 (R Core Team, 2013). ANOVA and Kruskal-Wallis tests were used for multi-group comparisons, and Student's t-test and the Wilcoxon-Mann-Whitney test were used for two-group comparisons, for parametric and non-parametric data, respectively. Differences with a p value < 0.05 were considered significant. Medians with IQR are shown when individual points are not plotted in the box and whisker plots, and p values are represented as *p < 0.05, **p < 0.01, ***p < 0.001.

Ethical approval for this study was obtained from the Institutional Ethics Committee at Rajiv Gandhi University of Health Sciences, Bangalore (SDS/IEC/01/2016-17), and the Institutional Human Ethics Committee (IHEC) at the Indian Institute of Science (11-15032017), Bangalore, India.
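As an illustration of the qRT-PCR form of the R9-score described in the methods above, the sketch below computes the score for one patient at one time point. It rests on stated assumptions: the standard ΔΔCt convention is assumed (ΔΔCt = ΔCt_t − ΔCt_0 and RQ = 2^(−ΔΔCt)), because the intermediate equations are not reproduced in this text. The function names, gene labels (g1 to g9), and Ct values are hypothetical placeholders, not study data.

```python
from math import exp, log

def delta_ct(ct_gene, ct_reference):
    """Equation 1: Ct of the gene minus Ct of the reference gene (18S rRNA)."""
    return ct_gene - ct_reference

def r9_score(dct_t, dct_0):
    """Inverse of the geometric mean of per-gene RQ values.

    dct_t, dct_0: dicts mapping gene label -> delta-Ct at time t and at week 0.
    ASSUMED convention: ddCt = dCt_t - dCt_0 and RQ = 2 ** (-ddCt).
    """
    rq_values = []
    for gene in dct_0:
        ddct = dct_t[gene] - dct_0[gene]
        rq_values.append(2.0 ** (-ddct))
    # Geometric mean computed in log space for numerical stability.
    geo_mean = exp(sum(log(rq) for rq in rq_values) / len(rq_values))
    return 1.0 / geo_mean

# Hypothetical panel labels and delta-Ct values (placeholders only).
genes = ["g%d" % i for i in range(1, 10)]
week0 = {g: delta_ct(22.0, 12.0) for g in genes}   # delta-Ct = 10 for every gene
week2 = {g: delta_ct(23.0, 12.0) for g in genes}   # one cycle later: expression halves

score_no_change = r9_score(week0, week0)
score_week2 = r9_score(week2, week0)
```

Under these assumptions, unchanged expression gives a score of 1, and a uniform halving of expression across the panel gives a score of 2. This matches the reading in the discussion that a score near 1 indicates no response and that higher scores indicate a stronger response.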