key: cord-0256664-jngs39ey
authors: Năstase, Ana-Maria; Barrett, Michael P.; Cárdenas, Washington B.; Cordeiro, Fernanda Bertuccez; Zambrano, Mildred; Andrade, Joyce; Chang, Juan; Regato, Mary; Carrillo, Eugenia; Botana, Laura; Moreno, Javier; Milne, Kathryn; Spence, Philip J.; Rowe, J. Alexandra; Rogers, Simon
title: Alignment of multiple metabolomics LC-MS datasets from disparate diseases to reveal fever-associated metabolites
date: 2021-09-03
journal: bioRxiv
DOI: 10.1101/2021.09.02.458540
sha: 35de4f496bf3aaf576f8352cecbb52ca49bde2d6
doc_id: 256664
cord_uid: jngs39ey

Acute febrile illnesses are still a major cause of mortality and morbidity globally, particularly in low to middle income countries. The aim of this study was to determine any possible metabolic commonalities of patients infected with disparate pathogens that cause fever. Three liquid chromatography-mass spectrometry (LC-MS) datasets investigating the metabolic effects of malaria, leishmaniasis and Zika virus infection were used. The retention time (RT) drift between the datasets was determined using landmarks obtained from the standard reference mixtures generally used in the quality control of the LC-MS experiments. We used fitted Gaussian Process models (GPs) to perform a high level correction of the RT drift between the experiments, followed by standard peakset alignment between the samples with corrected RTs of the three LC-MS datasets. Statistical analysis, annotation and pathway analysis of the integrated peaksets were subsequently performed. Metabolic dysregulation patterns common across the datasets were identified, with kynurenine pathway being the most affected pathway between all three fever-associated datasets.

regressions are a non-parametric approach to modelling data and they differ from 49 standard regression models in that they do not require any assumptions about a 50 particular parametric form for the function being modelled. We used fitted GPs to 51 perform high level correction of retention times between experiments, after which 52 standard alignment was performed. 53 The algorithm was developed here specifically to determine whether any metabolites 54 could be identified that changed in abundance in similar ways across a series of distinct 55 fever-associated diseases. These include Zika virus infection in patients from metabolites whose variation in abundance followed common trends in different datasets, 60 we aimed to determine disease-generic metabolites that could assist in both 61 understanding the pathophysiology of infectious disease, and also identifying 62 metabolites that change in abundance in individual studies that may be fever rather 63 than specific disease related. Yes Yes Yes Information about each LC-MS experiment including the disease studied, type of study, number of controls and patients, LC-MS platform used, date when the samples from the LC-MS experiment were run and availability of fragmentation (MS2) data are presented.

LC-MS platform The experiments were performed at different time points (Table 1 ) 75 using the same LC-MS platform: Thermo Orbitrap QExactive (Thermo Fisher 76 Scientific) mass spectrometer coupled with a Dionex UltiMate 3000 RSLC system 77 (Thermo Fisher Scientific, Hemel Hempstead, UK) using a ZIC-pHILIC column. While 78 the same flow rate was used for all three datasets, the length of the run differed for 79 DVL, which lasted longer than the other two datasets. Detailed information about the SRM compounds from both the +ve and -ve 90 electrospray ionisation (ESI) mode is included in the Supporting Information (S1 Table) . 91

The data analysis process is outlined in Fig 1. 93 Fig 1. Diagram representing the study workflow. Peaks are detected from input LC-MS data including the SRMs and samples using MZmine2 and peakset lists containing ion information (m/z, RT, intensity) are obtained. A. SRM analysis. A reference dataset is selected and the RT drift in the other datasets is calculated and modelled using GP regression. B. Samples analysis. Based on the GPR models obtained for each dataset the RT is corrected in each peakset list and alignment is done using MZmine2. Afterwards, statistical analysis focused on the intensity differences between the control and infected samples is performed using limma R package. This is followed by annotation and pathway analysis using mummichog.

Peak detection The processing of the spectral data begins with the detection of the 94 chromatographic peaks from the input SRM and samples LC-MS data. Peak detection 95 was performed using wavelet transform from MZmine2 v2.40.1 [16] in batch mode.

SRMs analysis First, the reference dataset, D r ef = D Z , was randomly selected out 97 of the two datasets with the shorter run length. Next, a profile was created for each 98 dataset characterised by (m/z, RT) of the extracted SRM compounds from the peak 99 lists obtained after the SRM peak detection process. The profiles of the non-reference 100 datasets, D M and D V L , were then mapped to the reference dataset profile and the RT 101 drift between each of the non-reference datasets and the reference dataset was 102 determined and modelled using GP regression.

GPR modelling of the RT drift For each non-reference dataset, the RTs from the 104 (m/z, RT) profile created (input observations) were regressed against their respective 105 RT drift from the reference dataset values (observed values). In order to obtain a closer 106 fit to the data but still maintain the variability, SRM metabolite outliers were removed 107 based on their RT drift from the reference profile using a z-score cutoff value of 2. For 108 implementing the Gaussian Process models, the GPy python package, version 1.9.9 was 109 used [17] . Model hyperparameter optimisation was done using multiple restarts (n=10) 110 with the GPy optimiser to avoid local minima. To determine which covariance function 111 aids in fitting the data best in each case, cross-validation was performed, by stratifying 112 and splitting the SRM data in half for training and testing the model. aligned and the total number of SRM metabolites that align for each RTWindow value 123 across the datasets is calculated. In order to determine the optimal RTWindow for all 124 datasets, the value for which the alignment results in the lowest total number of peaks 125 aligned and highest number of SRM metabolites is chosen for further analysis.

Sample analysis The GPR models obtained in the previous stage were applied to 127 each sample peak list by correcting the RT values for each peak as detailed above. The 128 RT corrected peak lists were then aligned using the RTWindow value previously cut-off value of 50% was used. Data imputation using K-nearest neighbours method 132 (k=3) was performed solely for visualisation purposes [18] . adjusted p-value and logarithmic fold change (logFC) between the two conditions. The 140 formula used for the linear regression is given below.

Where: Y ijk = response variable (intensity level of metabolite j in condition i and 142 dataset k), α j = intercept for metabolite j, x i = first predictor variable: condition 143 (infected/ control), β j = estimated difference for metabolite j for each condition, z k = 144 second predictor variable: dataset, γ j = the dataset effect for metabolite j, ijk = error 145 stochastic component, within group variation.

Feature annotation and pathway analysis The MS2 spectra from each dataset 147 was aligned to the final filtered peaksets list. A profile characterized by (m/z, RT, 148 ms2spec) was created for the possible adducts/fragments for each peakset (S2 Table) 149 and several methods were used for feature annotation. First, annotation was performed 150 using the SRMs information by mapping the peakset profile against the SRMs (m/z, 151 RT) profiles. Next, they were mapped against metabolite information extracted from the Human Metabolome Database (HMDB). Where one or more spectra was aligned to 153 a peak, the attached spectrum/spectra was compared against experimental LC-MS 154 spectra from HMDB and the match with the highest cosine similarity score was used to 155 annotate the peak. For pathway and activity network analysis mummichog version 2.3.3. 156 was used [19] .

Code All of the analysis was performed in python programming language 158 (https://github.com/anamaria-uofg/mma). The information about any given peak 159 was stored in an object (peakinfo) with the following attributes: id, m/z, RT, p-val, 160 t-val, logFC, mummichog annotation, mummichog pathway, mummichog kegg id, std 161 annotation, std kegg id, spectra, adducts, best ms2 match adduct, ms2 annotation, 162 ms2 kegg id, intensities (from each sample).

Methodology evaluation 164 The peakset lists with no GPR correction were aligned and the same workflow was 165 applied to their analysis. The results obtained were compared with the results of the 166 GPR modified data. Also, the datasets were individually analysed and their filtered 167 peakset lists were then intersected to determine whether any commonality can be found 168 in this way. Additionally, we evaluated the alignment process using the available MS2 169 data. If two compounds with similar m/z and RT break down into the same fragments 170 (during LC-MS analysis), then it is highly likely they represent the same compound.

Therefore, if a peakset has multiple highly similar MS2 spectra from different datasets, 172 it is likely that the peaks were aligned correctly. In order to measure the similarity 173 between two MS2 spectra, cosine similarity score implemented in mass-spec-utils was 174 used [20] . In order to evaluate the alignment process using MS2 data the spectral 175 similarity between the MS2 profiles was computed when more than one spectrum was 176 aligned to one peakset. For each of the spectral similarity scores (good spectral 177 similarity score), the mean of the corresponding distribution of random spectral 178 similarity scores (bad spectral similarity score) was calculated, which were obtained 179 from spectra of peaksets with similar m/z, but different RT. Table 2 . Correlation between the RT drift and (m/z, RT) of each dataset profile.

Several kernels, among which RBF, neural network and cosine kernels, were tested to 192 determine which ones best fit the data (Fig 2) . Composite kernels were also tested on 193 the data. Composite kernels refer to multiple kernels combined either by addition or Spectral similarity scores of each peakset plotted against the difference between a random spectral similarity score and the actual spectral similarity score. The points below the red line represent the peaksets for which the actual spectral similarity score is higher than the score from randomly matched spectra with similar m/z.

Sample alignment 248 For the sample alignment, the -ve ESI mode data was processed in the same way as the 249 +ve mode data which was presented above. The JoinAligner module was run with 250 RTWindow = 0.5 min in order to align all 74 samples across the three datasets.

Following alignment there were 37220 peaksets in +ve mode and 24729 in -ve mode. Table) . Table) . (Table 4 ). Following the modular 281 analysis, the activity network for the + ESI mode is also centered around tryptophan 282 metabolism, specifically the kynurenine pathway, which is discussed in more detail in 283 the next section (Fig S3) . For the -ESI mode data the activity network also involves 284 tryptophan metabolites and, additionally, citric acid cycle metabolites. and tryptophan derivatives such as indoleacetic acid and methoxyindole acetate. Methyl 288 indole acetate and formyl-N-acetyl-5-methoxy kyunernamine were also significantly 289 decreased in the infected group with the exception of the Malaria dataset where the 290 logFC value was slightly higher (Fig 9) . In contrast, the kynurenine branch of 291 tryptophan metabolism suggests an increased activation as kynurenine and kynurenic 292 acid are present at higher levels in infected patients in all three datasets. 293 3-Hydroxyanthranilate was in general higher as well with the exception of the Malaria 294 dataset were logFC value was slightly lower. Anthranilate levels were also reduced in 295 infected patients across all datasets, although not reaching statistical significance.

Nicotinic acid (-ESI mode) was also found in significantly lower levels in infected 297 patients and nicotinamide was significantly lower, although the Zika dataset had 298 positive logFC (Fig 7) . It is important to note that unless otherwise stated, annotations 299 are putative and based on m/z alone. Tryptophan metabolism has previously been associated with various agents of 301 infection [21] , particularly its flow through the kynurenine pathway which produces 302 metabolites including kynurenate and nicotinamide adenine dinucleotide (NAD+). Of 303 particular interest is the inverse relation between kynurenine and tryptophan, as the 304 ratio between the two is used to measure the activity of the enzyme 305 indoleamine-2,3-dioxygenase 1 (IDO-1) [22] . IDO-1 is the rate limiting step of the 306 tryptophan pathway and it catalyses the breakdown of tryptophan to kynurenine.

IDO-1 activity is also tightly regulated by interferon gamma (IFN-γ) activity [23] .

Similar to this, COX-2, an enzyme central to the fever process, can also be induced by 309 IFN-γ [24] . The interplay between IDO-1 and COX-2 enzymes has been previously 310 studied where inhibition of COX-2 enzyme has led to a downregulation in IDO-1 and 311 decrease in kynurenine metabolites [25] . respectively, for the tryptophan fragment (with loss of ammonia) (Fig 10, Fig 11) . ). using SRM matching method, mummichog and HMDB matching method. The loss of ammonia from protonated tryptophan was observed as the primary fragmentation pathway in gas-phase reactions [26] . The spectrum belongs to D Z and it was matched against experimental LC-MS MS2 information from MassBank BML01191 compound with a cosine similarity score of 0.55 (fragment tolerance =0.2) (Metabolomics spectrum resolver plot).

Verifying the alignment in the case of niacinamide Both niacinamide and 326 niacin levels are significantly lower in infected patients. The peak annotated as 327 niacinamide in + ESI mode has MS2 spectra from two datasets and in this case the 328 similarity between the two spectra is illustrated in Fig 12. A cosine similarity score of 1 329 was obtained, demonstrating that in this case the alignment between the datasets was 330 optimal. ). using SRM matching method, mummichog and HMDB matching method. Top spectrum belongs to the MS2 information obtained from D M and the bottom spectrum belongs to the MS2 information obtained from D Z . The cosine similarity obtained using metabolomics spectrum resolver is 1, which signifies a perfect match and also that the alignment was accurate in this case (Metabolomics spectrum resolver plot).

Other affected metabolic pathways 332 Amino acid metabolism Other significantly affected pathways relate mostly to 333 other amino acids including alanine, aspartate and glutamate metabolism, methionine 334 and cysteine metabolism and glutathione metabolism (Table 4 ). Figure 6 indicates a 335 clear reduction in a number of amino acids and their metabolites particularly glutamine 336 and related metabolites. In immune cells, glutamine is converted through glutaminolysis 337 to glutamate, aspartate and alanine by undergoing partial oxidation [27] . This could 338 explain the decrease in glutamine and increase in aspartate in the infected group in all 339 three datasets. Similarly, another glutamine derived metabolite, asparagine, was found 340 to be increased in infected patients in all three datasets.

Glutamine also acts as a precursor for citrulline which plays an important role in 342 arginine biosynthesis in the urea cycle. Citrulline levels were significantly lower in the 343 infected group from the three datasets.

Altered glutathione metabolism accompanied by changes to metabolites associated 345 with sulfur containing amino acids and their metabolites including methionine and 346 cysteine metabolism could also be observed. The plasma levels of the amino acid 347 5-oxoproline (pyroglutamic acid) were also lower in the infected patients in comparison 348 to healthy controls. In addition to decreased methionine, methylcysteine was also 349 decreased in the infected patients. A precursor of cysteine, o-acetylserine was also 350 decreased in the infected group alongside threonine and homoserine. Taurine, another 351 metabolite derived from cysteine metabolism, was also found to have significantly lower 352 levels in infected patients in the metadataset. The decrease in reduced thiols, may arise 353 due to increased levels of oxidative stress in response to infection.

Other amino acids presenting lower levels in all three datasets in the infected group 355 but not with statistical significance include: beta-alanine, proline and betaine. In 356 contrast, several amino acids and their relatives presented overall higher levels in 357 infected patients including carnitine, tyrosine and leucine.

Carbon metabolism A significant increase in glucose was also noted across datasets 359 (Fig 6) indicating alterations in central carbon metabolism. Indeed, perturbations to the 360 glycolytic process and citric acid cycle, in particular, were confirmed in -ESI mode data 361 particularly with significant decreases in lactate, oxalate and malate.

Lipid metabolism Lipid abnormalities were also noted in the sera of infected 363 patients, which demonstrated significant changes in the fatty acid metabolism with 364 significant decreases in particular acylcarnitines (Fig 6) . Fatty acids, Nucleotide metabolism Pyrimidine metabolism is also affected with cytosine being 389 significantly higher in infected patients. Viral infections are known to cause metabolic 390 changes in host cells, including upregulation of pyrimidine nucleotide biosynthesis [28] . 391 Uracil and its derivative, uridine, however, were found to be significantly decreased in 392 all three datasets in the infected group. Purine metabolism is also affected, with Recent studies which have explored the retention time drift problem in the context 407 of large sample sizes have led to several algorithms being developed to correct the 408 problem. One study [29] presented a method for aligning the samples at a population 409 scale (N=2895 human plasma samples) by correction of the non-linear retention time 410 shift inside the raw files, in a manner similar to that developed here. In order to 411 determine the retention shift between samples, isotope labelled standards were used to 412 allow modelling of the shift to correct raw files for further peak detection and alignment. 413 Our approach also corrects raw files prior to peak detection and alignment but does not 414 require heavy isotope labelled standards. Another study in which alignment between 415 samples from a large-scale dataset (N = 1000) is addressed [30] proposed a profile-based 416 alignment algorithm which uses a graphical time warping method to correct the 417 retention times for mis-aligned features.

A few studies used endogenous reference peaks to model the retention time shift 419 between sets of samples. Li et al. [31] for example, used adjacent tandem mass 420 spectrometry information to select endogenous reference compounds to model RT drift. 421 A recently published study [32] also used internal reference compounds selected based 422 on their m/z and intensity to model the RT difference between two aligned LC-MS 423 datasets using a generalised additive model.

The advantage of aligning multiple LC-MS datasets at a retention time level over 425 comparing previously annotated results from different datasets is that both putatively 426 annotated and unannotated compounds that act in the same way or uniquely to each 427 disease can be detected. Additionally, as the overall sample size is increased, by 428 including separate but related datasets, statistical robustness of the analysis is 429 enhanced, provided assumptions on the underlying similarity in responses of disparate 430 datasets (for example separate pathogen related infections here) are robust. A common 431 limitation in many LC-MS biomarker discovery studies introduced by the small sample 432 size is thus overcome. Annotation using MS2 information is also improved, as some 433 datasets contain MS2 spectra for peaksets which are absent in other datasets. This 434 could be advantageous for datasets which have limited fragmentation data available, as 435 it improves the chance of a peakset having MS2 spectra aligned, and thus the possibility 436 for better annotation.

A limitation of our study was that the algorithm was only tested with datasets run 438 in the same laboratory, on the same LC-MS platform and at the moment it is not 439 known whether this method could be applied to metabolomics datasets run on different 440 platforms. Annotation is also a limitation, as it is for metabolomics studies in 441 general [33, 34] . It is to be noted that for some of the metabolites, the difference 442 between the control group and the infected group was larger in one of the datasets than 443 in the other datasets and in some cases this could have contributed to the statistical 444 significance of the difference in the metabolites abundance in control as opposed to 445 infected. This could be either due to the disease itself or its severity. In this case it is to 446 be noted that DM was an intervention study and the infection was controlled and less 447 severe which might explain relatively lower logFC in the dataset. However, for this 448 study only the metabolites which presented fold changes going in the same direction for 449 all datasets were presented and discussed unless specified otherwise. Using disparate 450 diseases of matched severity might identify even more common features, but our study 451 was constrained by the availability of particular datasets.

In conclusion, three LC-MS datasets obtained separately to investigate metabolic 502 changes in Zika, malaria and VL infected patients were successfully aligned using fitted 503 GP models for correcting the RT drift between them, determined by the RTs of the 504 SRM metabolites within each dataset. Following annotation and statistical annotation, 505 compounds changing in abundance in similar ways were found across the different 506 infectious diseases. Common dysregulation patterns were observed in metabolic 507 pathways related to amino acid and carboxylic acids metabolism, lipid and nucleotide 508 metabolism with kynurenine pathway from tryptophan metabolism being identified as 509 the most significantly changed pathway.

For more information, see Supporting information.

Supporting information 512 S1 File. Details regarding the sample collection for each dataset. 513 S1 

The Pathophysiological Basis and Consequences of Fever

Diversity of Infectious Aetiologies of Acute Undifferentiated Febrile Illnesses in South and Southeast Asia: A Systematic Review

New Biomarkers and Diagnostic Tools for the Management of Fever in Low-and Middle-Income Countries: An Overview of the Challenges

Serum Metabolome Changes in Adult Patients with Severe Dengue in the Critical and Recovery Phases of Dengue Infection

Elevated L-Threonine Is a Biomarker for Lassa Fever and Ebola

Discovery of Metabolic Alterations in the Serum of Patients Infected with Plasmodium Spp. by High-Resolution Metabolomics

Identification of Urine Metabolites as Biomarkers of Early Lyme Disease

Metabolomics Identifies Multiple Candidate Biomarkers to Diagnose and Stage Human African Trypanosomiasis

LC-MS Alignment in Theory and Practice: A Comprehensive Algorithmic Review

Guidelines and Considerations for the Use of System Suitability and Quality Control Samples in Mass Spectrometry Assays Applied in Untargeted

Gaussian Processes in Machine Learning

A Machine Learning-Based Algorithm for the Assessment of Clinical Metabolomic Fingerprints in Zika Virus Disease

Cellular Markers of Active Disease and Cure in Different Forms of Leishmania Infantum-Induced Disease

Mapping Immune Variation and Var Gene Switching in Naive Hosts Infected with Plasmodium Falciparum

MZmine 2: Modular Framework for Processing, Visualizing, and Analyzing Mass Spectrometry-Based Molecular Profile Data

A Gaussian Process Framework in Python

NS-kNN: A Modified k-Nearest Neighbors Approach for Imputing Metabolomics Data

Predicting Network Activity from High Throughput Metabolomics

Mass-Spec-Utils: Some Useful MS Code

Tryptophan Catabolism in Chronic Viral Infections: Handling Uninvited Guests

The Plasma [Kynurenine]/[Tryptophan] Ratio and Indoleamine 2,3-Dioxygenase: Time for Appraisal

Mechanism of Interferon-Gamma Action. Characterization of Indoleamine 2,3-Dioxygenase in Cultured Human Cells Induced by Interferon-Gamma and Evaluation of the Enzyme-Mediated Tryptophan Degradation in Its Anticellular Activity

The Interplay between Indoleamine 2,3-Dioxygenase 1 (IDO1) and Cyclooxygenase (COX)-2 In Chronic Inflammation and Cancer

Progression of Pancreatic Adenocarcinoma Is Significantly Impeded with a Combination of Vaccine and COX-2 Inhibition

Gas-Phase Reactions of Protonated Tryptophan

Metabolism and Immune Function, Supplementation and Clinical Translation

Naked-Eye Sensitive ELISA-like Assay Based on Gold-Enhanced Peroxidase-like Immunogold Activity

Visualization, Quantification, and Alignment of Spectral Drift in Population Scale Untargeted Metabolomics Data

Targeted Realignment of LC-MS Profiles by Neighbor-Wise Compound-Specific Graphical Time Warping with Misalignment Detection

An Alignment Algorithm for LC-MS-Based Metabolomics Dataset Assisted by MS/MS Information

Paired Untargeted LC-HRMS Metabolomics Feature Matching and Concatenation of Disparately Acquired Data Sets

Annotation: A Computational Solution for Streamlining Metabolomics Analysis

How Close Are We to Complete Annotation of Metabolomes? Current Opinion in Chemical Biology

Nicotinamide Augments the Anti-Inflammatory Properties of Resveratrol through PARP1 Activation

Proteomic and Metabolomic Characterization of COVID-19

COVID-19 Infection Alters Kynurenine and Fatty Acid Metabolism, Correlating with IL-6 Levels and Renal Status

Insights into Malaria Pathogenesis Gained from Host Metabolomics

Circulating Pro-and Anti-Inflammatory Metabolites and Its Potential Role in Rheumatoid Arthritis Pathogenesis

Central Effect of Taurine and Its Analogues on Fever Caused by Intravenous Leukocytic Pyrogen in the Rabbit

Pathways affected in common by infection with separate 453 fever-related pathogens 454 The algorithm was written with the intention of comparing datasets related to infection, 455 hence we sought differences between infected patients and healthy controls from all 456 three datasets using linear regression from limma and mummichog to determine the 457 biological network of activity and significantly affected pathways common to the 458 infected samples group. This study offers a route to identify commonality in the 459 metabolic profile in infected patients affected by pathogens that cause fever. At the 460 centre of this meta-dataset stands the relationship between kynurenine and tryptophan 461 related metabolites.

The currently proposed molecular basis for fever involves the following. The innate 463 immune system is activated through pathogen recognition by toll-like receptors e.g. TNF-a binding protein) [1] .

Based on the results obtained here, a possible connection between fever and the 473 kynurenine pathway could be explained by the interplay between IDO-1 and COX-2. An 474 activation of IDO-1 could lead to a decreased inhibition of COX-2 which in turn could 475 lead to an increased activation of PGE2 release. The link between the two enzymes and 476 inflammation has been previously studied [24] . Suppression of COX-2 enzyme activity 477 could also be decreased by the lower levels of nicotinamide in the infected group, as 478 nicotinamide has been shown to influence the activity of COX-2 [35] . It is worth noting 479 that serum metabolomics, as used here, detects only a faint echo of the changes that are 480 occuring in specific cell types orchestrating inflammation in local anatomical sites 481 associated with infection. Although PGE2 was not annotated in these datasets (PGE 482 being difficult to detect using the platform used), linoelaidic acid and linoleic acid were 483 putatively annotated and found to be decreased. Linoleic acid is a precursor of 484 arachidonic acid and bioactive eicosanoids including PGE2. Decreased levels of linoleic 485 acid in the infected group could corelate to its increased use in arachidonic acid and, 486 thus, PGE2 production. Recent metabolomics investigations on SARS-CoV2 infection in 487 coronavirus patients also pinpointed significant alterations in the tryptophan-kynurenine 488 pathway [36, 37] , with kynurenine levels increasing with disease severity.

Many of the other metabolites found here to have significant abundance level 490 differences between the two groups have also been noted previously for roles in 491 immunometabolism. Glutamine, for instance, plays a major role in the immune system, 492 as a key energy source for immune cells and it is used at a higher rate during catabolic 493 conditions such as sepsis or other infections [27] . Depletion of glutamine and also 494 citrulline identified in this study could also be used as indicators of disease severity as 495 suggested in [38] . Decreased oxoproline levels were also previously associated with a 496 non-infectious fever-associated disease, Rheumatoid Arthritis [39] . Taurine is another 497 amino acid which has previously been associated with inflammation and fever associated 498 with diminished levels of the inflammatory cytokines TNF-α and IL-6 [40] , and also 499 related to lower body temperature when administered intracerebroventricularly [41] .