key: cord-0762963-pzlw1bxx authors: Correia, Gonçalo D. S.; Takis, Panteleimon G.; Sands, Caroline J.; Kowalka, Anna M.; Tan, Tricia; Turtle, Lance; Ho, Antonia; Semple, Malcolm G.; Openshaw, Peter J. M.; Baillie, J. Kenneth; Takáts, Zoltán; Lewis, Matthew R. title: (1)H NMR Signals from Urine Excreted Protein Are a Source of Bias in Probabilistic Quotient Normalization date: 2022-05-03 journal: Anal Chem DOI: 10.1021/acs.analchem.2c00466 sha: 810bfe7f7ab587053ec89df66d567611f678bb8c doc_id: 762963 cord_uid: pzlw1bxx Normalization to account for variation in urinary dilution is crucial for interpretation of urine metabolic profiles. Probabilistic quotient normalization (PQN) is used routinely in metabolomics but is sensitive to systematic variation shared across a large proportion of the spectral profile (>50%). Where 1H nuclear magnetic resonance (NMR) spectroscopy is employed, the presence of urinary protein can elevate the spectral baseline and substantially impact the resulting profile. Using 1H NMR profile measurements of spot urine samples collected from hospitalized COVID-19 patients in the ISARIC 4C study, we determined that PQN coefficients are significantly correlated with observed protein levels (r² = 0.423, p < 2.2 × 10⁻¹⁶). This correlation was significantly reduced (r² = 0.163, p < 2.2 × 10⁻¹⁶) when using a computational method for suppression of macromolecular signals known as small molecule enhancement spectroscopy (SMolESY) for proteinic baseline removal prior to PQN. These results highlight proteinuria as a common yet overlooked source of bias in 1H NMR metabolic profiling studies which can be effectively mitigated using SMolESY or other macromolecular signal suppression methods before estimation of normalization coefficients.
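As an illustration of the normalization discussed above, the following Python sketch estimates PQN coefficients as the median of the variable-wise quotients between each spectrum and a reference spectrum, and shows on synthetic data how a baseline offset added to one spectrum inflates its coefficient. The function names, the synthetic profile, the default choice of the dataset median as reference, and the constant offset standing in for a protein baseline are illustrative assumptions; this is not the processing code used in the study, and it skips any preliminary scaling step.

```python
import numpy as np


def pqn_coefficients(spectra, reference=None):
    """Return one PQN coefficient per spectrum (rows = samples, columns = variables)."""
    spectra = np.asarray(spectra, dtype=float)
    if reference is None:
        # Assumption for this sketch: use the median spectrum of the dataset.
        reference = np.median(spectra, axis=0)
    reference = np.asarray(reference, dtype=float)
    usable = reference > 0  # avoid dividing by empty or negative reference points
    quotients = spectra[:, usable] / reference[usable]
    # The most probable quotient is taken as the median across variables.
    return np.median(quotients, axis=1)


def pqn_normalise(spectra, reference=None):
    """Divide each spectrum by its PQN coefficient."""
    spectra = np.asarray(spectra, dtype=float)
    coefs = pqn_coefficients(spectra, reference)
    return spectra / coefs[:, None], coefs


# Synthetic example: one underlying profile measured at three dilutions.
rng = np.random.default_rng(0)
profile = rng.uniform(1.0, 10.0, size=200)
dilutions = np.array([0.5, 1.0, 2.0])
spectra = dilutions[:, None] * profile

normalised, coefs = pqn_normalise(spectra)

# Add a constant offset to the second spectrum (a crude stand-in for a raised
# baseline) and re-estimate the coefficients against the clean profile.
biased = spectra.copy()
biased[1] += 5.0
biased_coefs = pqn_coefficients(biased, reference=profile)

print("coefficients without offset:", coefs)
print("coefficients with offset on sample 2:", biased_coefs)
```

In this toy case the coefficients recover the dilution factors when only dilution varies, whereas the offset spectrum receives a coefficient above its true dilution of 1.0 because the offset shifts every quotient, and therefore their median, upwards.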
The general procedure for NMR sample preparation is described in detail in Dona et al. 1 [...] assessment, a pooled QC sample was generated by combining equal parts of each study sample; pooled QC samples were prepared as above and their spectra acquired regularly throughout the sample analysis. SMolESY processing of the standard 1D 1H NMR spectra was performed using the SMolESY platform (https://github.com/pantakis/SMolESY_platform) 2 by employing the FID and the processed dispersive spectral part of the 1D 1H NMR spectra. Broad protein signals were suppressed by SMolESY, and the sharp signals of small molecules were instantly deconvoluted from both the overlapping signals of other metabolites and the protein baseline (Figure S1).

To quantify total protein from the urine 1H NMR spectra, the recently published method of Vuckovic et al. 3 was followed. In particular, the modified 'reverse' SMolESY of each spectrum was subtracted from its corresponding standard 1D 1H NMR spectrum, which eliminates all sharp signals from small metabolites and leaves the baseline of the 1H NMR spectrum. The spectral region of 0.2-0.5 ppm (representing part of the methyl protons of total protein) was integrated, and the ERETIC signal was used for absolute quantification after applying the calibration factors proposed in the above-mentioned publication. Figure S3 shows the application of the described method to one urine spectrum from our study.

[...] Figure S4). Therefore, all uses of PQN described in the main text omitted this step. The median spectrum of each dataset was selected as the normalisation reference.

We investigated the relationship between urinary creatinine and protein excretion. Division by urinary creatinine concentration is a "gold standard" for spot urine normalisation in clinical assays. We observed a weak correlation (ρ = 0.25, Figure S5) between urinary creatinine and protein, without any discernible trend.

Figure S5.
Correlation between urinary creatinine concentration and protein excretion. Creatinine and total protein concentrations were square root and log-transformed, respectively. The Pearson correlation coefficient (ρ) and the p-value (p) from the two-sided significance test of the correlation coefficient are reported. Protein values equal to or below the LOD (0.11 mg/mL) were excluded (final n=810).

iii) Comparison between CPMG and SMolESY estimated PQN coefficients in human plasma heparin 1H NMR spectra.

As further evidence that protein baseline signals affect the estimation of PQN coefficients, and to compare the performance of SMolESY with an experimental baseline removal procedure, the CPMG pulse sequence, we investigated the relationship between quantified total protein and estimated PQN coefficients in a set (n=322) of 1H NMR spectra from human heparin plasma samples. Figure S6a shows that, with a standard 1D pulse sequence, the estimated coefficients are very highly correlated (ρ = 0.9, S6a) with protein concentration. This correlation is greatly reduced in both the CPMG (ρ = 0.17, S6b) and the SMolESY processed spectra (ρ = 0.19, S6c). The CPMG and SMolESY derived PQN coefficients are better correlated with each other (ρ = 0.71, S6f) than either set is with the coefficients estimated from the standard 1D spectra (ρ ≤ 0.4, S6d and e).

Figure S6. Correlation between total protein and estimated PQN coefficients from standard 1D, CPMG and SMolESY processed spectra in a set of 322 human plasma heparin samples. PQN coefficients were estimated from the standard 1D noesy presat (a), CPMG (b), and SMolESY processed spectra (c) and correlated with total plasma protein. The agreement between each set of PQN coefficients is also shown in scatterplots d-f. Total protein concentrations in d-f were log-transformed. The Pearson correlation coefficient (ρ) and the p-value (p) from the two-sided significance test of the correlation coefficient are reported.
The linear regression trendline (dashed red line) was estimated with ordinary least-squares in a-c and with the orthogonal least squares Passing-Bablok method in d-f.

The human plasma heparin samples used in these analyses were a random subset (n=322) of the [...] in Humans in the Community (GRAPHIC) study dataset. The GRAPHIC study was approved by the Leicestershire Research Ethics Committee (6463) and all subjects provided written informed consent. The 1H NMR spectra in this dataset were acquired following the protocols for NMR profiling of human blood products described in Dona et al. 1

1. Precision High-Throughput Proton NMR Spectroscopy of Human Urine, Serum, and Plasma for Large-Scale Metabolic Phenotyping
2. An Efficient and Quantitative Alternative to on-Instrument Macromolecular 1H-NMR Signal Suppression
3. 1H Nuclear Magnetic Resonance Spectroscopy-Based Methods for the Quantification of Proteins in Urine
4. Probabilistic Quotient Normalization as Robust Method to Account for Dilution of Complex Biological Mixtures. Application in 1H NMR Metabonomics