key: cord-0429765-np8xhx00
authors: Gegner, Hagen M.; Naake, Thomas; Dugourd, Aurélien; Müller, Torsten; Czernilofsky, Felix; Kliewer, Georg; Jäger, Evelyn; Helm, Barbara; Kunze-Rohrbach, Nina; Klingmüller, Ursula; Hopf, Carsten; Müller-Tidow, Carsten; Dietrich, Sascha; Saez-Rodriguez, Julio; Huber, Wolfgang; Hell, Rüdiger; Poschet, Gernot; Krijgsveld, Jeroen
title: Pre-analytical processing of plasma and serum samples for combined proteome and metabolome analysis
date: 2022-04-27
journal: bioRxiv
DOI: 10.1101/2022.04.26.489520
sha: 7faf3094a70699c683e09fba0b6b7360f6e765ea
doc_id: 429765
cord_uid: np8xhx00

Metabolomic and proteomic analyses of human plasma and serum samples harbour the power to advance our understanding of disease biology. Pre-analytical factors such as sample handling, processing time, and differing operating procedures may contribute to variability and bias in the detection of analytes, especially when multiple labs are involved. To better understand the impact of pre-analytical factors that are relevant to implement a unified proteomic and metabolomic approach in a clinical setting, we assessed the influence of temperature, sitting times, and centrifugation speed on the plasma and serum metabolomes and proteomes from six healthy volunteers. We used targeted metabolic profiling (497 metabolites) and data-independent acquisition (DIA) proteomics (572 proteins) on the same samples generated with well-defined pre-analytical conditions to evaluate criteria for pre-analytical SOPs for plasma and serum samples. Time and temperature showed the strongest influence on the integrity of plasma and serum proteome and metabolome. While rapid handling and low temperatures (4°C) are imperative for metabolic profiling, the analysed proteome showed variability when exposed to temperatures of 4°C for more than 2 hours, highlighting the need for compromises in a combined analysis.
We formalised a quality control scoring system to objectively rate sample stability and tested this score using external data sets from other pre-analytical studies. Stringent and harmonised standard operating procedures (SOPs) are required for pre-analytical sample handling when combining proteomics and metabolomics of clinical samples to yield robust and interpretable data on a longitudinal scale and across different clinics. To ensure an adequate level of practicability in a clinical routine for metabolomics and proteomics studies, we suggest keeping blood samples on ice (4°C) for up to 2 hours prior to snap-freezing as a compromise between stability and operability. Finally, we provide the methodology as an open source R package allowing the systematic scoring of proteomics and metabolomics datasets to assess the stability of plasma and serum samples.

Mass spectrometry-based metabolomics and proteomics are emerging technologies that are increasingly employed in laboratory and clinical settings to refine our understanding of disease biology, vulnerabilities, and resistance mechanisms. Liquid biopsies, such as blood, provide the opportunity to collect information on a patient's metabolome and proteome status on a longitudinal scale to track disease progression or response to a treatment (Tsonaka et al., 2020; Gummesson et al., 2021). For instance, longitudinal metabolomic profiling of plasma collected from patients suffering from COVID-19 was linked to disease progression, including a panel of metabolites collected at the onset of the disease that may predict the disease severity (Sindelar et al., 2021). Similarly, proteomic analysis of COVID-19 patients revealed protein signatures associated with survival, tissue-specific inflammation, and disease severity (Filbin et al., 2021).
The independent analysis of such complex diseases yields promising findings, highlighting that the present technologies are not the limiting factors for the broader use of mass spectrometry (MS) in clinical workflows. MS-based technologies have matured over the past years, allowing the investigation of analytically challenging but highly informative samples such as blood plasma and serum. Technical advances include, but are not limited to: i) increased reproducibility and automation in sample preparation, ii) faster, more sensitive, and more robust MS instruments, and iii) improved data analysis algorithms as well as multi-omics and integrative workflows. While these developments reduce technical noise in the data sets and improve the detection of true biological variability, their efficacy may be compromised if the quality of the starting material is not strictly controlled and standardised.

metabolomic analysis on the same samples, to harmonise the requirements for both techniques with such a comprehensive set of features (Cao et al., 2019). Critically, sample collection and handling requirements differ between metabolomics and proteomics and need to be adjusted accordingly for a combined clinical SOP. Here, we assess how pre-analytical factors, namely differences in sitting time, temperature regime (4°C and room temperature (RT) for plasma; only RT for serum), and centrifugal acceleration, affect metabolite and protein levels in plasma and serum samples. Using targeted metabolic profiling and a single-shot, data-independent acquisition (DIA) proteomics approach, we determine that keeping blood samples on ice (4°C) for up to 2 hours prior to snap-freezing is the optimal condition to preserve metabolites and proteins for a combined metabolomics/proteomics workflow. We introduce an open-source scoring system to assess the quality of plasma and serum samples (Figure 1B).
To assess how sample handling and treatment affect the stability of protein and metabolite levels in human plasma and serum samples, we selected four time points between centrifugation and snap-freezing to quench samples, as follows: 0 h as the baseline of metabolite and protein levels immediately after sampling, 2 h as the clinically feasible time point to quench samples, 4 h as a middle point, and 8 h (quenching at the end of a typical working day) (Figure 1A). Furthermore, samples were kept at different temperatures during these sitting times (on ice/4°C and at RT) to investigate the influence of temperature on metabolome and proteome composition. Additionally, we included two centrifugation schemes for plasma samples (2000 g and 4000 g). However, we could not attribute any significant changes in the plasma metabolome and proteome to the different centrifugation conditions, and therefore only used the 2000 g condition in the following sections.

Identifying metabolites and proteins affected by temperature and sitting time

Analysis of plasma and serum samples by targeted metabolic profiling and single-shot, data-independent acquisition (DIA) proteomics yielded quantitative information for a total of 497 metabolites and 572 proteins. An initial LIMMA analysis (Table 1) showed a high number of features that differed between individual blood donors (α < 0.05, after FDR correction using the Benjamini-Hochberg, BH, method). In addition, PCA (Supplementary Figure 1A) of the proteomic and metabolomic data sets indicated individual-specific effects, a finding that was further supported by t-SNE and UMAP analyses (Supplementary Figure 2) and multi-omics factor analysis on both data modalities (MOFA, Supplementary Figure 3). Especially for the metabolomics data set there was a clear separation between individuals, while for the proteomics data set we found less pronounced effects (Supplementary Figure 2D).
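The Benjamini-Hochberg adjustment mentioned above for the initial LIMMA analysis can be illustrated with a minimal sketch. This is not the authors' code (the analysis in the paper was run in R); the function name and the example p-values are hypothetical and serve only to show how adjusted p-values are compared against α = 0.05.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for eight features (illustration only).
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
q = benjamini_hochberg(p)
# Features whose adjusted p-value falls below alpha = 0.05.
significant = [i for i, value in enumerate(q) if value < 0.05]
```

Note that several features with raw p-values below 0.05 no longer pass after adjustment, which is the intended effect of controlling the false discovery rate across many features.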
This analysis suggests that there are dominating individual effects that are reflected in the metabolomics data set and to a lesser extent in the proteome. To gain further insight into this, we next performed classification by partial least squares-discriminant analysis (PLS-DA) and sparse PLS-DA (sPLS-DA) to discriminate the individuals based on the metabolite and protein profiles. Indeed, it was possible to classify the individuals using the metabolite and protein levels with a low classification error (Supplementary Figure 1C), suggesting that there are features in the metabolomics and proteomics data where individual-specific effects are prevalent. The metabolites and proteins in Supplementary Table 1 were selected by sPLS-DA to explain the variance using the individual as the class vector (Supplementary Figure 1B). We also performed PLS-DA to discriminate for time and the combination of time and

In a next step, we looked into the changes of metabolite and protein levels when accounting for inter-individual differences. Motivated by the results of the previous analyses (initial LIMMA analysis, dimension reduction analysis, PLS-DA, MOFA), which showed that metabolome and proteome variation is influenced by inter-individual differences, we decided to use mixed linear models to determine the features that change according to sitting time, temperature, or a combination of time and temperature. We modelled as fixed effects time, temperature, and the interaction term time/temperature (plasma) and time (serum). The information on the blood donor (individual) was included for both groups as a random effect (Figure 2A). An overview of the absolute change of the significant metabolites and proteins can be found in Figure 2B (see also Figure 2C for exemplary metabolites and proteins). We provide the metabolite- and protein-associated p-values for plasma and serum samples in the Supplementary Information.
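The mixed linear models described above can be written out per feature as follows. This is a sketch under stated assumptions: the symbols are illustrative and not taken from the source, the donor effect is assumed to be a random intercept, and whether time enters as a continuous or categorical variable is not specified in this section. Here $y_{ij}$ is the level of one metabolite or protein in sample $j$ of donor $i$, $t_{ij}$ the sitting time, and $T_{ij}$ the temperature condition.

```latex
\begin{aligned}
\text{plasma:}\quad y_{ij} &= \beta_0 + \beta_1\, t_{ij} + \beta_2\, T_{ij}
    + \beta_3\, (t_{ij} \cdot T_{ij}) + u_i + \varepsilon_{ij},\\
\text{serum:}\quad y_{ij} &= \beta_0 + \beta_1\, t_{ij} + u_i + \varepsilon_{ij},\\
u_i &\sim \mathcal{N}(0, \sigma_u^2), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2).
\end{aligned}
```

In this notation, the fixed-effect coefficients $\beta_1$, $\beta_2$, and $\beta_3$ capture the time, temperature, and interaction effects that are tested for each feature, while $u_i$ absorbs the donor-specific offset that the preceding analyses found to dominate the data.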
Looking at metabolomics- and proteomics-specific differences, the analysis revealed that metabolite concentrations were less stable at RT, while protein abundances were less stable at 4°C (

Scoring plasma and serum sample quality using proteomic and metabolomic signatures

We next investigated whether patterns of potential protein and metabolite deregulation (with respect to time and temperature) could be used as a quality metric for samples obtained under the tested conditions. We selected the top 20 proteins and metabolites ranked by p-value to generate signatures of the following handling conditions: plasma kept on ice (4°C) or at RT for 8 hours, and serum kept at RT for 8 hours (Supplementary Figure 6). While it may be difficult to draw conclusions regarding the significance of individual metabolites or proteins due to the limited sample size, their combined signal may hold enough information to score the relative quality of samples with respect to sitting time and temperature. Thus, to confirm that these signatures could yield sensible insight into sample integrity, we computed an average normalised enrichment score (NES, see Methods) for each signature in the respective pre-analytical sample conditions. If the signatures are indeed informative, we should expect to observe a steady increase in enrichment of the respective signature for each condition. Consistent with this, each signature showed higher scores for samples that matched the respective conditions from which it was derived (Figure 3A). Notably, the plasma protein signature at RT over time already scored highly in samples stored at RT for 4 h, as opposed to the signature of plasma/4°C/8 h, which only scored high in the samples obtained at low temperature and after 8 h, as expected. This pattern was inverted for metabolic signatures of plasma samples.
This indicates that the changes are more pronounced at the metabolomic level when samples are stored at 4°C compared to RT, while changes are more pronounced at the proteomic level for plasma samples kept at RT. Finally, while the NES can take both positive and negative values, we focused only on the positive values here to simplify the interpretation of the results. Since the data were normalised in a way that each measurement is scaled relative to the other samples of the cohort, the scores are drawn from a distribution where an NES of 0 represents samples that have average levels of degradation compared to the overall cohort, and any value above that represents samples that show higher degradation than the rest of the cohort. It is worth noting that this approach can only score samples relative to the rest of the cohort, and cannot provide absolute quantification of sample degradation.

To validate this approach, we used the signatures to score metabolomic results from an external study (Heiling et al., 2021), where plasma and serum samples were kept at RT for 2 h (Figure 3B). The plasma/RT/8 h metabolic signature received a higher score (NES = 5.1) than the plasma/4°C/8 h (NES = 2.6) and serum/RT/8 h (NES = 2.9) signatures. However, the serum/RT/8 h signature score was similar to the plasma/RT/8 h signature score (NES = 6.1 and NES = 6.4, respectively). Thus, the serum/RT/8 h and plasma/RT/8 h metabolic signatures appear less discriminant than the plasma/4°C/8 h metabolic signature. Nevertheless, the best scores overall matched the actual experimental conditions that were used, indicating that the scoring system holds beyond the data set that was used for training.

In order to further characterise the changes that we observed in the plasma and serum samples over time, we investigated if proteomic signatures could be associated with contamination by proteins originating from specific blood cells.
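To make the idea of a cohort-relative enrichment score concrete, the following is a minimal sketch of one simple way to score a sample against a signature. It is not the authors' method: the Methods section is not part of this text, and the published R package may compute the NES differently. The sketch assumes a weighted mean of cohort-scaled feature values, normalised against a null distribution obtained from randomly drawn feature sets. All names (`normalised_enrichment`, `marker_a`, and so on) are hypothetical.

```python
import random
import statistics


def normalised_enrichment(sample, signature, n_perm=1000, seed=0):
    """Score one sample against one signature.

    sample:    dict feature -> cohort-scaled value (e.g. a z-score)
    signature: dict feature -> weight (+1 expected increase, -1 decrease)
    """
    shared = [f for f in signature if f in sample]
    if not shared:
        raise ValueError("signature and sample share no features")
    # Observed score: weighted mean over the signature features.
    observed = statistics.fmean(sample[f] * signature[f] for f in shared)

    # Null distribution: the same weights applied to randomly drawn features.
    rng = random.Random(seed)
    features = list(sample)
    weights = [signature[f] for f in shared]
    null = []
    for _ in range(n_perm):
        drawn = rng.sample(features, len(weights))
        null.append(statistics.fmean(sample[f] * w for f, w in zip(drawn, weights)))

    # Normalise the observed score by the null mean and spread.
    return (observed - statistics.fmean(null)) / statistics.stdev(null)


# Toy cohort-scaled profile: three hypothetical markers stand out from the rest.
toy_rng = random.Random(1)
sample = {f"feature_{i}": toy_rng.gauss(0, 1) for i in range(200)}
sample.update({"marker_a": 3.0, "marker_b": 2.5, "marker_c": 2.8})
signature = {"marker_a": 1, "marker_b": 1, "marker_c": 1}
nes = normalised_enrichment(sample, signature)
```

Because the input values are scaled relative to the cohort, a score near 0 means the signature features behave like an average feature set in that sample, and a clearly positive score means they are shifted in the direction the signature expects. As in the text above, such a score is only meaningful relative to the other samples used for scaling.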
We obtained proteomic markers of coagulation, erythrocyte, and platelet contamination from (Geyer et al., 2019). Plasma samples kept on ice (4°C) for 4 h and 8 h showed the highest enrichment of erythrocyte contamination markers (Figure 3C), mainly driven by CAT, CA2, BLVRB, PRDX2, and ALDOA (Supplementary Figure 6A, B). Interestingly, the plasma/4°C/8 h signature also seems to be specifically driven by a lower abundance of the VWF protein, a blood glycoprotein involved in platelet adhesion (Supplementary Figure 6C). The contamination scores were lower in plasma samples that were kept at RT, although they still showed a progressive increase over time. On the other hand, serum samples exhibited no significant increase in erythrocyte contamination score, instead showing a consistently high (albeit slightly decreasing) score for coagulation markers over time, as expected. This signature was mainly driven by increased

[Figure 3 caption, partially recovered:] (Heiling et al., 2021). C: Coagulation, erythrocyte, and platelet signatures were used to score contamination of proteomic plasma and serum samples. Signatures were taken from (Geyer et al., 2019).

The progress of MS-based technologies over the past years has enabled the characterization and quantification of analytically challenging but clinically accessible samples such as blood plasma and serum. Although SOPs for individual metabolic and proteomic analyses have been developed (Pasella et al., 2013; Yin, Lehmann and Xu, 2015; Tuck, Turgeon and Brenner, 2019; Lippi et al., 2020), there is no consensus on their combined application for the molecular characterization of blood samples in a clinical setting. Here, we performed a comprehensive analysis on the stability of 497 metabolites and 572 proteins in blood plasma and serum to scrutinise the effects of various treatment regimes

Temperature and time are well known to affect metabolite and protein levels (Kamlage et al., 2014; Cao et al., 2019; Daniels et al., 2019; Stevens et al., 2019).
Elevated levels of hypoxanthine and amino acids over time (Ferreira et al., 2019) and the deregulation of cholesterol metabolism were also previously documented (Ryu et al., 2016). Association of these and other metabolites with a pathological condition therefore needs to be evaluated with caution, to exclude that they emerge inadvertently through sample handling or technical bias. Strict pre-analytical measures help to gain confidence in the subsequent biological and clinical interpretation based on the measured features. For most of the features in our data sets, we only found minor changes under the experimental conditions applied (Figure 2): 90% of metabolites and 97% of proteins only varied slightly over time. Other metabolome studies reported similar proportions, with 91% of the metabolites remaining stable over several pre-analytical conditions (Ferreira et al., 2019). This implies that in clinical research studies where large effect sizes from biological differences are known or expected and large cohorts are used, the contribution of sample handling to feature-level variation might be partially alleviated. The integration of several data sets, i.e., proteomic and metabolomic, from the same sample may also mitigate bias. Although the increased stability of the metabolome at 4°C was expected, we observed a contrary effect for proteins, which showed higher variance at low temperature (Figures 2 and 3A), suggesting that proteomics and metabolomics require different pre-analytical conditions to obtain optimal results. Therefore, we propose that keeping plasma and serum samples on ice for up to 2 h is an acceptable trade-off to maintain adequate stability of both the proteome and metabolome (Figure 2 and Supplementary Figure 5). In addition, this should be a condition that can be met in clinical practice (Figure 4B). Similar considerations may apply to the storage time of samples in biological repositories such as biobanks.
Previous studies showed that long-term storage at -80°C over seven years only introduces minimal variation, and that significant changes occur upon longer storage times (Wagner-Golbs et al., 2019). This highlights the potential to address clinical questions using metabolomics and proteomics from biological repositories, under the prerequisite that the sample collection is comparable. While biobank samples are an essential resource for discovery studies, prospective samples enable the enforcement of SOPs during collection that are more suitable for metabolomic analyses, e.g., by storing samples at 4°C for under 2 hours and then quenching them by snap-freezing in liquid nitrogen.

Quality control signatures to score plasma and serum samples

Designing formal criteria for data curation and analysis is crucial to ensure data robustness. To this end, we devised a scoring system using the significantly altered proteins and metabolites as signatures to evaluate the impact of pre-analytical conditions on proteome and metabolome integrity of a given sample. Provided as an R package (https://github.com/saezlab/plasmaContamination), this tool can be used for quality control after pre-analytical handling. In addition, the proteome signatures make it possible to distinguish the severity and the source of contamination, i.e., from platelets, erythrocytes, or resulting from coagulation (Figure 3C and 4C). We showed that the changes in protein abundance in samples stored on ice were mainly related to protein markers of erythrocytes in plasma samples, likely resulting from hemolysis occurring under this condition (Figure 4C). As expected, coagulation signatures scored exclusively high in serum samples. Both the scores for metabolite and protein contamination enable the quality assessment of plasma and serum samples of unknown origin.
Of note, the erythrocyte, platelet, and coagulation signatures were obtained from a large external cohort of samples (>70 samples), while the signatures derived from our own samples were estimated from a comparatively small number of samples (n=6). Although this may affect their discriminative power, the derived signatures and bioinformatic tools are publicly accessible and can therefore be updated and expanded easily when more data become available. Still, those signatures yielded coherent scores when they were tested with our own samples and were validated with samples from an external study. The expansion of such signatures towards other pre-analytical factors, such as storage conditions, enables the development of further quality control metrics. This may be achieved through similarly structured experimental set-ups with small sample sizes or the analysis of bigger cohorts with the inclusion of metadata. We anticipate that a quality score for proteome and metabolome integrity can have great practical utility, enabling the exclusion of low-scoring samples from further analysis. This will be particularly important if clinical decisions are to be made based on metabolic or proteomic data from such samples. At this point, it is premature to suggest a cut-off score, since the number of samples in our study is low, and since the choice of such a cut-off may depend on the setting of the analysis (e.g., biomarker discovery, clinical decision). Finally, although a quality score is helpful, it cannot replace rigorous SOPs. In addition, this must be evaluated in the context of other available metadata that should be applied in combination with other quality control strategies (Naake and Huber, 2022).

In this study we assessed the influence of controllable pre-analytical parameters on protein and metabolite levels in plasma and serum samples, to define or improve SOPs for concerted metabolomic and proteomic analyses.
While only a subset of metabolites and proteins changed, the ability to identify features that are prone to alteration increases the confidence in such broadly acquired data sets. We propose to store blood samples for a maximum of 2 h on ice (4°C) before quenching the samples, as a compromise between stability and practical operability. Additionally, the metabolomic and proteomic signatures can be applied routinely in bioinformatics workflows to review and evaluate the quality of plasma and serum samples. Due to their accessibility, such signatures may be expanded over time to improve the assessment of qualitative differences between blood samples. Lastly, bigger sample sizes and additional metadata of volunteers and/or available metadata from clinics may extend these scores to include signatures capturing other sources of variability important to clinical studies, such as storage, medication, or lifestyle.

Peripheral blood samples were collected from 6 male healthy volunteers (

Data were recorded using the Analyst software suite (version 1.7.2, Sciex, Germany) and transferred to the MetIDQ software (version Oxygen-DB110-3005, Biocrates, Austria), which was used for further data processing, i.e., technical validation, quantification, and data export. Low-abundant metabolites that were not measured in more than 66% of the samples as 10 times the levels over the limit of detection (LOD) or above the lower limit of quantification (LLOQ) (both according to the MetIDQ software) were removed from the subsequent analysis.
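The abundance filter described above can be sketched as follows. This is one reading of the rule, not the authors' implementation: it assumes that a metabolite is kept when, in more than 66% of the samples, its value is at least 10 times the LOD or above the LLOQ. In the study these quality flags come from the MetIDQ software; the data layout, function name, and example values here are hypothetical.

```python
def passes_abundance_filter(values, lod=None, lloq=None, min_fraction=0.66):
    """Return True if the metabolite is kept for the analysis.

    values: one concentration per sample (None = not measured)
    lod / lloq: thresholds for this metabolite; either may be None
    """
    if not values:
        return False

    def is_reliable(value):
        if value is None:
            return False
        over_lod = lod is not None and value >= 10 * lod
        over_lloq = lloq is not None and value > lloq
        return over_lod or over_lloq

    # Fraction of samples with a reliable measurement must exceed the cut-off.
    reliable = sum(is_reliable(v) for v in values)
    return reliable / len(values) > min_fraction


# Hypothetical measurements for two metabolites across six samples.
measurements = {
    "metabolite_x": {"values": [12.0, 15.0, 11.0, 0.4, 13.0, 14.0], "lod": 1.0, "lloq": None},
    "metabolite_y": {"values": [0.2, None, 0.9, 3.1, 0.1, 0.3], "lod": None, "lloq": 1.0},
}
kept = [
    name
    for name, m in measurements.items()
    if passes_abundance_filter(m["values"], m["lod"], m["lloq"])
]
```

In this toy example, `metabolite_x` is reliably measured in five of six samples and is kept, while `metabolite_y` exceeds its LLOQ in only one sample and is removed.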
Using the MatrixQCvis package (Naake and Huber, 2021

reports funding from GSK and Sanofi and fees from Travere Therapeutics and Astex

Multi-Omics Factor Analysis-a framework for unsupervised integration of multi-omics data sets
An Integrated Analysis of Metabolites, Peptides, and Inflammation Biomarkers for Assessment of Preanalytical Variability of Human Plasma
Stability of the Human Plasma Proteome to Pre-analytical Variability as Assessed by an Aptamer-Based Approach
The effect of pre-analytical conditions on blood metabolomics in epidemiological studies
Longitudinal proteomic analysis of severe COVID-19 reveals survival-associated signatures, tissue-specific cell death, and cell-cell interactions
Plasma Proteome Profiling to detect and avoid sample-related biases in biomarker studies
Longitudinal plasma protein profiling of newly diagnosed type 2 diabetes
Evaluating the effects of preanalytical variables on the stability of the human plasma proteome, Analytical Biochemistry
Metabolite ratios as quality indicators for pre-analytical variation in serum and EDTA plasma
Quality markers addressing preanalytical variations of blood and plasma processing identified by broad and targeted metabolite profiling
PREDICT: A checklist for preventing preanalytical diagnostic errors in clinical trials
Automated sample preparation with SP3 for low-input clinical proteomics
MatrixQCvis: shiny-based interactive data quality exploration for omics data
Pre-analytical stability of the plasma proteomes based on the storage temperature
Hypoxanthine induces cholesterol accumulation and incites atherosclerosis in apolipoprotein E-deficient mice and cells
Longitudinal metabolomics of human plasma reveals prognostic markers of COVID-19 disease severity
Pre-analytical factors that affect metabolite stability in human urine, plasma, and serum: A review
Longitudinal metabolomic analysis of plasma enables modeling disease progression in Duchenne muscular dystrophy mouse models