key: cord-0268413-nhrdq8j7
authors: Huang, Yuefan; Mohanty, Vakul; Dede, Merve; Daher, May; Li, Li; Rezvani, Katayoun; Chen, Ken
title: Characterizing metabolism from bulk and single-cell RNA-seq data using METAFlux
date: 2022-05-19
journal: bioRxiv
DOI: 10.1101/2022.05.18.492580
sha: fe7fb6b1154809ab314db960f3d490d15bdc8de6
doc_id: 268413
cord_uid: nhrdq8j7

Cells often alter metabolic strategies under nutrient-deprived conditions to support their survival and growth. Characterizing metabolic reprogramming in the TME (Tumor Microenvironment) is of emerging importance in ongoing cancer research and therapy development. Recent developments in mass spectrometry (MS)-based technologies allow simultaneous characterization of metabolic features of tumor, stroma, and immune cells in the TME. However, they only measure a subset of metabolites and cannot provide in situ measurements. Computational methods such as flux balance analysis (FBA) have been developed to estimate metabolic flux from bulk RNA-seq data and have recently been extended to single-cell RNA-seq (scRNA-seq) data. However, it is unclear how reliable the results are, particularly in the context of tissue TME characterization. To investigate this question and fill the analytical gaps, we developed a computational program METAFlux (METAbolic Flux balance analysis), which extends the FBA framework to infer metabolic fluxes from either bulk or single-cell transcriptomic TME data. We benchmarked the prediction accuracy of METAFlux using the exometabolomics data generated on the NCI-60 cell lines and observed significant improvement over existing approaches. We tested METAFlux in bulk RNA-seq data obtained from various tumor types including those in the TCGA. We validated previous knowledge, e.g., lung squamous cell carcinoma (LUSC) has higher glucose uptake than lung adenocarcinoma (LUAD). We also found a novel subset of LUAD samples with unique metabolic profiles and distinct survival outcome. We further examined METAFlux on scRNA-seq data obtained from coculturing tumor cells with CAR-NK cells and observed high consistency between the predicted and the experimental (i.e., Seahorse extracellular) flux measurements. Throughout our investigation, we discovered various modes of metabolic cooperation and competition between various cell-types in TMEs, which could lead to further target discovery and development.

balance analysis (FBA), and it predicts the flux values of an entire set of reactions. Flux balance analysis is a well-established constrained optimization method to study such a complicated metabolism by analyzing the flow of metabolites [28] [29] [30] [31] . It relies on a stoichiometric representation of the metabolic model to predict the fluxes for reactions while maximizing a cellular objective, subject to constraints, such as the uptake or excretion fluxes observed from the experiment. Furthermore, various extensions of FBA have been successfully applied under different settings [32] [33] [34] [35] [36] [37] .

Many studies have focused on the integration of FBA and gene expression. They showed such integration could potentially improve the prediction of metabolic fluxes through this low-cost and straightforward fashion [38] [39] [40] [41] [42] [43] [44] [45] [46] [47] [48] [49] . One possible solution to connect transcriptome with FBA is using gene expression to define the objective function in FBA. For example, Lee et al. replace the biological objective(i.e., biomass or ATP maintenance) and use the correlation between the fluxes and gene expression score as the objective function [43] . Similarly, iMAT uses gene expression to divide highly expressed and lowly expressed reactions. This method finds the flux distribution that best explains gene expression patterns. It maximizes the number of reactions classified as highly expressed and minimizes the number of reactions classified as lowly expressed [38] . Another way to incorporate gene expression into FBA is to use expression values to define flux bounds in FBA. For example, E-Flux directly uses transformed gene expression values as flux constraints since it assumes the gene expression level determines the flux upper bound [45] . Those studies have demonstrated that the addition of gene expression has improved the prediction of metabolic states [38] [39] [40] [41] [42] . However, still lacking is systematic inference of the metabolic reaction activity from mechanistic reaction flux network in human cancer. Current scRNA-seq technology allows powerful characterization of cellular heterogeneity and has unraveled metabolic vulnerability and heterogeneity in many settings [17] [18] [19] 50] . Recently, several promising efforts have been made to predict COVID-19 metabolic targets and changes using iMAT on scRNA-seq data [51, 52] . Damiani et al. has proposed single-cell Flux Balance Analysis(scFBA) to obtain single-cell fluxomics. However, usability in large scRNA-seq data was not discussed, and systematic validation in TME was not reported [53] .

We developed METAFlux that estimates flux-based metabolic pathway activity from gene expression to address these fundamental challenges. Briefly, we used Human-GEM (consisting of 13082 metabolic reactions, 8378 metabolites) as our underlying metabolic models. Then we derived MRAS (Metabolic Reaction Activity Score) from gene expression using GPR(Geneprotein-reaction) [54] as flux constraints. To account for nutrient differences between cell cultures and patients, we defined lists of nutrient availability profiles for the "cell culture medium" and the "human blood" contexts. Subsequently, we formulated quadratic programming coupled with FBA under given nutrient constraints to calculate the metabolic fluxes for the 13082 reactions. We also extended the framework to calculate single cell cluster level metabolic activity. We combined the different cluster group metabolic network compartments in TME into one network and treated heterogeneous cell clusters as one community, where we can observe different modalities of metabolic organization, such as cooperation and competition for nutrients. Our model produces a higher-resolution activity prediction and will potentially facilitate a more accurate downstream analysis. Notably, our tool has the potential to improve the understanding of aberrant metabolism and serve as a preliminary source to investigate specific metabolic targets.

Our underlying metabolic knowledge is extracted from the Human1. Human 1 is a genomescale metabolic model (GEM) that integrates the Recon, iHSA, and HMR models. It contains 13082 reactions and 8378 metabolites [55] . This model encodes the mechanistic relationships between genes, metabolites, and reactions in a human cell. We choose Human1 because it shows a considerable improvement over other GEMs in terms of stoichiometric consistency, percentages of mass, and charge-balanced reactions [55] . Human1 model contains reactions in 9 compartments (extracellular, peroxisome, mitochondria, cytosol, lysosome, endoplasmic reticulum, golgi apparatus, nucleus, inner mitochondria).

For each sample in a bulk dataset, METAFLux (Fig. 1a, Methods) first computes the metabolic reaction activity score (MRAS) for each reaction, which describes the reaction activity as a function of the associated gene expression. Here, we do not consider the reaction kinetic constants and the binding affinity of proteins since robust estimates of all these parameters for genome-scale models are rather difficult [31, 45] . Instead, we use GPR rules, which decode the Boolean logic relationship between genes in a reaction [54] , to map the relationship between gene products and then summarize gene expressions into Metabolic Reaction Activity Scores (MRAS) given the predefined relationships. Our approach is adopted from what has been proposed earlier to infer the activity of metabolic reaction from gene expression data [43, 56] .

To connect transcriptome and fluxes, a possible solution is to use the MRAS calculated before to define the flux constraints. We use a E-Flux type of approach [45] , where the expression levels of the genes associated with a metabolic reaction serves as the maximum possible flux that reaction can carry. The rationale is that, although enzyme activities do not have a high correlation with RNA levels, given a specific level of translational efficiency and assuming there is a limited accumulation of enzymes over a certain time window [45] , RNA expression levels can be used as the maximum amount of protein products available and the maximum protein products can then serve as the maximum reaction fluxes.

Subsequently, we need to define a nutrient environment profile, which includes a list of metabolites available for uptake by the reactions (Methods). In the constrained optimization step, we hypothesize that tumors proliferate rapidly; thus, the new human biomass pseudoreaction, which constructs a generic human cell's nutrient demand and composition, should be optimized [55] . Here we reformulate this idea into convex quadratic programming (QP) to overcome degenerate solutions. Our optimization simultaneously optimizes the biomass objective and minimizes the sum of fluxes' squares, similar to a previous approach [57] (Methods).

We also propose a workflow for single cell settings (Fig. 1b, Methods) . Single-cell RNAsequencing enables characterization of individual cells to unravel complexity and heterogeneity of tumor microenvironments [58] . Since we believe that cell groups do not work in isolation, modeling the whole tumor microenvironment as one community will account for metabolic interaction between the groups. Here, we do not compute cell-wise flux. Instead, we model fluxes at pseudo-bulk level for the following reasons. First, scRNA-seq data is often highly sparse and noisy. Directly estimating cell-wise MRAS from zero-inflated data can result in many zeros, which challenges downstream modeling. One possible solution seen in the Compass metabolic model, recently developed by Wanger et al., used KNN smoothing to mitigate sparsity and stochasticity [59] . Still, different imputation algorithms can generate different results, an issue beyond the scope of this study. Secondly, merging pseudo-bulk level groups into one network is more computationally efficient than working at single cell level.

To estimate fluxes from scRNA-seq data, we first create stratified bootstrap sampling datasets and compute the pseudo-bulk gene expression profiles based on customized grouping for each bootstrap sample. Next, we calculate MRAS for each pseudo-bulk bootstrap sample. To run METAFlux in single-cell setting, we need to provide cluster (group) proportions with respect to the whole TME. Ideally, these proportions should be retrieved from experiments or calculated from matched bulk data using CIBERSORTx [60] . However, most datasets do not have such information. As a result, directly observed cluster (group) fractions in single-cell data could be used for the purpose, but further studies are warranted to evaluate the findings since those proportions may deviate from the truth due to sparse sampling [61, 62] . We then estimate the group proportions and derive fluxes for each bootstrap using the merged metabolic networks of different groups in TME under constrained optimization (Methods).

We used NCI-60 RNA-seq data and publicly available metabolite CORE (consumption and release rate) data generated on NCI-60 cell lines to benchmark our model performance [63] .

However, we only selected 11 cell lines as the experimental ground truth since other cell lines had nutrient depletion that could affect the reliability of flux profiling [55, 64] . There were 26 metabolites fluxes and one biomass flux per cell line. Our model was run based on Ham's medium composition (Methods).

Meanwhile, we also compared METAFlux performance with the state of art method enzymeconstrained model(ecGEMs). These two results can be compared because they were generated based on the same medium composition and both models maximized biomass reaction.

ecGEMs is an enzyme-constrained cell line-specific model, which utilizes tINIT to generate cellspecific GEMs and then incorporates GECKO to add enzyme constraints. tINIT is an optimization algorithm that extracts context-specific and connected GEMs based on proteomics and/or transcriptomics datasets [65] . GECKO (enhancement of a Genome-scale model with Enzymatic

Constraints and Omics data) extends the flux balance analysis approach (FBA) to enable the integration of enzyme and proteomics data [66] .

The overall Spearman correlation of experimental fluxes with predicted fluxes for all the metabolites in the 11 cell lines was 0.76 for METAFlux, and 0.45 for ecGEM (Fig. 2a) . In terms of consistency across different metabolites, METAFlux generally achieved better spearman correlation with ground truth than ecGEM across metabolites (Fig. 2b) . For metabolites like Lcarnitine and pyruvate, ecGEM predicted those fluxes to be 0. Therefore, the correlation of these two metabolites to background truth was not calculated. To further examine the accuracy of our flux prediction, we categorized metabolic fluxes to 'no flux' (flux equals to 0), 'uptake' (flux smaller than 0) and 'excrete' (flux greater than 0) because correlation metrics only show whether there is a relationship between pairs of fluxes. However, it omits the sign of fluxes which represents the directional uptake or excretion behavior. After categorization, we evaluated the predicted direction accuracy for each metabolite between two models (Fig. 2c) .

Except for 'choline,' METAFlux achieved better accuracy for seven metabolites and the same accuracy with ecGEM for the other 19 metabolites. Taken together, these results suggest that METAFlux outperformed ecGEM on predicting metabolic fluxes from RNA-seq data.

To estimate the effect of medium constraints on model performance, we first did not impose medium constraints in METAFlux, meaning that we freely allowed the uptake or excretion of all exchange metabolites with no rate restriction. Without any medium constraints, METAFlux only had a spearman correlation of 0.15. Similarly, both spearman correlations across metabolites and cell lines dropped significantly (Supplementary Figure 1a, b) . The result showed that the prediction performance for models without medium constraints declined considerably compared with the original medium composition.

To further explore the medium effect on performance, we tested our model on random generated medium. We compared the results obtained from our biological meaningful medium with same-sized random mediums. The original hypothesized hams medium contains 44 metabolites, so we randomly selected 44 metabolites from the total 1648 exchange metabolites. We then allowed our model to uptake or excrete those 44 metabolites without rate restriction while only allowing the rest of 1604 metabolites to excrete with no rate restriction. We repeated this simulation process = 500 times. For each simulation , we obtained the overall spearman correlation and directionality accuracy . To calculate the p-value, we counted the number of measurements greater than our original biological meaningful statistics and and divided this number by 500. The P-values for both overall spearman accuracy and direction accuracy were zero, indicating that none of the results generated by the random medium were superior to our original biological medium (Supplementary Figure 1c, d) . These results indicate that METAFlux has achieved biologically meaningful modeling in in vitro cell-line culturing experiments and can potentially be applied in broader settings.

We applied METAFlux on TCGA LUAD (lung adenocarcinoma) and TCGA LUSC (lung squamous cell carcinoma) with human blood profile as medium constraints (Methods). A recent study utilizing TCIA (The Cancer Imaging Archive) database to quantify tumor glucose uptake using 18 F-FGD PET-CT observed a significant higher glucose uptake in LUSC than in LUAD. To validate METAFlux, we examined the glucose uptake flux estimated by METAFlux from the LUSC and the LUAD RNA-seq data. The METAFlux prediction results indicated that LUSC tumors had a higher glucose uptake than LUAD, which is consistent with the 18 F-FGD PET-CT scan results (Fig. 3a) . In addition, prior research has also suggested 18 F-FGD was closely correlated with the proliferation index [67] . Consistent with this, we found that the glucose uptake influx from METAFlux was highly correlated with estimated proliferation signature scores with a spearman correlation of 0.65 (Supplementary Figure 2a) . The proliferation scores were calculated by ssGSEA using derived gene sets and were directly available from earlier study [68] .

Clustering the metabolic fluxes revealed 2 clusters of samples for LUAD and LUSC (Fig. 3b) . The LUAD tumors in cluster 1 (LUAD1) had significantly lower glucose uptake than 'LUSC-like LUAD" (P values<2.26× 10 −16 ), implying LUSC-like LUAD tumors are more metabolically active.

Survival analysis revealed that LUAD1 tumors had significantly better survival outcomes (P value < 0.0084) than 'LUSC-like LUAD' (Fig. 3c) . Moreover, such clustering membership cannot be found when performing clustering using the corresponding gene expression data (Fig. 3d) and

Supplementary Figure 2c, d) . There was no significant difference in patient survival among the resulting clusters in LUAD tumors (Fig. 3e) . Results demonstrated that utilizing metabolic fluxes generated by METAFlux was able to discover novel tumor subtypes, which could not be found otherwise by gene expression.

In single cell settings, we first examined the scRNA-seq data generated from an in vivo coculture experiment assessing the killing of engineered CAR-NK cells on Raji cells, a non-curative CD19+ lymphoma cell-line model. Data included three products: CAR19-NK cells armed with IL-15, CAR19-NK cells lacking IL-15, and non-transduced NK cells, in addition to the tumor cells. In total, data from four time points were collected: day 7, day 14, day 21, and day28. The extracellular acidification rate (ECAR) and oxygen consumption rate (OCR) data were measured by Agilent Seahorse XFe96 Analyzer, and the measurements were taken 2 hours after cocultured NK and Raji cells. METAFlux single cell framework requires cell type proportion as an input parameters, here we simply calculated percentage for each cell type using single cell RNAseq data.

We sought to compare our METAFlux results with the Seahorse assay. We only utilized singlecell RNA-seq data from day 7 for our benchmark comparison since day 7 profile would be the closest to the state when Seahorse assay was measured. We used oxygen efflux as the surrogate for OCR. Similar to a previous study, we used proton efflux as the surrogate for ECAR [69] . We set the bootstrap number to 100 for each product. The Seahorse assay indicated that CAR19/IL15 NK cells had the highest OCR and ECAR, followed by CAR19 NK cells, and NT-NK cells showed the lowest OCR and ECAR among the three products. Our predicted OCR and ECAR findings showed a consistent trend with Seahorse Assay that CAR19/IL15 NK cells were the most metabolically active, while NT-NK cells were the least metabolically active (Fig. 4a, b and Supplementary Figure 3a-b) .

Next, we aimed to analyze the metabolic competition in CAR19/IL15 over time. Our ongoing work demonstrated tumors recurred after day 14, and NK cells experienced metabolic dysfunction and decreased anti-tumor activity over time with tumor recurrence. We set the bootstrap number for each time point to be 100 and used per cell competition score to quantify the metabolic competition between tumor and NK cells. We defined tumor and NK nutrient competition score as the ratio of the per cell nutrient uptake flux in tumor to the per cell nutrient uptake flux in NK. We found that the metabolic competition for oxygen and glucose decreased from day 7 to day 14 but ramped up after day 14 and reached its peak on day 28( Fig.   4c, d) . Moreover, several amino acids showed similar trends (Supplementary Figure 3c- 

Consistent with prior work, METAFlux results suggested tumor cells eventually outcompeted NK cells for nutrients and NK cells became less metabolically fit over time [70] .

We also observed that tumor cells were the major contributor to the total lactate production in the TME on day 21 and day 28, and there was an increasing trend of lactate production by all tumor cells over time (Fig. 4e and Supplementary Figure 3g) . On average cell level, NK cells showed an elevated lactate release over time (Supplementary Figure 3h) . As a result, the tumor microenvironment became more and more acidic, which suppressed the NK function and decreased the cytotoxic activity of NK cells, leading to tumor recurrence [71] . Our findings could be potentially relevant to understanding the associated mechanism with tumor resistance and relapse. Taken together, METAFlux can capture the longitudinal competition mechanism between NK and tumor cells and pinpoint the nutrients they compete for.

Our TCGA sample analysis demonstrated that glucose uptake was significantly higher in LUSC than in LUAD at bulk level. However, glucose uptake comparison on bulk level did not show the distinct metabolic programs for different cell types in TME. To this end, we then sought to identify metabolic heterogeneity for different cell types using METAFlux in a community-based setting.

We applied METAFlux on bulk sorted RNA profiling data from primary lung cancer patients directly acquired from the operating room [72] . The data was sorted into immune cells profiles for all four cell types. We used the cell type proportions from the original study as our input parameters and those cell type proportions were calculated by CIBERSORT using matched bulk data [72] .

We next compared the nutrient uptake between cancer, immune cell, fibroblast, and endothelial cells between SCC and ADENO. Since the sample size was small, the P value was not significant. Consistent with the findings from the bulk data, there was a trend toward higher glucose uptake from whole TME in SCC than in ADENO (P=0.058,95% CI of effect size [-1.8149, - Supplementary Figure 4a) . The cell type specific nutrient uptake indicated a weaker increase in glucose uptake from average cancer cells in SCC than in ADENO (P=0.222, 95% CI of effect size[-1.3435,0.2831]) (Fig. 5a) Fig. 5a) . However, the most striking difference came from immune cells. The immune cells in SCC had the highest per cell glucose uptake in TME, while the immune cells in ADENO had the lowest capacity to uptake glucose per cell among TME (P=0.045, 95% CI of effect size[-1.8626, -0.1579]) (Fig. 5a) .

This finding suggested that immune cells in SCC were not deprived of glucose.

In addition to per cell metabolic profile for each cell type, we were also interested in total nutrient consumption for a particular cell type. To calculate the total consumption for a certain nutrient in a particular cell type, we multiplied the per cell nutrient uptake flux by its cell type abundance. Since the tumor has the largest abundance, the nutrients secretion or uptake can be greatly skewed by its mass. Cancer cells account for 56.55% of glucose uptake in ADENO and 49.28% of glucose uptake in SCC. Immune cells account for 18.82% of glucose uptake in ADENO and 34.31% of glucose in SCC (Supplementary Figure 4b) . For glutamine, cancer cells account for 68.25% of glutamine uptake in ADENO and 62.32% of glutamine uptake in SCC. Immune cells account for 15.79% of glutamine uptake in ADENO and 19.33 % of glutamine in SCC (Supplementary Figure 4b) . METAFlux showed that heterogeneous cell types had different affinities for nutrient and revealed distinct metabolic phenotypes within TME. TME is a highly complex mixture, and TME components either form metabolic antagonism or symbiosis when uptaking nutrients [73] . When one or more cell types benefit from the metabolites produced by other cell types, we define the interaction as a metabolic cooperation program. When all cell types compete for limited resources in TME, we define it as a competition metabolic program. A competition between TME components occurs when nutrient demands in TME are high.

We captured different interaction mechanisms among TME in both ADENO and SCC (Fig. 5b-d) .

We identified the competition mode where different cell types compete for nutrients like glucose, glutamine, oxygen, and other amino acids (Fig. 5b and Supplementary Tables 1) . In contrast, we have also identified a cooperation mechanism where one or more cell types utilized the nutrients (e.g., Phenylalanine) produced by other TME components to favor their growth ( Fig. 5c and Supplementary Tables 1) . Besides cooperation and competition, we also found a 'parallel producer' modality where all TME compartments released a certain nutrient (e.g., lactate) (Fig. 5d) .

In this study, we developed METAFlux, a computational framework that opens up the application of metabolic flux analysis to transcriptomic datasets. First, we converted RNA-seq data to metabolic reaction activity score and use constrained optimization to maximize biomass under the steady-state assumption to infer fluxes. We then extended the method to a singlecell RNA-seq setting. For single cell settings, we first merged different compartments in TME into one network and calculate cluster level metabolic activity. Then, we treated heterogeneous cell clusters in TME as one community and used constrained optimization to maximize biomass for the entire community and infer metabolic fluxes.

METAFlux calculates metabolic fluxes at cluster level under single cell settings, because we observed that RNA-count dropout could affect flux prediction accuracy. Creating a pseudo-bulk sample can substantially alleviate the negative impact of dropout effect and save computational cost. For datasets with high dropout rates, it could be beneficial to applying gene expression imputation before applying METAFlux.

The metabolic growth highly relies on the medium; however, we did not distinguish different METAFlux in bulk RNA-seq setting achieved great benchmarking results using NCI 60 experimental data as ground truth. It correctly predicted metabolic phenotypes in single cell CAR-NK dataset, compared against Seahorse assays. We also successfully applied METAFlux on various patient samples for hypothesis generation. Our tool provides a cost-effective approach for discovering novel metabolic features in high-dimensional datasets. We have shown our method can generate consistent results with experimental data and identify meaningful and novel features. Furthermore, our tool allows for the connection between transcriptomic data and system-level FBA metabolic analysis and a better understanding of plastic and complex metabolic network.

In GPR, the AND operator joins the genes encoding for different subunits of the same enzyme, and the OR operator joins the genes encoding for isoenzymes [74] . For a reaction catalyzed by an enzyme complex, all the subunits need to be expressed to catalyze a reaction, and the lowest expressed unit will be the rate-limiting step for this complex. Therefore, the metabolic activity of such an enzyme complex will be the lowest expression value among all genes associated with this enzyme complex. For a reaction catalyzed by isoenzymes, all the isozymes contribute additively to this reaction [56] . Thus, metabolic activity will be the sum of all expressions of isoenzyme genes. Some genes are involved in multiple reactions (e.g., promiscuous enzyme), and we hypothesize that there may be enzyme resource competition may exist between reactions. We adjust for the enzyme promiscuity by dividing the expression value of a gene by the number of reactions the gene has participated in. A similar approach has been seen in [75] . The steps of deriving MRAS are the following:

Let be the number of reactions participate and 

We set normalized MRAS as the flux upper bound to their corresponding metabolic reactions.

The Where and are the input flux bounds for ℎ reaction in ℎ samples, and stands for reversibility of a reaction. If = 0, the reaction is non-reversible; if = 1, the reaction is reversible.

A chemical reaction is a basic unit in metabolic pathways, and stoichiometry can represent the quantitative relationship between products and reactants in a reaction. The stoichiometric coefficients of the reactions that constitute the network populate the stoichiometric matrix S.

Here 

We use the ham's medium as the growth medium in cell line models [55] . It contains 44 metabolites, and the uptake or excretion rates of these 44 metabolites are not limited, meaning that cells may uptake or excrete these metabolites without limit. For the remaining metabolites in the model, we do not allow cells to uptake from the medium, but cells can excrete those metabolites into the medium. For tissue samples from patients, it is necessary to define a more 

Traditional FBA by linear programming (LP) gives a unique optimal objective value. However, the solution to FBA from LP is most likely degenerate, meaning the solutions are not unique, and different solvers will likely return different vectors. Usually, the flux variability analysis will be used afterward to calculate the range of fluxes that achieves the optimal objective [76] .

Another approach, pFBA or Parsimonious enzyme usage FBA, was proposed earlier [32] . pFBA assumes there is a selection for an organism to minimize the total amount of necessary enzymes to achieve optimal growth. pFBA first computes the optimal growth rate and then minimizes the sum of reaction fluxes under the optimal solution. Here we reformulate this idea into convex quadratic programming (QP).

We The framework is implemented using OSQP solver[77] and formulated as the following optimization problem:

Where is a vector of zeros with a one at the position of our designated biomass reaction, and constraint (3) is the growth medium constraints. is set to 10000, the same order of magnitude with respect to fluxes used in Fit methods [57] . stands for the ℎ reaction. We define a growth medium and all exchange reactions and as exchange reactions relevant to .

Given a scRNA-seq dataset, we assign a group (cluster or cell type) label to each cell. We first perform stratified bootstrapping, which means we sampled with replacement with respect to each group.

Step 1: Bootstrap sample generation. Let group be = 1, … . , and be the ℎ bootstrap sample for ℎ group. For each bootstrap iteration , we combine 1 , 2 … for all groups to form resampled data. Each generated bootstrap data will be the same size and have the same group proportion as the original data.

Step 2: Mean calculation. For each bootstrap sample, we calculate the mean gene expression vector for each group.

Step 3: Define group fraction parameter . is a fraction matrix defined as:

stands for the percentage of cell group , and we constrain that ∑ =1 should be one.

We require group fractions as our input. Group fractions indicate the proportions of groups of interest with respect to the whole sample.

Step 4: Merging metabolic networks. Let be the metabolites associated with exchange reactions in group . To merge multiple metabolic networks, we need to create a "TME metabolite reservoir" for different cell groups to interact. We define as the reservoir metabolite in group : ↔

This representation allows different partitions of TME to share the same resources. 

Group 3

Our model aims to maximize the entire community's biomass while minimizing the sum square of overall fluxes. The framework is implemented using OSQP solver[77] and formulated as follows:

is a vector of zeros with ones at each cell group's designated biomass reaction position. is set to 10000. Constraint (3) is the growth medium constraints, and we define a growth medium , and ( ) as shared exchange reactions relevant to . stands for the ℎ reaction.

To estimate the effect of data sparsity on model performance, we plan to employ a similar strategy described in splatter to simulate dropout on 11 NCI-60 cell lines we used [78] . First, we used a logistic function ( ) to calculate the dropout probability of every gene count.

Subsequently, a Bernoulli distribution with dropout probability vector as input parameter will then be utilized to randomly replace the original count matrix with 0.

Where is the log2 normalized gene count and 0 is the midpoint value. We increase the midpoint value sequentially to increase the proportion of dropout rates. The vector of midpoint values in the simulation is 0.1, 0.5, 1, 1.5 ,2 ,2.5 ,3, 3. 5, 4, 4.5, 5. METAFlux will be applied for each simulated dataset and results obtained from simulated data will be compared with original data.

To estimate the medium effect on model performance, we will compare the results obtained from our biological meaningful medium with same-sized random mediums. The assumed medium contains 44 metabolites, so we randomly select 44 metabolites from the total 1648 exchange metabolites. We then allow our model to uptake or excrete those 44 metabolites without rate restriction while only allowing the rest of 1604 metabolites to excrete with no rate restriction. We repeat this process = 500 times. For each simulation , we obtain the overall Spearman correlation and directionality accuracy . To calculate the p-value, we simply count the number of measurements greater than our original biological meaningful statistics and and divide this number by 500.

Where (. ) Is an indicator function, equaling 1 if the condition in parenthesis is true, 0 otherwise.

Not applicable in this study.

Not applicable in this study.

All datasets used in our study are publicly available. Human-GEM model was accessed from (https://github.com/SysBioChalmers/Human-GEM). We retrieved the medium composition and flux profiling data for 11 NCI-60 cell lines under the original manuscript [55] . ecGEM flux prediction for 11 NCI-60 cell lines was be obtained at https://zenodo.org/record/3583004#.YhQJdZPMJqs NCI-60 cell lines TPM RNA-seq data was obtained from https://depmap.org/portal/download/.The TCGA pan-cancer RNA-seq TPM data was downloaded from UCSC Xena data hubs(https://xenabrowser.net/). The proliferation score data can be found in the original publication [68] . CAR-NK single-cell RNA-seq data is available through NCBI Gene Expression Omnibus (GSE190976). Patient Lung cancer bulk sorted RNA-seq data can be downloaded from the NCBI Gene Expression Omnibus (GSE111907).

Code will be available in Github.

The authors declare that they have no competing interests. Graph-based representation of 3 different metabolic modalities in ADENO and SCC. Each node represents one cell type (Immune, endothelial, tumor, or fibroblast) or system (shared TME).

The edge width represents the absolute magnitude of flux. The arrow shows the direction of flux. Arrow coming from cells to system means the nutrient of interest is released to system. Arrow coming from system to cells means the nutrient of interest is absorbed from system. (b)

Metabolic competition mode in ADENO and SCC. Immune, endothelial, tumor and fibroblast cells compete for glucose, glutamine, and oxygen uptake in TME (represented by the system). cluster + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + ++ ++ + + + + ++ + + + ++ + + + + + + + + + ++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + ++ + + + + + + + + ++ + + + + + + + + ++ + + ++ + + + + Number at risk + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + ++ + + + + + + + + + + + ++ + + ++ + + + + + ++ + + + + + + +++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 

Metabolic gene alterations impact the clinical aggressiveness and drug responses of 32 human cancers

Fundamentals of cancer metabolism

Targeting Metabolism for Cancer Therapy

Hallmarks of Cancer: New Dimensions. Cancer Discov

Metabolic reprogramming and cancer progression

Tumor Microenvironment, Metabolism, and Immunotherapy

Physiologic Medium Rewires Cellular Metabolism and Reveals Uric Acid as an Endogenous Inhibitor of UMP Synthase

Improving the metabolic fidelity of cancer models with a physiological cell culture medium

LC-MS-based metabolomics

Defining the metabolome: size, flux, and regulation

A Checklist for Reproducible Computational Analysis in

Unbiased metabolic flux inference through combined thermodynamic and <sup>13</sup>C flux analysis

High-content fluorescence imaging with the metabolic flux assay reveals insights into mitochondrial properties and functions

RNA sequencing: new technologies and applications in cancer research

Single-Cell RNA-Seq Technologies and Related Computational Data Analysis

Pan-cancer analysis of transcriptional metabolic dysregulation using The Cancer Genome Atlas

Metabolic landscape of the tumor microenvironment at single cell resolution

Capture at the single cell level of metabolic modules distinguishing aggressive and indolent glioblastoma cells

Single-Cell RNA Sequencing Maps Endothelial Metabolic Plasticity in Pathological Angiogenesis

A pan-cancer transcriptomic study showing tumor specific alterations in central metabolism

Metabolic characterization and metabolism-score of tumor to predict the prognosis in prostate cancer

GSVA: gene set variation analysis for microarray and RNA-seq data

SCENIC: single-cell regulatory network inference and clustering

Single sample scoring of molecular phenotypes

Targeting cancer metabolism: a therapeutic window opens

Current status and applications of genome-scale metabolic models

Applications of Genome-Scale Metabolic Models in Biotechnology and Systems Medicine

What is flux balance analysis?

Flux balance analysis of biological systems: applications and challenges

Advances in flux balance analysis

Flux balance analysis in the era of metabolomics

Omic data from evolved E. coli are consistent with computed optimal growth from genome-scale models

Dynamic flux balance analysis of diauxic growth in Escherichia coli

Flux balance analysis accounting for metabolite dilution

Flux balance analysis: a geometric perspective

Regulatory on/off minimization of metabolic flux changes after genetic perturbations

Analysis of optimality in natural and perturbed metabolic networks

iMAT: an integrative metabolic analysis tool

Reconstruction of genome-scale active metabolic networks for 69 human cell types and 16 cancer types using INIT

Context-specific metabolic networks are consistent with experiments

Fast reconstruction of compact context-specific metabolic network models

Computational reconstruction of tissue-specific metabolic models: application to human liver metabolism

Improving metabolic flux predictions using absolute gene expression data

Functional integration of a metabolic network model and expression data without arbitrary thresholding

Interpreting expression data with metabolic flux models: predicting Mycobacterium tuberculosis mycolic acid production

Probabilistic integrative modeling of genome-scale metabolic and regulatory networks in Escherichia coli and Mycobacterium tuberculosis

Genome-level transcription data of Yersinia pestis analyzed with a new metabolic constraint-based approach

Integration of gene expression data into genomescale metabolic models

Network-based prediction of human tissue-specific metabolism

Deciphering Metabolic Heterogeneity by Single-Cell Analysis

Genome-scale metabolic modeling reveals SARS-CoV-2-induced metabolic changes and antiviral targets

Integrated analysis of plasma and single immune cells uncovers metabolic changes in individuals with COVID-19

Integration of single-cell RNA-seq data into population models to characterize cancer metabolism

Stoichiometric Representation of Gene-Protein-Reaction Associations Leverages Constraint-Based Analysis from Reaction to Gene-Level Phenotype Prediction

An atlas of human metabolism

Integration of transcriptomic data and metabolic networks in cancer samples reveals highly significant prognostic power

Improving flux predictions by integrating data from multiple strains

Unraveling the Complexity of the Cancer Microenvironment With Multidimensional Genomic and Cytometric Technologies

Metabolic modeling of single Th17 cells reveals regulators of autoimmunity

Determining cell type abundance and expression from bulk tissues with digital cytometry

Classification of low quality cells from single-cell RNA-seq data

Sources of variation in cell-type RNA-Seq profiles

Metabolite profiling identifies a key role for glycine in rapid cancer cell proliferation

Metabolite Depletion Affects Flux Profiling of Cell Lines

Identification of anticancer drugs for hepatocellular carcinoma through personalized genome-scale metabolic modeling

Improving the phenotype predictions of a yeast genome-scale metabolic model by incorporating enzymatic constraints

Does Tumor FDG-PET Avidity Represent Enhanced Glycolytic Metabolism in Non-Small Cell Lung Cancer?

The Immune Landscape of Cancer. Immunity

Integrating Extracellular Flux Measurements and Genome-Scale Modeling Reveals Differences between Brown and White Adipocytes

Metabolic competition is an important driver of tumor resistance after CAR NK cell therapy and can be overcome by cytokine engineering

A human lung tumor microenvironment interactome identifies clinically relevant cell-type cross-talk

Metabolic Cooperation and Competition in the Tumor Microenvironment: Implications for Therapy

GPRuler: metabolic Gene-Protein-Reaction rules automatic reconstruction. bioRxiv

Model-based assessment of mammalian cell metabolic functionalities using omics data

An objective function exploiting suboptimal solutions in metabolic networks

Splatter: simulation of single-cell RNA sequencing data

uptake, and the inner circle shows ADENO nutrient uptake percentage. To calculate the percentage of specific nutrient uptake in a particular cell type, we divided the total uptake of that nutrient from each cell type by the total TME uptake of that nutrient. The total cell type uptake was calculated using per cell average flux multiplied by cell type proportion. Total TME nutrient uptake was calculated by summing the total uptake of that nutrient from all cell types.