key: cord-1007838-7knzjnyr authors: Hu, Fang; Chen, Si‐liang; Dai, Yu‐jun; Wang, Yun; Qin, Zhe‐yuan; Li, Huan; Shu, Ling‐ling; Li, Jin‐yuan; Huang, Han‐ying; Liang, Yang title: Identification of a metabolic gene panel to predict the prognosis of myelodysplastic syndrome date: 2020-04-26 journal: J Cell Mol Med DOI: 10.1111/jcmm.15283 sha: a61cb00d1a3c3ea0c11478e6b0ec41e49a76169f doc_id: 1007838 cord_uid: 7knzjnyr Myelodysplastic syndrome (MDS) is clonal disease featured by ineffective haematopoiesis and potential progression into acute myeloid leukaemia (AML). At present, the risk stratification and prognosis of MDS need to be further optimized. A prognostic model was constructed by the least absolute shrinkage and selection operator (LASSO) regression analysis for MDS patients based on the identified metabolic gene panel in training cohort, followed by external validation in an independent cohort. The patients with lower risk had better prognosis than patients with higher risk. The constructed model was verified as an independent prognostic factor for MDS patients with hazard ratios of 3.721 (1.814‐7.630) and 2.047 (1.013‐4.138) in the training cohort and validation cohort, respectively. The AUC of 3‐year overall survival was 0.846 and 0.743 in the training cohort and validation cohort, respectively. The high‐risk score was significantly related to other clinical prognostic characteristics, including higher bone marrow blast cells and lower absolute neutrophil count. Moreover, gene set enrichment analyses (GSEA) showed several significantly enriched pathways, with potential indication of the pathogenesis. In this study, we identified a novel stable metabolic panel, which might not only reveal the dysregulated metabolic microenvironment, but can be used to predict the prognosis of MDS. pro-inflammatory of bone marrow microenvironment and so on. 3, 4 At present, the MDS Revised International Prognostic Scoring System (IPSS-R) is one of the gold standards for risk stratification and prognostic assessment in MDS patients, in which, patients are categorized into five well-defined risk groups according to platelet count, haemoglobin levels, absolute neutrophil count (ANC), marrow blast percentage and cytogenetics. 5, 6 Although patients in intermediate-risk group are reported to have an intermediary survival, it is possible that the disease course might vary, with variable outcome actually. 7 In the meantime, MDS lacks a diversified prognostic classification system at present. Therefore, identification of more diversified prognostic models would better guide therapeutic decisions, further assisting to design more perfect clinical trials. Furthermore, MDS is a stem cell-derived disorder affecting multiple lineages. 8 MDS stem cells with CD123 + have been reported to have higher levels of protein synthesis and change cellular energy metabolism, 9 which are similar with AML. 10, 11 The anti-leukaemia mechanism of B cell lymphoma 2 (BCL-2) inhibitor (venetoclax) combined with demethylated drugs (azacytidine) is the eradication of LSCs by disrupting the tricarboxylic acid (TCA) cycle for further and durable remissions for older AML patients. 12 Moreover, isocitrate dehydrogenase 2 (IDH2) enzyme inhibitor has been approved by US Food and Drug Administration (FDA) in 2017 for refractory or relapsed AML patients by targeting tumour energy metabolism for. BM microenvironment is vitally involved in the pathogenesis of MDS according to the 'seed soil' theory, which consists of cellular compo- non-cellular components (metabolites, cytokines, hormones and angiogenic factors). 13 Leukaemia cells use oxidative phosphorylation for survival, while HSCs depend on glycolysis for energy production. 12 Leukaemia cells are likely to uptake mitochondria from stromal cells by endocytosis. 14 As a consequence, metabolism plays key roles for non-cellular components. Accumulative studies have revealed that the relationship between pathogenesis, treatment and metabolism of MDS recently. Therefore, we established a prognostic panel of metabolic gene by downloading data from Gene Expression Omnibus (GEO) datasets in the training cohort, which was further validated in an independent external cohort. In conclusion, we constructed a metabolic panel to predict the prognosis of MDS and revealed that metabolism played significant roles in the prognosis of MDS. The mRNA expression profiles and relevant clinical information were downloaded from GSE58831 15 and GSE11 4922 16 datasets from the GEO database. The metabolic gene sets utilized as the candidate metabolic gene lists were retrieved from 'c2.cp.kegg. v7.0.symbols' in gene set enrichment analysis (GSEA). In addition, perl scripts were used to retrieve metabolic genes for further analysis. Transcripts per million normalization and log2 transformation were performed on the expression profiles. DE analysis was conducted on 861 annotated metabolic-related genes with protein coding functions by the Limma. 17 The expression pattern of metabolic genes was examined in training cohort. Genes were subjected to prognostic analysis in the case of consistent expression pattern in training cohort and independent external cohort. GSE53381 dataset was used as the training cohort to construct metabolic risk panel. The LASSO regression penalizes the data fitting criteria in a way that eliminates less informative predictor variables to yield simpler and more interpretable models. Therefore, the metabolic panel was constructed according to the penalized maximum likelihood estimator with 1000-fold cross-validation. The least criteria of the penalized maximum likelihood estimator were employed to determine the optimal values of penalty parameter λ. In addition, GSE11 4922 dataset served as an independent external validation cohort. The unified formula determined in the training cohort was used to generate the metabolic risk score in every patient, who were further categorized into high-and lowrisk groups according to the median metabolic risk score. Univariate and multivariate forwarding stepwise Cox regression analyses were conducted in both training and validation cohorts. A P < .05 indicated statistical significance. GSEA v4.0.2 software (http://softw are.broad insti tute.org/gsea/ login.jsp) was utilized to recognize the potential biological pathways between high-and low-risk groups by using 'c2.cp.kegg.v7.0.symbols' gene sets. NOM P-value < .05 indicated statistical significance and was further exhibited. Time-dependent receiver operating characteristic (ROC) curve was performed to assess the predictive performance of metabolic signature in the raining and validating cohorts, followed by calculation of area under the curve (AUC) using survival ROC package. Overall survival (OS) was defined as the primary outcome, which was calculated as the date of the study entry until death due to all causes. Kaplan-Meier curve was plotted by 'survival' package, followed by comparison by log-rank test. Univariable and multivariable Cox analyses were used to evaluate the prognostic performance of clinical and genetic features. Categorical variables were compared by chi-square test or Fisher's exact test. SPSS ® version 24.0 (IBM) and R software (version 3.6.0) were used for statistical analysis. A two sided P < .05 indicated statistical significance. The detailed patient characteristics of the two included cohorts and the correlation between clinicopathological features and metabolic risk level in training cohort and external validation cohort in MDS CKMT2 FHIT PIK3CA DGKD ENTPD4 ADA CBR3 DGKH HMGCS1 PDE4D PFKFB3 MTMR6 PDE4B CHDH CYP2A6 IPMK RRM2B AGPS PFKFB2 MGST1 POLR1D GLUD1 CES2 DGKZ NME6 NT5C CHST12 GALT ASS1 CEL ACP6 SYNJ2 DGAT2 OXCT2 POLR3F ALDH7A1 ADK TYMS LPCAT4 POLR1A MCEE AFMID EPRS ALDH4A1 SORD MPST GCDH SRM TPI1 DNMT3B PPAT GMPS MTR GSTM3 NT5M DGAT1 CANT1 CS GYS1 INPP5E POLR2A AGPAT3 HK1 PPOX TSTA3 GMPR PIP4K2A MBOAT2 PIP5K1B PTGDS ACADS GCLC GCLM ACO1 GSR TPMT Among the 861 metabolic genes subjected to DE analysis by the Limma, 140 genes were differently expressed between healthy sample and MDS sample ( Figure 1A,B) . Further, the prognostic values of these 140 genes were analysed via Univariate Cox regression analysis. Ultimately, 22 genes that were differentially expressed as well as survival-related (P < .05) were identified. ( Figure 1C) . Afterwards, the Lasso-penalized Cox analysis regression was used to select the most useful predictive genes from the 22 genes. A penalized maximum likelihood estimator was performed with 1000 bootstrap replicates. The regularization parameter lambda was used to identify the optimal weighting coefficients via the least criteria (Figure 2A,B) . Afterwards, 15 metabolic genes were selected and the coefficient was estimated to construct the The In total, 201 patients with complete clinical data including age, gender, WHO category, karyotype, IPSS, transfusion dependent, haemoglobin, bone marrow blasts cells, platelet count and absolute neutrophil count were included in the training and validation cohort. High-risk patients were associated with male, higher numbers of bone marrow blast cells, higher IPSS score and lower absolute neutrophil count (Table 1) Figure 6A ,B. GSEA identified 36 significantly enriched KEGG pathways in the training or validation cohort. The majority of the metabolism-associated pathways were enriched in the low-risk group, and the metabolic pathways ranked by NES were cysteine and methionine metabolism, glycine serine and threonine metabolism, fatty acid metabolism and pyrimidine metabolism. On the contrary, the majority of the non-metabolism-associated pathways were enriched in the high-risk group. Additionally, most enriched pathways were correlated with cancer (such as the cell cycle and phosphatidylinositol signalling system) or metabolism (such as the glycine serine and threonine metabolism, cysteine and methionine metabolism) ( Figure 7A ,B). The mutation variants of the metabolic gene panel were explored in CCLE database by the cBioportal for Cancer Genomics. 18, 19 As was expected, the gene amplification, which can change Fifteen metabolic genes were identified to construct the metabolic model, most of which have been reported to be involved in malignancy. PFKFB2, a vital regulator of glucose metabolism, has been defined as a candidate gene for GC-triggered apoptosis according to comparative expression profiling in childhood acute lymphoblastic leukaemia (ALL). 23 Interestingly, PFKFB2 was suppressed by miR-613 in gastric cancer, which could further inhibit cell proliferation and invasion. 24 The expression pattern is reported for the first time as a potential marker in MDS PLCB2, involved in inositol phosphate metabolism, has been narrowly linked to the poor prognosis in patients with hepatocellular carcinoma, lung cancer and mammary carcinoma. 25 In our study, PLCB2 was negatively correlated with study should also be acknowledged. Firstly, we are unavailable to more clinical information due to the data driving from GEO database. Secondly, the significance of the metabolic panel should be further confirmed in real-world research, and further basic experiments are simultaneously necessary to explore the underlying pathogenesis. In summary, we constructed a novel prognostic prediction model based on metabolic genes from GEO database for MDS, and further validated in the validation cohort. The prognostic model was not only an independent prognostic predictor for MDS but also reflected the disordered metabolism of MDS. Moreover, this panel could be utilized as an effective approach for prognostic prediction in MDS patients in clinical practice. The authors sincerely thank GEO for sharing the huge amount of data. The authors declare no conflict of interest. Conceptualization, FH and YL; methodology, CSL, and FH; validation, CSL; formal analysis,; investigation, FH, CSL, and DYJ; resources, YL; data curation, CYY; writing-original draft preparation, YW, and FH; writing-review and editing, YW, FH and JYL; visualization, SLC, and LLS and DJD; supervision, YL; project administration, YL; funding acquisition, YL. The GSE58831 and GSE11 4922 datasets were collected via the Gene Expression Omnibus (GEO) database, which were utilized for retrieving clinicopathological data and RNA expression patterns. All data or code generated or used during this study are available from the corresponding author by reasonable request. Si-liang Chen https://orcid.org/0000-0002-4095-5974 The genetic basis of phenotypic heterogeneity in myelodysplastic syndromes Myelodysplastic syndromes Clinical implications of genetic mutations in myelodysplastic syndrome Immunologic aspects of hypoplastic myelodysplastic syndrome Revised international prognostic scoring system for myelodysplastic syndromes Myelodysplastic syndromes, version 2.2017, NCCN clinical practice guidelines in oncology Use of newer prognostic indices for patients with myelodysplastic syndromes in the low and intermediate-1 risk categories: a population-based study Transplantation of a myelodysplastic syndrome by a long-term repopulating hematopoietic cell Characterization and targeting of malignant stem cells in patients with advanced myelodysplastic syndromes Inhibition of amino acid metabolism selectively targets human leukemia stem cells Venetoclax with azacitidine disrupts energy metabolism and targets leukemia stem cells in patients with acute myeloid leukemia BCL-2 inhibition targets oxidative phosphorylation and selectively eradicates quiescent human leukemia stem cells Evaluation of bone marrow microenvironment could change how myelodysplastic syndromes are diagnosed and treated Protective mitochondrial transfer from bone marrow stromal cells to acute myeloid leukemic cells during chemotherapy Combining gene mutation with gene expression data improves outcome prediction in myelodysplastic syndromes Impact of spliceosome mutations on RNA splicing in myelodysplasia: dysregulated genes/ pathways and clinical associations Microarray analysis after RNA amplification can detect pronounced differences in gene expression using limma The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal Hallmarks of cancer: the next generation Enhanced renewal of erythroid progenitors in myelodysplastic anemia by peripheral serotonin Higher fetuin-A, lower adiponectin and free leptin levels mediate effects of excess body weight on insulin resistance and risk for myelodysplastic syndrome Identification of glucocorticoid-response genes in children with acute lymphoblastic leukemia Anti-spike IgG causes severe acute lung injury by skewing macrophage responses during acute SARS-CoV infection Distinct prognostic values of phospholipase C beta family members for non-small cell lung carcinoma Loss of Dnmt3b function upregulates the tumor modifier Ment and accelerates mouse lymphomagenesis Loss of Dnmt3b accelerates MLL-AF9 leukemia progression Prognostic and biologic significance of DNMT3B expression in older patients with cytogenetically normal primary acute myeloid leukemia Pathways of retinoid synthesis in mouse macrophages and bone marrow cells Arginine addiction in AML Arginine deprivation for ASS1-deficient mesothelioma A phase II study of arginine deiminase (ADI-PEG20) in relapsed/refractory or poor-risk acute myeloid leukemia patients Arginine deprivation using pegylated arginine deiminase has activity against primary acute myeloid leukemia cells in vivo Microarray analysis reveals a major direct role of DNA copy number alteration in the transcriptional program of human breast tumors Impact of DNA amplification on gene expression patterns in breast cancer Identification of a metabolic gene panel to predict the prognosis of myelodysplastic syndrome