key: cord-0926197-0f1ah8zl authors: Sang, Liang; Yu, Zhanwu; Wang, Ang; Li, Hao; Dai, Xiantong; Sun, Liping; Liu, Hongxu; Yuan, Yuan title: Identification of methylated-differentially expressed genes and pathways in esophageal squamous cell carcinoma date: 2020-06-10 journal: Pathol Res Pract DOI: 10.1016/j.prp.2020.153050 sha: ff273ca42b178490c60bb82dd58e0c753a7f010b doc_id: 926197 cord_uid: 0f1ah8zl

Methylation, as an epigenetic modification, can affect gene expression and play a role in the occurrence and development of cancer. This study aimed to identify methylated-differentially expressed genes (MDEGs) in esophageal squamous cell carcinoma (ESCC) and to explore the pathways associated with them. We downloaded the GSE51287 methylation profiles and the GSE26886 expression profiles from GEO DataSets and performed a comprehensive bioinformatics analysis. In total, 19 hypermethylated, lowly expressed genes (Hyper-LGs) were identified; they were involved in the regulation of cell proliferation, phosphorus metabolic process and protein kinase activity. Meanwhile, 17 hypomethylated, highly expressed genes (Hypo-HGs) participated in the collagen catabolic process, metallopeptidase activity and cytokine activity. Pathway analysis determined that Hyper-LGs were enriched in the arachidonic acid metabolism pathway, while Hypo-HGs were primarily associated with the cytokine-cytokine receptor interaction pathway. IL6, MMP3, MMP9 and SPP1 were identified as hub genes based on the PPI network by combining 7 ranking methods included in cytoHubba, and verification was performed in human tissues. Our integrated analysis identified many novel genetic lesions in ESCC and provides a molecular foundation to improve our understanding of ESCC. Hub genes, including IL6, MMP3, MMP9 and SPP1, could be considered for use as aberrant methylation-based biomarkers to facilitate the accurate diagnosis and therapy of ESCC.
Esophageal cancer is one of the most invasive cancers and, worldwide, the seventh leading cause of cancer-related death in males. In China, it is the fourth most fatal cancer, and it includes two main subtypes, namely esophageal squamous cell carcinoma (ESCC) and esophageal adenocarcinoma (EAC) [1, 2]. ESCC is the most widely observed histological type [3], and both its occurrence and progression are related to genetic factors, such as genomic amplifications, insertions, deletions, and mutations [4], as well as tumor epigenetics, which includes DNA methylation, histone acetylation and noncoding RNA [5]. DNA methylation commonly affects independent loci, and different tumor types exhibit unique signatures of DNA methylation deregulation [6]. Aberrant DNA methylation, whether the hypermethylation of tumor suppressor genes or the hypomethylation of oncogenes, is considered to be a significant factor in carcinogenesis [7]. As a result, a thorough understanding of methylated-differentially expressed genes (MDEGs) and their genetic characteristics is essential for elucidating the physiological and pathological processes of ESCC. Earlier studies often analyzed gene expression or DNA methylation using array-based profiling [8] [9] [10]. However, many of these studies focused primarily on the relationship between the expression and methylation of single genes [11, 12]. Joint analysis of methylation and gene expression microarray data makes it possible to detect MDEGs concurrently and to further determine their interrelated functions and biological characteristics [13, 14]. To our knowledge, there are no published studies that have jointly examined gene expression and methylation profiling microarrays to investigate the carcinogenic processes of ESCC.
We aimed to profile the associations between differentially methylated genes (DMGs) and differentially expressed genes (DEGs), as well as signaling pathways in ESCC, through the bioinformatics analysis of gene methylation microarray profiles (GSE51287) and gene expression microarray profiles (GSE26886). The purpose of this study was to gain a novel perspective into the genetic features and biological pathways of MDEGs in ESCC, as well as to offer insights into the pathogenesis of ESCC.

We performed our analysis using methylation and mRNA microarray datasets to identify MDEGs between normal and ESCC tissue samples. The gene methylation dataset (GSE51287) and gene expression dataset (GSE26886) were downloaded from the Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/), which is a database maintained by the National Center for Biotechnology Information (NCBI). Differential expression between normal and ESCC tissue samples was identified using the online tool GEO2R (http://www.ncbi.nlm.nih.gov/geo/geo2r/), which compares groups of samples across experimental conditions in a GEO series. The multiple t test was used to detect statistically significant genes, and the Benjamini & Hochberg false discovery rate method was selected for multiple-testing adjustment. Fold change was used to express a difference in expression. In this study, we defined the cut-off criteria as P < 0.05 and |fold change| > 2 to identify DMGs and DEGs. Additionally, we classified overlapping MDEGs by performing a "MATCH function". Finally, Hyper-LGs were identified by overlapping down-regulated and hypermethylated genes, while Hypo-HGs were identified by overlapping up-regulated and hypomethylated genes. For GSE51287, methylation probes located on sex chromosomes were excluded by filtering.
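The thresholding and overlap steps described above can be sketched in Python. This is only an illustration under stated assumptions, not the authors' workflow (they report using GEO2R and a "MATCH function"): the row layout `(gene, p_value, log2_fc)`, the function names, the toy gene labels, and the reading of "|fold change| > 2" as |log2 fold change| > log2(2) are all hypothetical choices made for the example.

```python
import math

def split_by_direction(rows, p_cut=0.05, fc_cut=2.0):
    """Split significant rows into (increased, decreased) gene sets.

    rows: iterable of (gene, p_value, log2_fc) tuples (assumed layout).
    A row passes when p < p_cut and |log2_fc| > log2(fc_cut).
    """
    log_cut = math.log2(fc_cut)
    increased, decreased = set(), set()
    for gene, p_value, log2_fc in rows:
        if p_value < p_cut and abs(log2_fc) > log_cut:
            (increased if log2_fc > 0 else decreased).add(gene)
    return increased, decreased

def find_mdegs(expr_rows, meth_rows):
    """Overlap expression and methylation calls in opposite directions."""
    up_expr, down_expr = split_by_direction(expr_rows)
    hyper, hypo = split_by_direction(meth_rows)
    return {
        "Hyper-LGs": sorted(hyper & down_expr),  # hypermethylated, lowly expressed
        "Hypo-HGs": sorted(hypo & up_expr),      # hypomethylated, highly expressed
    }

# Toy input with placeholder gene labels (not data from the study).
expr = [("GENE_A", 0.001, 2.5), ("GENE_B", 0.01, -1.8),
        ("GENE_C", 0.2, 3.0), ("GENE_D", 0.03, 0.4)]
meth = [("GENE_A", 0.004, -1.5), ("GENE_B", 0.02, 1.2),
        ("GENE_C", 0.001, -2.0), ("GENE_D", 0.01, -1.3)]
result = find_mdegs(expr, meth)
```

In this toy input, GENE_C is dropped because its expression P-value fails the cut-off, and GENE_D because its expression change is too small, so only GENE_A and GENE_B survive the overlap.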
We performed functional annotation and enrichment analysis using the Database for Annotation, Visualization and Integrated Discovery (DAVID, https://david.ncifcrf.gov/), which is an online tool used to determine the biological characteristics of large lists of genes [15]. Gene Ontology (GO) analysis is a robust bioinformatics tool for gene and gene product annotation, including biological processes, cellular components, and molecular functions [16]. The Kyoto Encyclopedia of Genes and Genomes (KEGG) database can be used to visualize molecular datasets including genomics, transcriptomics, proteomics, and metabolomics via the KEGG pathway to interpret the biological functions of these molecules [17]. The GO function and KEGG pathway analyses of MDEGs were performed using DAVID, and a P-value < 0.05 was considered to be statistically significant.

STRING (Search Tool for the Retrieval of Interacting Genes/Proteins, http://string-db.org/) is an online database that predicts protein-protein interactions (PPIs) [18]. To determine the underlying molecular principles of cell activity within the context of cancer progression, we constructed a PPI network using STRING for the Hyper-LGs and Hypo-HGs. An interaction score of 0.4 was set as the cut-off. We then visualized the PPI network using Cytoscape (http://www.cytoscape.org/) and ranked hub genes using cytoHubba within Cytoscape.

The Cancer Genome Atlas (TCGA) database, a collaboration between the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI), has generated comprehensive, multi-dimensional maps of the key genomic changes in various types of cancers. To confirm the results, Hyper-LGs and Hypo-HGs were then analyzed in TCGA database. For this analysis, the nominal P values were adjusted by the Benjamini & Hochberg false discovery rate method, and we defined the cut-off criteria as P < 0.05 and |fold change| ≥ 1 to identify DMGs and DEGs. table 1 and table 2 .
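For readers unfamiliar with the Benjamini & Hochberg adjustment mentioned above, the following is a minimal pure-Python sketch of the standard step-up procedure. It is not the code used in the study (GEO2R and the TCGA tooling apply the adjustment internally); the function name and the example P-values are illustrative assumptions.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order.

    Each nominal P-value is scaled by m / rank (rank 1 = smallest), and a
    running minimum from the largest rank downwards enforces monotonicity.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Illustrative nominal P-values (not taken from the study).
nominal = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(nominal)
```

A gene would then be retained when its adjusted value falls below the chosen cut-off, together with the fold-change criterion.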
For BGS, the PCR products were used for forward sequence analysis after product purification. For MSP, the PCR products were separated and confirmed, and methylated or unmethylated statuses were defined in accordance with our previous study [13]. In addition, quantitative real-time PCR was performed to profile mRNA expression. Differences between groups were compared by the Mann-Whitney U test, and a P-value < 0.05 was considered statistically significant. All the primers and conditions are shown in table 3 and table 4.

A total of 1,468 and 54,675 gene records were obtained from the GSE51287 and GSE26886 profiling datasets, respectively. We performed an online analysis using

For Hypo-HGs, BPs were enriched in the collagen catabolic process and in multicellular organismal catabolic and metabolic processes. CCs were enriched in the extracellular space, the matrix region part, and the proteinaceous extracellular matrix. Finally, MF enrichment was found in metalloendopeptidase activity, metallopeptidase activity, endopeptidase activity, calcium ion binding, and cytokine activity. For Hyper-LGs, KEGG pathway enrichment analysis demonstrated enrichment in the arachidonic acid metabolism pathway. Hypo-HGs were significantly involved in the toll-like receptor signalling pathway and the cytokine-cytokine receptor interaction pathway (table 6).

We analyzed the MDEGs using the online PPI network tool STRING. The Hyper-LGs network contained 19 nodes and 3 edges, and the Hypo-HGs network contained 17 nodes and 54 edges. The PPI network revealed significant interactions for the Hypo-HGs, with an enrichment P-value of 4.19e-09, whereas no significant interactions were detected for the Hyper-LGs (P = 0.225). The Hypo-HGs PPI network is shown in Figure 2. We then visualized the Hypo-HGs network using Cytoscape, and the hub genes were identified by cytoHubba within Cytoscape.
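One way to combine several cytoHubba rankings into a single hub list is to keep only the nodes that appear among the top-ranked entries of every method. The sketch below illustrates that idea; it is an assumption about the combination rule, not a description of the authors' exact procedure, and the method labels, node labels and `top_k` value are placeholders.

```python
def consensus_hubs(rankings, top_k):
    """Return nodes present in the top_k of every ranking, sorted by name.

    rankings: dict mapping a method label to a list of nodes ordered from
    highest to lowest score (assumed layout).
    """
    top_sets = [set(ranked[:top_k]) for ranked in rankings.values()]
    if not top_sets:
        return []
    return sorted(set.intersection(*top_sets))

# Placeholder rankings from three hypothetical scoring methods.
rankings = {
    "method_a": ["N1", "N2", "N3", "N4"],
    "method_b": ["N2", "N1", "N4", "N3"],
    "method_c": ["N1", "N3", "N2", "N4"],
}
hubs = consensus_hubs(rankings, top_k=3)
```

Requiring agreement across all methods is deliberately strict: a node that scores highly under one topological measure but not the others is excluded, which reduces dependence on any single ranking.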
Finally, we identified 5 hub genes by overlapping 7 ranking methods in cytoHubba (table 7). These genes are annotated as Interleukin 6 (IL6), Matrix Metallopeptidase 9 (MMP9), MMP3, MMP7, and Secreted Phosphoprotein 1 (SPP1).

TCGA database contains 95 ESCC tissues but only 3 normal control tissues with both DNA methylation and mRNA expression data. We downloaded the data for

We next sought to verify the five identified hub genes in human tissues and found that gene expression levels of IL6, MMP9, MMP3, and SPP1 were higher in tumor tissues than in non-tumor tissues, though the difference was statistically significant only for SPP1.

ESCC develops through a complex, multistage process that involves multiple molecular changes comprising accumulating genetic, epigenetic, and endocrine aberrations [19]. We identified 19 Hyper-LGs and 17 Hypo-HGs through the analysis of gene methylation microarray data (GSE51287) and gene expression profiling data (GSE26886) for ESCC by utilizing public datasets and online bioinformatics tools. We found that linked genes could be associated with the molecular regulation of vital pathways related to the pathogenesis of ESCC. Enrichment and functional analysis of the genes identified major pathways, and hub genes that are related to methylation offer a unique perspective into the pathogenesis of ESCC. Based on DAVID analysis, GO enrichment of Hyper-LGs in ESCC revealed that BPs included protein amino acid phosphorylation, regulation of cell proliferation and phosphorus metabolic process. MF was enriched in protein kinase activity, transmembrane receptor protein tyrosine kinase activity, vascular endothelial growth factor receptor activity and ATP binding. KEGG enrichment analysis also indicated that the progression of ESCC may be affected by methylation via the arachidonic acid metabolism pathway.
Hypo-HGs in ESCC were enriched in BPs including the collagen catabolic process and multicellular organismal catabolic and metabolic processes. GO analysis found that MF was enriched in metalloendopeptidase activity, metallopeptidase activity, endopeptidase activity, calcium ion binding, and cytokine activity. A previous study found that the cell cycle regulation pathway is ubiquitous in ESCC [20], and there are notable associations between cell proliferation, gene expression and signal transduction in ESCC invasion and metastasis [21]. A transcription factor that induces the cellular response to oxidative stress plays a crucial role in the development of ESCC [22]. According to our GO analysis, MF enrichment included metalloendopeptidase activity, metallopeptidase activity, endopeptidase activity, calcium ion binding, and cytokine activity. After constructing PPI networks for MDEGs, we observed significant interactions only for the Hypo-HGs network, with some of the MDEGs being implicated in the pathogenesis of ESCC. We visualized the Hypo-HGs network using Cytoscape and utilized cytoHubba within Cytoscape to identify the following 5 hub genes: IL6, MMP9, MMP3, MMP7, and SPP1. We then verified the observed interactions of these hub genes in human tissues and found that IL6, MMP9, MMP3, and SPP1 exhibited a negative correlation between high gene expression and DNA hypomethylation, but the same was not observed for MMP7. IL6 is a proinflammatory cytokine associated with cancer development [23, 24], including ESCC [25], and ESCC patients have been reported to have significantly higher serum expression levels of IL6 [26]. Furthermore, the methylation of IL6 has been associated with a range of cancers [27].
The autocrine loops of IL6 and IL6R are engaged in the development and progression of ESCC [28], which is consistent with our finding regarding the cytokine-cytokine receptor interaction pathway based on KEGG enrichment analysis. Matrix metalloproteinases (MMPs), a family of zinc-dependent endopeptidases with the ability to degrade extracellular matrix components, are considered to be involved in various stages of cancer progression [29] [30] [31]. Changes in MMP expression may be due to DNA methylation, since hypomethylation of a gene can result in transcriptional up-regulation [11, 32]. MMP3 is one of the MMP genes involved in tumor initiation, has calcium ion binding and metallopeptidase activity, and is known to degrade basal membrane collagen and stimulate the production of other MMPs [33]. MMP9, categorized in the subgroup of gelatinases, has been shown to participate in the growth and progression of many cancers [34]. MMP9 primarily degrades type IV collagen, which is regarded as the main basement membrane component [35], indicating an association with the progression of ESCC through the collagen catabolic process, which was found to be enriched in this study. Over-expression of MMP3 or MMP9 is closely related to the pathogenesis of many cancers, including ESCC [36] [37] [38], consistent with our findings. SPP1 is a protein-coding gene with cytokine and extracellular matrix binding activity that is involved in many biological processes, including cell proliferation, migration and invasion, and its over-expression is linked to tumor initiation and prognosis, including in ESCC [39] [40] [41] [42]. A previous study showed that hypomethylation of SPP1 was strongly associated with cancer progression, though an inverse relationship with its mRNA expression was found for gastrointestinal stromal tumors [43]. This finding is also consistent with the results of our ESCC analysis.
Taken together, the identified hubs are closely related to metallopeptidase activity, cytokine activity, and the degradation of extracellular matrix components during physiological and pathological processes. In our previous studies, we found that cancer-related genes were hypermethylated and lowly expressed or hypomethylated and highly expressed in the tumor tissues of cancer patients [13, 44, 45]. In this study, our findings indicate that the MDEGs in ESCC may exert regulatory effects through the molecular functions and biological processes characterized by enrichment analysis. The Hypo-HGs PPI network had significantly more interactions than expected, though the same was not seen in the Hyper-LGs network. DNA hypomethylation of the identified hub genes generally leads to increased gene expression and may promote the progression of ESCC. However, further investigation is required of the novel genes and pathways identified in this study that have not previously been considered as targets in ESCC pathogenesis.

Several limitations of our study should be mentioned. First, our study did not investigate clinical parameters and prognosis because these data were not available in the bioinformatics databases and tools. Second, the sample size was small, since only two microarray profiles were analyzed and only 20 pairs of human tumor and adjacent non-tumor tissue were verified. This may explain why no significant difference was found in the methylation status of hub genes between tumor and non-tumor tissue. Third, although we analyzed the GEO database and TCGA database at the same time, sample bias, especially the small number of normal tissue samples in TCGA database, meant that we did not identify key MDEGs of ESCC common to the two databases. Thus, larger sample sizes are necessary to validate our findings. In addition, future molecular experiments on the identified target genes and pathways in ESCC should be performed.
In conclusion, we elucidated the biological characteristics of ESCC by constructing

The authors declare no conflicts of interest.

This study was supported by the National Key R&D Program of China (Grant

Cancer statistics in China
Dietician-delivered intensive nutritional support is associated with a decrease in severe postoperative complications after surgery in patients with esophageal cancer
Genomic characterization of esophageal squamous cell carcinoma: Insights from next-generation sequencing
Mutational landscape and significance across 12 major cancer types
Epigenetic mechanisms in tumorigenesis, tumor cell heterogeneity and drug resistance
Comparative pan-cancer DNA methylation analysis reveals cancer common and specific patterns
Genomic profiling of esophageal squamous cell carcinoma (ESCC)-Basis for precision medicine
Targeted bisulfite sequencing identified a panel of DNA methylation-based biomarkers for esophageal squamous cell carcinoma (ESCC)
Comprehensive bioinformation analysis of methylated and differentially expressed genes in esophageal squamous cell carcinoma
CpG island hypermethylation and tumor suppressor genes: a booming present, a brighter future
Differential expression of TCEAL1 in esophageal cancers by custom cDNA microarray analysis
Bioinformatics-Based Identification of Methylated-Differentially Expressed Genes and Related Pathways in Gastric Cancer, Digestive diseases and sciences
Racial Differences in Esophageal Squamous Cell Carcinoma: Incidence and Molecular Features
Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources
The Gene Ontology (GO) project in 2006
KEGG: Kyoto Encyclopedia of Genes and Genomes
Hedgehog signalling pathway orchestrates angiogenesis in triple-negative breast cancers
Androgen receptor promotes esophageal cancer cell migration and proliferation via matrix metalloproteinase 2
Genomic analyses reveal mutational signatures and frequently altered genes in esophageal squamous cell carcinoma
Microarray analyses reveal genes related to progression and prognosis of esophageal squamous cell carcinoma
Genomic Landscape of Esophageal Squamous Cell Carcinoma in a Japanese Population
Cytokine patterns in cancer patients: A review of the correlation between interleukin 6 and prognosis
Higher importance of interleukin 6 than classic tumor markers (carcinoembryonic antigen and squamous cell cancer antigen) in the diagnosis of esophageal cancer patients
IL-6 expression predicts treatment outcome in squamous cell carcinoma of the esophagus
IL6 derived from cancer-associated fibroblasts promotes chemoresistance via CXCR7 in esophageal squamous cell carcinoma
Longitudinal Study of DNA Methylation of Inflammatory Genes and Cancer Risk, Cancer epidemiology, biomarkers & prevention : a publication of the
Relationship between serum levels of interleukin 6, various disease parameters and malnutrition in patients with esophageal squamous cell carcinoma
Matrix metalloproteinase 1, 3, and 9 polymorphisms and esophageal squamous cell carcinoma risk
Esophageal cancer stem cells express PLGF to increase cancer invasion through MMP9 activation
Matrix metalloproteinases: regulators of the tumor microenvironment
DNA Methylation of MMP9 Is Associated with High Levels of MMP-9 Messenger RNA in Periapical Inflammatory Lesions
The functional SNP in the matrix metalloproteinase-3 promoter modifies susceptibility and lymphatic metastasis in esophageal squamous cell carcinoma but not in gastric cardiac adenocarcinoma
The role of matrix metalloproteinases (MMPs) and their inhibitors (TIMPs) in the development of esophageal cancer
Matrix metalloproteinases expression correlates with survival in patients with esophageal squamous cell carcinoma
Clinicopathological and prognostic role of MMP-9 in esophageal squamous cell carcinoma: a meta-analysis
Expression and prognostic relevance of cyclophilin A and matrix metalloproteinase 9 in esophageal squamous cell carcinoma
Matrix metalloproteinase variants associated with risk and clinical outcome of esophageal cancer
Waisberg, the expression of the extracellular matrix genes SPARC, SPP1, FN1, ITGA5 and ITGAV and clinicopathological parameters of tumor progression and colorectal cancer dissemination
Osteopontin overexpression in breast cancer: knowledge gained and possible implications for clinical management
Prognostic and predictive values of SPP1, PAI and caveolin-1 in patients with oral squamous cell carcinoma
Immune signature profiling identified predictive and prognostic factors for esophageal squamous cell carcinoma
Combined DNA methylation and gene expression profiling in gastrointestinal stromal tumors reveals hypomethylation of SPP1 as an independent prognostic factor
Aberrantly methylated-differentially expressed genes and pathways in colorectal cancer
Bioinformatics analysis of aberrantly methylated-differentially expressed genes and pathways in hepatocellular carcinoma

Hyper-LGs: hypermethylated, lowly expressed genes; Hypo-HGs: hypomethylated, highly expressed genes.