key: cord-0013524-x02pf4eu
authors: Dai, Zhou-Tong; Wang, Jun; Zhao, Kai; Xiang, Yuan; Li, Jia Peng; Zhang, Hui-Min; Peng, Zi-Tan; Liao, Xing Hua
title: Integrated TCGA and GEO analysis showed that SMAD7 is an independent prognostic factor for lung adenocarcinoma
date: 2020-10-30
journal: Medicine (Baltimore)
DOI: 10.1097/md.0000000000022861
sha: 914fc3368d31ee40a4cae00d6beeb146ea09c6f1
doc_id: 13524
cord_uid: x02pf4eu

The lack of effective markers leads to missed optimal treatment times, resulting in poorer prognosis in most cancers. Drosophila mothers against decapentaplegic protein (SMAD) family members are important signal transducers of the transforming growth factor-beta pathway. They jointly regulate cell growth, differentiation, and apoptosis. However, the expression of SMAD family genes across cancers and their impact on prognosis have not been elucidated. Perl and R were used to perform expression analysis and survival curve analysis on data from TCGA, GTEx, and GEO, and potential regulatory pathways were identified through gene ontology enrichment and Kyoto Encyclopedia of Genes and Genomes enrichment analysis. SMAD7 and SMAD9 expression was decreased in lung adenocarcinoma (LUAD), and their expression was positively correlated with survival time. Additionally, SMAD7 could be used as an independent prognostic factor for LUAD. In general, SMAD7 and SMAD9 can be used as prognostic markers of LUAD. Further, SMAD7 is expected to become a therapeutic target for LUAD.

Transforming growth factor-beta (TGF-β) is distributed in various systems of the human body and is an extremely important growth factor that regulates cell differentiation and maturation. It regulates processes such as cell migration, growth, differentiation, and apoptosis. [1] Early studies have shown that drosophila mothers against decapentaplegic protein (SMAD) can be directly activated by TGF-β-induced cell membrane receptors to form transcription complexes, which then control the transcription of target genes in the nucleus. Therefore, SMAD proteins are not only an essential part of the TGF-β signaling pathways but also jointly regulate cell function. [2] In recent years, they have been linked to the occurrence and development of various diseases, especially multiple malignant tumors. [3]

Currently, 8 types of SMAD proteins are known to be encoded in the human genome. SMAD1, SMAD5, and SMAD9 (SMAD8) are receptor substrates of anti-Müllerian hormone and bone morphogenetic protein in the TGF-β family, [4] whereas SMAD2 and SMAD3 are receptor substrates of the activin, TGF-β, and Nodal pathways. SMAD4 assists all receptor-regulated SMADs (R-SMADs). SMAD6 and SMAD7 are inhibitory SMAD proteins. [5]

Recent studies have shown that SMAD1 promotes colorectal cancer cell migration. [6] Meanwhile, in lung cancer, blocking SMAD2 and SMAD4 can block the function of TGF-β. [7] SMAD3 can participate in the epithelial-mesenchymal transition (EMT) process of cervical cancer through the long noncoding RNA OIP5-AS1. The expression level of SMAD4 is positively correlated with the survival rate in colon cancer, and the lack of SMAD4 leads to a poor prognosis. [8] Down-regulation of SMAD5 can inhibit nasopharyngeal carcinoma cell proliferation, invasion, and migration. [9] Up-regulation of SMAD6 can promote the proliferation of liver cancer cells. [10] In addition, because published results differ, how SMAD7 regulates cancer cell proliferation and migration is still controversial. [11] The expression of SMAD9 is also closely related to the risk of essential hypertension. However, the expression and function of the SMAD family in some cancers are still unclear, and there are few reports about prognosis, especially in lung cancer.
At present, the use of large databases and regulatory networks is widely accepted in biology, for example in identifying prognostic signatures of melanoma by transcriptome analysis [12] and in developing new prognostic models for liver cancer. [13] In this study, a meta-analysis of the expression and prognostic value of SMAD family members in cancer was performed using multiple databases. We explored the expression, prognosis, clinical features, and possible regulatory pathways of SMAD family members in patients with lung adenocarcinoma (LUAD), providing a theoretical basis for future studies of SMAD family members in LUAD.

To analyze the expression of SMAD family genes in cancer, expression data and clinical survival data from the GTEx, TCGA, and Oncomine databases were summarized to verify the expression of each SMAD family gene in cancer. Oncomine is a gene chip-based database and integrated data extraction platform in which users can set conditions for filtering and extracting data according to their needs (http://www.oncomine.org). In this study, the screening conditions were: "Analysis type: cancer vs normal analysis. P-value: .05. Threshold (fold change): 2. Threshold (gene rank): Top 10%." Meanwhile, the Gene Expression Profiling Interactive Analysis (GEPIA) tool was used to analyze the clinical data of the GTEx and TCGA databases (http://gepia.cancer-pku.cn), comparing SMAD family expression in pan-cancer tissues with that in normal control tissues and assessing its relationship with survival. [14] In addition, the Oncomine database was used to analyze the differential expression of the SMAD family in each lung cancer subtype.

The expression data of all genes in LUAD and the corresponding clinical information were extracted from the TCGA database. The limma package in R was used to average the repeated measurements of each gene, and Perl was used to assemble the information into a matrix.
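The averaging of repeated measurements and the assembly of a gene-by-sample matrix can be illustrated with a short sketch. The study itself used limma in R and Perl; the Python below is only an illustration of the idea, and the function name `average_duplicate_genes` and the toy expression values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def average_duplicate_genes(rows):
    """Collapse repeated rows for the same gene into one row of per-sample means.

    rows: iterable of (gene_symbol, [expression value per sample]).
    Returns a dict mapping gene_symbol -> list of averaged values.
    """
    grouped = defaultdict(list)
    for gene, values in rows:
        grouped[gene].append(values)
    # zip(*value_lists) walks the samples column by column
    return {
        gene: [mean(column) for column in zip(*value_lists)]
        for gene, value_lists in grouped.items()
    }


# Hypothetical toy data: SMAD7 appears twice (for example, two probes)
rows = [
    ("SMAD7", [4.0, 6.0, 2.0]),
    ("SMAD7", [6.0, 8.0, 4.0]),
    ("SMAD9", [1.0, 3.0, 5.0]),
]
matrix = average_duplicate_genes(rows)
```

Each gene ends up with exactly one row, so downstream steps such as median splitting can treat the result as a clean gene-by-sample matrix.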
The Wilcoxon test was used to examine the relationship between the expression of a single gene and clinical characteristics. A log(x + 1) transformation was applied to the extracted TCGA data, and the samples were divided into high-expression and low-expression groups according to the median expression of SMAD7 and SMAD9. The limma, pheatmap, and ggplot2 packages in R were used to filter and normalize the original LUAD data and to screen out the differential genes between the 2 groups. The results were displayed using volcano plots and heatmaps. The differential gene screening criteria were |log FC| ≥ 2 and adjusted P < .05. Clustering was based on Euclidean distance.

The clusterProfiler package in R was used to perform gene ontology (GO) functional analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis on the differential genes screened above, with adjusted P < .05 as the threshold for the main enriched functions and pathways. [15] [16] [17]

The selected differential genes were submitted to the Search Tool for the Retrieval of Interacting Genes (STRING), an online tool for protein-protein interaction (PPI) analysis (https://string-db.org/). [18] The results were imported into Cytoscape, [19] and key protein expression modules and key node genes were screened.

The GSE43767 dataset was downloaded from the GEO database at NCBI (http://www.ncbi.nlm.nih.gov/geo/). [20, 21] It includes 113 samples: 29 samples from therapeutic or spontaneous abortions, 15 normal samples, and 69 LUAD samples. The limma and beeswarm packages in R were used to process the data and to draw differential expression heatmaps and volcano plots.

TCGA clinical data were analyzed by gene set enrichment analysis (GSEA), version 4.0.3. [22, 23] Samples were divided into 2 groups, a high-expression group and a low-expression group, according to the expression levels of SMAD7 and SMAD9. The effect of expression level on gene sets of various biological pathways was then analyzed by GSEA. Gene sets from the MSigDB database on the GSEA website were used as the reference gene sets, and P values were calculated from 1000 permutations per analysis using the weighted method.

The survival and survminer packages in R were used to analyze TCGA clinical data. Both the univariate and the multivariate analyses used Cox proportional hazards regression models.

All data in this paper were obtained from open-access databases; we did not collect data from patients or animals directly, nor did we intervene in the care of these patients. Ethical approval was therefore not necessary.

The expression of SMAD family members in human cancers at the mRNA level was analyzed using the Oncomine online database. Expression differences between cancer and normal tissues were analyzed according to the selected criteria. The database contained 442, 458, 453, 459, 459, 448, 456, and 389 independent studies involving the expression of the 8 SMAD genes from SMAD1 to SMAD9, respectively (Fig. 1). Interestingly, apart from a few SMAD genes with increased expression in several specific cancers, SMAD family members showed decreased expression in most cancers. In detail, SMAD1 expression increased in brain and central nervous system (CNS) cancer and lymphoma; SMAD5 expression increased in brain and CNS cancer, colorectal cancer, and kidney cancer; SMAD6 expression increased in esophageal cancer; and SMAD9 expression increased in brain and CNS cancer. In the remaining cancer data (Table 1), most SMAD family members showed decreased expression, except in testicular cancer, and the other cancer types showed no significant expression difference.
Table 1. Expression of the drosophila mothers against decapentaplegic protein (SMAD) family in other cancers in Oncomine.

To further determine the expression difference of the SMAD family between cancer and normal tissues, the TCGA and GTEx databases were jointly used to analyze the expression difference of the SMAD family in 31 cancers, and a heatmap was drawn (Fig. 2). A t test was used to calculate the P value for the difference in SMAD family gene expression between tumor and normal tissue (Figs. S1-S8, http://links.lww.com/MD/F118). We then took the intersection of the TCGA and GTEx results with the Oncomine results. Compared with normal tissues, the expression of SMAD1, SMAD4, SMAD5, and SMAD7 was significantly different in brain lower grade glioma. In invasive breast carcinoma, SMAD9 expression was significantly different. Significant differences were also found for SMAD1 in acute myeloid leukemia; for SMAD6, SMAD7, and SMAD9 in LUAD; for SMAD1 and SMAD7 in lymphoid neoplasm diffuse large B-cell lymphoma; for SMAD6 in prostate adenocarcinoma; and for SMAD1 and SMAD7 in testicular germ cell tumors.

To determine the prognostic value of the selected genes, Kaplan-Meier survival analysis was conducted on them based on the clinical information in the TCGA database. In LUAD, SMAD6 (log-rank P = .65, P (HR) = .66) showed no obvious correlation with overall survival (Fig. 3). Similarly, in all other cancers analyzed, the differentially expressed SMAD family genes showed the same negative result as SMAD6. However, in LUAD, both SMAD7 (log-rank P = .0099, P (HR) = .01) and SMAD9 (log-rank P = .0017, P (HR) = .0019) showed a significant association with overall survival (Fig. 3).
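The log-rank comparison behind Kaplan-Meier results of this kind can be sketched in a few lines. This is a minimal, standard two-group log-rank test written in plain Python, not the code used in the study; the function name `logrank_test` and the toy survival times are hypothetical and carry no clinical meaning.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    times_*: follow-up times; events_*: 1 if the event (death) was observed,
    0 if the observation was censored.
    Returns (chi-square statistic with 1 degree of freedom, P value).
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Subjects still at risk just before time t, per group
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)
        # Events occurring exactly at time t
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        d_b = sum(1 for ti, e, g in data if ti == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical toy cohorts: a "high expression" group with longer survival
high_times, high_events = [10, 12, 14, 16, 18, 20], [1, 1, 1, 1, 1, 1]
low_times, low_events = [2, 3, 4, 5, 6, 7], [1, 1, 1, 1, 1, 1]
chi2, p_value = logrank_test(high_times, high_events, low_times, low_events)
```

The test compares the observed number of events in one group with the number expected if both groups shared the same hazard, which is why a small P value supports a difference between the survival curves.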
The prognosis of the SMAD7 and SMAD9 high-expression groups was significantly better than that of the low-expression groups.

To evaluate the clinical characteristics of SMAD7 and SMAD9, we extracted their expression data from TCGA for different types of lung cancer. The results matched those from the combined analysis of TCGA and GTEx (Fig. 4A). In LUAD, the expression of SMAD7 (P = 1.76 × 10⁻¹²) and SMAD9 (P = 1.64 × 10⁻¹²) was reduced compared with normal tissues. However, analysis by stage, gender, and age (Fig. 4B-D) showed that SMAD7 expression was not associated with stage, gender, or age. For SMAD9 (Fig. 4E-G), although there was a difference between Stage 1 and Stage 3 (P = 1.07 × 10⁻²), there was no consistent trend across stages; thus, SMAD9 expression appears to be independent of stage. On the other hand, SMAD9 expression was slightly higher in women than in men (P = 2.07 × 10⁻²). In addition, SMAD9 expression was higher in young patients, but this result should be treated with caution because of the small sample of young patients (n = 12).

The LUAD data in TCGA were divided into high-expression and low-expression groups according to the median expression of the target gene, and differentially expressed genes (DEGs) between the 2 groups were screened with limma in R. A total of 12 DEGs for SMAD7 and 57 DEGs for SMAD9 were identified from the TCGA database (Fig. 5A, B). The clusterProfiler, org.Hs.eg.db, enrichplot, and ggplot2 packages in R were used to analyze the functions of the DEGs in LUAD. In the GO enrichment, the DEGs associated with SMAD7 were mainly involved in the regulation of blood vessels, and the DEGs associated with SMAD9 were mainly involved in zymogen activation (Fig. 6A).
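The median split and the DEG screening thresholds used in this analysis (|log FC| ≥ 2, adjusted P < .05) can be sketched as follows. The study used limma in R; this Python version is an illustration only. It assumes Benjamini-Hochberg adjustment, which the paper does not state explicitly, and it assigns samples exactly at the median to the low group, which is also an assumption. The gene names and numbers are hypothetical.

```python
from statistics import median


def split_by_median(expression):
    """expression: dict sample_id -> expression of the target gene.

    Returns (high, low) lists of sample ids. Samples exactly at the median
    go to the low group (an assumption made for this sketch).
    """
    cutoff = median(expression.values())
    high = [s for s, v in expression.items() if v > cutoff]
    low = [s for s, v in expression.items() if v <= cutoff]
    return high, low


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def screen_degs(results, lfc_cutoff=2.0, alpha=0.05):
    """results: list of (gene, log_fold_change, raw_p_value)."""
    adjusted = benjamini_hochberg([p for _, _, p in results])
    return [
        gene
        for (gene, lfc, _), q in zip(results, adjusted)
        if abs(lfc) >= lfc_cutoff and q < alpha
    ]


# Hypothetical toy inputs
high, low = split_by_median({"s1": 1.0, "s2": 5.0, "s3": 3.0, "s4": 8.0})
degs = screen_degs([
    ("GENE_A", 2.5, 0.001),   # passes both thresholds
    ("GENE_B", -3.0, 0.004),  # passes both thresholds (down-regulated)
    ("GENE_C", 0.5, 0.0005),  # significant but fold change too small
    ("GENE_D", 2.2, 0.2),     # large fold change but not significant
])
```

Applying both thresholds together keeps only genes whose change is both large and statistically supported, which is why the resulting DEG lists can be short.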
In the KEGG enrichment, the DEGs associated with SMAD7 were mainly involved in pathways such as protein digestion and absorption, and the DEGs associated with SMAD9 were mainly involved in pathways such as the NOD-like receptor signaling pathway (Fig. 6B).

Although the DEGs were used for GO and KEGG enrichment, that analysis only screens for differential expression and does not take into account the magnitude and direction of the expression changes. Therefore, the function of SMAD7 and SMAD9 in LUAD was further analyzed using GSEA. In the GO and KEGG results, gene sets positively associated with SMAD7 mainly involved processes such as cellular respiration and RNA degradation, whereas gene sets negatively associated with SMAD7 mainly involved processes such as regulation of cellular response and leukocyte transendothelial migration (Fig. 7A, B). Gene sets positively associated with SMAD9 mainly involved processes such as chronic inflammatory response and galactose metabolism, while gene sets negatively associated with SMAD9 mainly involved processes such as lung alveolus development and the GNRH signaling pathway (Fig. 8A, B). These enrichment results help clarify how SMAD7 and SMAD9 participate in the regulation of LUAD.

The PPI network helps to further explore the molecular mechanism of SMAD7 and SMAD9 in LUAD. The Search Tool for the Retrieval of Interacting Genes (STRING) network tool was used to analyze the identified DEGs. For SMAD7 (Fig. 9A), after the disconnected nodes were hidden, the PPI network of the DEGs consisted of 12 nodes and 15 edges. The top 5 predicted functional partners were FGG, COL1A2, F2, SERPINC1, and LOX. These act mainly through platelet aggregation, common pathway of fibrin clot formation (FDR = 1.05 × 10⁻¹⁰), integrin cell surface interactions (FDR = 1.14 × 10⁻⁹), extracellular matrix organization (FDR = 2.58 × 10⁻⁹), and GRB2:SOS provides linkage to MAPK signaling for integrins (FDR = 4.79 × 10⁻⁶), which may regulate the occurrence and development of LUAD. For SMAD9 (Fig. 9B), the PPI network of the DEGs consisted of 54 nodes and 238 edges. The top 5 predicted functional partners were FGB, HGF, F2, TRAF2, and OASL, which are mainly involved in the immune system (FDR = 1.05 × 10⁻⁷), interferon alpha/beta signaling (FDR = 2.22 × 10⁻⁷), cytokine signaling in immune system (FDR = 5.45 × 10⁻⁷), and interferon signaling (FDR = 6.62 × 10⁻⁶). These findings suggest that SMAD7 and SMAD9 regulate the occurrence and development of LUAD mainly through these pathways.

GSE43767 is a microarray study based on normal and LUAD patients. It includes data from 15 normal lung tissue samples and 69 LUAD samples. The analysis showed (Fig. 10) that the expression of SMAD7 and SMAD9 was significantly reduced in cancer patients.

The clinical information of LUAD patients was extracted from the TCGA database, and samples with missing clinical data were removed. The survival and survminer packages in R were then used to fit Cox proportional hazards models that included the expression of SMAD7 and SMAD9. For SMAD7 (Table 2), both the univariate Cox regression analysis (P = 1.42 × 10⁻²) and the multivariate Cox regression analysis (P = 4.88 × 10⁻⁴) gave statistically significant results (Fig. 11A), indicating that SMAD7 can be used as an independent prognostic factor for LUAD. In contrast, SMAD9 cannot be used as an independent prognostic factor (P = .086) (Fig. 11B).

At present, lung cancer is among the malignancies with the highest incidence. [24] Histopathologically, it is divided into non-small cell lung cancer and small cell lung cancer. Epidemiological statistics show that non-small cell lung cancer accounts for the vast majority of clinical cases. Non-small cell lung cancer is further divided into squamous cell carcinoma, adenocarcinoma, large cell lung cancer, and other types. [25] LUAD is a primary epithelial tumor of the lung, which mostly originates from the bronchial mucosal epithelium or alveolar epithelium.
In recent years, the incidence of LUAD has continued to rise, and it has now become the most common type of lung cancer worldwide. [26] Unlike lung squamous cell carcinoma, which mostly manifests as central lung cancer, most LUADs occur in the peripheral parts of the lung and cause no obvious clinical symptoms in the early stage, leading to a poor prognosis. Timely detection and surgery can effectively improve the survival rate of patients with LUAD. [27] The SMAD signaling pathway is a crucial pathway through which the TGF-β family regulates cell proliferation, differentiation, metabolic migration, localization, and apoptosis. [2] Because SMADs are an essential family of transcription factors, studying SMAD family genes can help us better understand the occurrence and metastasis of cancer and thus develop new therapeutic approaches. Although many pathways have been

In this study, the GTEx, TCGA, and GEO databases were used to analyze the expression of the SMAD family in different cancers and to systematically compare the mRNA expression of SMAD family genes in normal and cancer tissues. Based on the screening results, the expression profile of the SMAD family in LUAD was then systematically characterized. These results suggest that the SMAD family plays an essential role in the development of LUAD.

So far, there have been many studies on SMAD1 expression. In hepatocellular carcinoma, the expression of SMAD1/SMAD5/SMAD8 is reduced compared with normal tissues, but they are not potential biomarkers for hepatocellular carcinoma. [28] In prostate cancer, the expression of SMAD1 is increased. [29] However, there are no reports on SMAD1 and cancer prognosis. Similarly, SMAD2 is involved in the regulation of many cancers. According to a 2018 study, overexpression of ATF4 can affect the survival rate of patients with triple-negative breast cancer through SMAD2. [30] Moreover, there is strong evidence that silencing SMAD2 can inhibit TGF-β function. [31] However, a previous study showed that the absence of SMAD2 could lead to reduced differentiation and increased EMT levels, leading to tumor metastasis. [32] At the same time, in patients with non-small cell lung cancer, high expression of p-SMAD2 is predictive of poor clinical survival. [33]

Like SMAD2, SMAD3 is considered a tumor suppressor. In lung cancer, loss of SMAD3 can impair TGF-β-mediated tumor suppression, thereby promoting tumorigenicity. [34] Moreover, in lung cancer, the expression of SMAD3 increased. Although the number of samples studied in Oncomine was insufficient, this was consistent with our findings, which suggests that our research has potential value. [35]

Figure 11. Multi-factor Cox analysis forest map of drosophila mothers against decapentaplegic protein 7 (SMAD7) and drosophila mothers against decapentaplegic protein 9 (SMAD9).
Dai et al. Medicine (2020) 99:44 www.md-journal.com

Multiple reports have shown that SMAD4 is a tumor suppressor gene. For example, SMAD4 expression is reduced in pancreatic cancer, and the same is true for colorectal cancer. [36, 37] Moreover, specific studies have shown that high expression of SMAD4 can prolong patient survival. Although SMAD4 alone cannot be used as an independent prognostic factor, it can serve as one in combination with p-SMAD2. [38] For SMAD5, it has been reported that miR-145 can promote the proliferation and metastasis of esophageal cancer cells by inhibiting the expression of SMAD5. [39] Fstl1 can promote glioma growth through SMAD1/SMAD5/SMAD8. [40] At the same time, SMAD5 expression increases in prostate cancer, which is closely related to postoperative survival. [39] For SMAD6, high expression can promote the development of glioma. [41] Similarly, high expression of SMAD7 can lead to a poor prognosis in acute myeloid leukemia. [42] In addition, high expression of SMAD9 increases the EMT risk of oral squamous cell carcinoma.
[43] These findings indicate that the SMAD family plays an important role in the development of cancer.

In this study, SMAD7 and SMAD9 expression was significantly reduced in lung cancer. Moreover, the expression of SMAD7 and SMAD9 was significantly correlated with patient survival. Enrichment analysis suggested that SMAD7 and SMAD9 regulate the occurrence and development of lung cancer mainly through pathways such as regulation of cellular response and the GNRH signaling pathway. At the same time, SMAD7 can be used as an independent prognostic factor, which may support earlier detection and the use of new treatments for lung cancer. Taken together, these results suggest that SMAD7 and SMAD9 may be markers and new therapeutic targets for LUAD. These results still need to be verified by specific experiments.

[1] Injury-activated transforming growth factor beta controls mobilization of mesenchymal stem cells for tissue remodeling
[2] TGFbeta-SMAD signal transduction: molecular specificity and functional flexibility
[3] Bone morphogenetic protein signaling transcription factor (SMAD) function in granulosa cells
[4] TGFbeta-1 dependent fast stimulation of ATM and p53 phosphorylation following exposure to ionizing radiation does not involve TGFbeta-receptor I signalling
[5] In vitro study of accuracy of cervical pedicle screw insertion using an electronic conductivity device (ATPS part III)
[6] SMAD1 promotes colorectal cancer cell migration through Ajuba transactivation
[7] MIR-27a regulates the TGF-beta signaling pathway by targeting SMAD2 and SMAD4 in lung cancer
[8] High SMAD4 levels appear in microsatellite instability and hypermethylated colon cancers, and indicate a better prognosis
[9] Silencing of long non-coding RNA SMAD5-AS1 reverses epithelial mesenchymal transition in nasopharyngeal carcinoma via microRNA-195-dependent inhibition of SMAD5
[10] Hepatic SMARCA4 predicts HCC recurrence and promotes tumour cell proliferation by regulating SMAD6 expression
[11] The dual role of Smad7 in the control of cancer growth and metastasis
[12] Transcriptomic analysis reveals prognostic molecular signatures of stage I melanoma
[13] Development and validation of a CIMP-associated prognostic model for hepatocellular carcinoma
[14] GEPIA: a web server for cancer and normal gene expression profiling and interactive analyses
[15] ReactomePA: an R/bioconductor package for reactome pathway analysis and visualization
[16] ClusterProfiler: an R package for comparing biological themes among gene clusters
[17] DOSE: an R/bioconductor package for disease ontology semantic and enrichment analysis
[18] STRING v10: protein-protein interaction networks, integrated over the tree of life
[19] Cytoscape: a software environment for integrated models of biomolecular interaction networks
[20] NCBI GEO: archive for functional genomics data sets-update
[21] Gene expression profiling in human lung development: an abundant resource for lung adenocarcinoma prognosis
[22] Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles
[23] PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes
[24] Lung cancer statistics
[25] Race/ethnicity and lung cancer survival in the United States: a meta-analysis
[26] Cancer statistics
[27] Distinct lung cancer subtypes associate to distinct drivers of tumor progression
[28] Decreased BMP-7 and p-Smad1/5/8 expression, and increased levels of gremlin in hepatocellular carcinoma
[29] MiR-199a-3p suppresses proliferation and invasion of prostate cancer cells by targeting Smad1
[30] Activating transcription factor 4 modulates TGFbeta-induced aggressiveness in triple-negative breast cancer via SMAD2/3/4 and mTORC2 signaling
[31] Critical role of SMAD2 in tumor suppression and transforming growth factor-beta-induced apoptosis of prostate epithelial cells
[32] Keratinocyte-specific SMAD2 ablation results in increased epithelial-mesenchymal transition during skin cancer formation and progression
[33] High p-SMAD2 expression in stromal fibroblasts predicts poor survival in patients with clinical stage I to IIIA non-small cell lung cancer
[34] Smoking attenuates transforming growth factor-beta-mediated tumor suppression function through downregulation of Smad3 in lung cancer
[35] Investigating the mechanism by which SMAD3 induces PAX6 transcription to promote the development of non-small cell lung cancer
[36] Reduced expression of SMAD4 is associated with poor survival in colon cancer
[37] SMAD4 expression predicts local spread and treatment failure in resected pancreatic cancer
[38] Expression pattern of p-SMAD2/SMAD4 as a predictor of survival in invasive breast ductal carcinoma
[39] MicroRNA-145 promotes esophageal cancer cells proliferation and metastasis by targeting SMAD5
[40] Fstl1 promotes glioma growth through the BMP4/SMAD1/5/8 signaling pathway
[41] Nuclear Smad6 promotes gliomagenesis by negatively regulating PIAS3-mediated STAT3 inhibition
[42] High expression levels of SMAD3 and SMAD7 at diagnosis predict poor prognosis in acute myeloid leukemia patients undergoing chemotherapy
[43] Transforming growth factor-beta1 suppresses bone morphogenetic protein-2-induced mesenchymal-epithelial transition in HSC-4 human oral squamous cell carcinoma cells via SMAD1/5/9 pathway suppression

All authors have read this manuscript and approved it for submission. The authors have no conflicts of interest to disclose. Supplemental Digital Content is available for this article. The datasets generated during and/or analyzed during the current study are publicly available.