key: cord-0026613-s43v0p55 authors: Liu, Guanzhi; Luo, Sen; Lei, Yutian; Wu, Jianhua; Huang, Zhuo; Wang, Kunzheng; Yang, Pei; Huang, Xin title: A nine-hub-gene signature of metabolic syndrome identified using machine learning algorithms and integrated bioinformatics date: 2021-09-13 journal: nan DOI: 10.1080/21655979.2021.1968249 sha: ce674797c4d0e9d814f99910271e733d6cfc9bba doc_id: 26613 cord_uid: s43v0p55 Early risk assessments and interventions for metabolic syndrome (MetS) are limited because of a lack of effective biomarkers. In the present study, several candidate genes were selected as a blood-based transcriptomic signature for MetS. We collected so far the largest MetS-associated peripheral blood high-throughput transcriptomics data and put forward a novel feature selection strategy by combining weighted gene co-expression network analysis, protein-protein interaction network analysis, LASSO regression and random forest approaches. Two gene modules and 51 hub genes as well as a 9-hub-gene signature associated with metabolic syndrome were identified. Then, based on this 9-hub-gene signature, we performed logistic analysis and subsequently established a web nomogram calculator for metabolic syndrome risk (https://xjtulgz.shinyapps.io/DynNomapp/). This 9-hub-gene signature showed excellent classification and calibration performance (AUC = 0.968 in training set, AUC = 0.883 in internal validation set, AUC = 0.861 in external validation set) as well as ideal potential clinical benefit. Metabolic syndrome (MetS) is a complex abnormality with several components, such as insulin resistance, diabetes, obesity, hypertension, and hyperlipidemia [1, 2] . The occurrence and development of MetS and its components are always associated with poor cardiovascular outcomes, especially for individuals with obesity and insulin resistance, which are the core pathophysiological features of MetS [3] [4] [5] . The lack of effective risk assessment biomarkers makes early intervention for MetS and MetSrelated diseases difficult [6, 7] . Studies have reported potential biomarkers of MetS; however, there is still a lack of definitive clinical risk assessment biomarkers [8, 9] . Research on MetS biomarkers is limited to genomics, and the association between MetS and single nucleotide polymorphisms (SNPs) [10, 11] . Few studies have focused on MetS-specific biomarkers from a transcriptomics perspective [12] . In high-throughput transcriptomics, microarrays and next-generation sequencing (NGS) have been widely used to measure RNA expression levels [13] [14] [15] . In addition, advanced bioinformatics approaches, such as weighted gene coexpression network analysis (WGCNA), can play an important role in the identification of disease biomarkers, as they have high sensitivity, specificity, and efficiency, based on high-throughput transcriptomic data [16, 17] . Compared to traditional bioinformatics methods, such as differentially expressed gene (DEG) analysis, networkfocused algorithm WGCNA can establish a weighted scale-free co-expression network, and then identify key gene modules and hub genes [18] . Machine learning (ML), as a key aspect of artificial intelligence, has been increasingly applied in many biomedical fields, such as biomarker identification, diagnosis signature development, and drug target discovery [19, 20] . Moreover, some ML methods, such as least absolute shrink-age and selection operator (LASSO) regression and random forest (RF), can significantly improve biomarker development for multifactorial and complicated diseases [21, 22] . In this study, an integrated bioinformatic approach using WGCNA was performed on the largest MetS-associated peripheral blood highthroughput transcriptomic data set. Several hub genes were identified via protein-protein interaction (PPI) network analysis, and further hub gene feature selection was conducted by combining LASSO regression and RF algorithms. Finally, a logistic regression and a web nomogram calculator for MetS risk (https://xjtulgz.shinyapps.io/ DynNomapp/) was established based on the training set, and the diagnostic value of selected hub gene features was measured using internal and external validation data. To further detect the differences in hub gene expression in peripheral blood and plasma, NGS was carried out in plasma samples of MetS patients and a control group (healthy patients). The current study aimed to identify gene parameters with high diagnostic value and clinical implications for MetS, using comprehensive bioinformatics and ML feature selection methods. This study provides a novel strategy for more effective and reliable biomarker development. MetS causes highly specific gene expression changes in peripheral blood. Public gene expression datasets based on peripheral blood samples containing MetS-associated clinical diagnosis information were collected from the Gene Expression Omnibus database (GEO database; http://www.ncbi.nlm.nih.gov/geo/). The training set consisted of 70% of samples randomly selected from the GSE152073 (n = 90) and GSE98895 (n = 40) combined datasets (gene expression microarray data of peripheral blood), and the remaining 30% was used as internal validation data [23, 24] . GSE124534 (n = 17, gene expression microarray data of peripheral blood) was used for external validation [25] . Subjects diagnosed with other metabolic diseases or acute trauma, such as osteoporosis or femoral neck fracture, which may cause gene expression changes, were excluded. Detailed information on these datasets is listed in Supplementary Table 1 . After removing the outliers and probes that were duplicate or could not be annotated, gene expression data were normalized and batch effects removed using the 'limma' package in R. Missing data was imputed using the R software package 'impute.' WGCNA was performed based on GSE98895 datasets using the R package 'WGCNA' [26] . First, the Pearson's correlation was calculated for all pairs of genes to establish a similarity matrix. Second, an appropriate soft-thresholding power of two was selected to meet the scale-free topology (scale-free R2 > 0.9) criterion using the function 'pickSoftThreshold.' Third, a topological overlap matrix and corresponding dissimilarity matrix were constructed. Then, the 'blockwiseModules' function was run with the following major parameters: maxBlockSize = 5000, minModuleSize = 30, and mergeCutHeight = 0.25. Several gene modules were identified through hierarchical clustering with a dynamic tree-cutting algorithm. Finally, the correlation between gene modules and clinical phenotypes was calculated to identify clinically significant modules. To explore the function and signaling pathways associated with these modules, Gene Ontology (GO) function enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis were performed, as well as Gene Set Enrichment Analysis (GSEA) using the 'clusterProfiler,' 'enrichplot,' 'DOSE,' and 'ggplot2' packages in R software [27] . A P value of <0.05 was set as the threshold. A protein-protein interaction (PPI) network was constructed based on the STRING database (Search Tool for the Retrieval of Interacting Genes, version 11.0, combined score >0.4). Connectivity degrees in the network were then calculated, and the top 5% of genes with the highest connectivity degree were identified as hub genes for further analysis. Visualization of hub genes in the PPI network was achieved using Cytoscape software (version 3.7.0). Peripheral blood samples were obtained from five patients with MetS and five healthy volunteers from the First Affiliated Hospital of Xi'an Jiaotong University, defined using the World Health Organization's MetS definition. MetS diagnosis can be made based on the presence of impaired fasting glucose, impaired glucose tolerance, type 2 diabetes mellitus (T2DM) or insulin resistance, and two or more of the following [1] : waist-to-hip ratio > 0.90 in men; waist-to-hip ratio > 0.85 in women, and/or body mass index > 30 kg/m 2 [2] ; serum triglyceride level ≥ 1.7 mmol/L [3] ; HDL cholesterol < 0.9 mmol/L in men, < 1.0 mmol/L in women, or treatment for dyslipidaemia [4] ; blood pressure ≥ 140/ 90 mm Hg; and [5] microalbuminuria [28] . This study was approved by the Ethics Committee of the First Affiliated Hospital of Xi'an Jiaotong University (Ethical Approval number: XJTU1AF2019LSL-014). All participants provided written informed consent in advance. Total RNA was extracted from plasma samples using TRIzol LS Reagent (Invitrogen), according to the manufacturer's instructions. Sequencing libraries were generated using the NEBNext Poly(A) mRNA Magnetic Isolation Module (New England Biolabs), RiboZero Magnetic Gold Kit (Epicenter, Illumina Company), and KAPA Stranded RNA-Seq Library Prep Kit (Illumina). An Agilent Bioanalyzer 2100 system (Agilent) was used to qualify the sequencing libraries. Finally, high-throughput NGS was carried out using the TruSeq SR Cluster Kit (Illumina), based on the Illumina HiSeq 4000 sequencing platform (Illumina). The sequencing data has been uploaded to ArrayExpress database (E-MTAB-10494) . Trimmed reads were identified after raw sequencing data quality control and filtering using the Solexa pipeline program (version 1.8) and Cutadapt software. Subsequently, human reference genome indexing (hg38) was obtained using Bowtie (http://bowtie-bio.sourceforge.net/index. shtml). Sequence alignment was performed using the Hisat2 program. The R package 'edgeR' was used to detect DEGs [29] . The threshold for DEGs was set as |log2FC|≥ 1 and P value < 0.05. ML algorithms are more powerful than traditional methods for complex classification, like medical diagnosis and treatment. In this study, two ML approaches: LASSO regression and RF using R packages 'glmnet' and 'randomForest' were combined to achieve feature selection [30] . The feature selection was cross-checked, and several hub genes were selected according to the classification accuracy. Hub genes from LASSO regression and RF feature selection were further used to establish a diagnosis classifier. The R package 'rms' was used to establish a logistic regression model, based on expression data in the training set. A corresponding web nomogram calculator for MetS risk was constructed to visualize the diagnostic effect of the selected hub gene signature. Internal and external validations were then performed to determine the web nomogram calculator performance. The area under curve (AUC) value of the receiver operating characteristic (ROC) curve was calculated using the 'pROC' package in R, which can depict the classification ability [31] . The Hosmer-Lemeshow goodness-offit test and a calibration curve analysis were conducted to indicate the calibration. In addition, a decision curve analysis was carried out using the 'rmda' package to evaluate the clinical application value and net benefit of the nomogram. In this study, through combined integrated bioinformatic approaches and machine learning algorithms, we identified a nine-hub-gene signature with high diagnostic value and clinical implications for MetS. Besides, current work provides a novel strategy for more effective and reliable biomarker development. The workflow of this study is shown in Figure 1 . The most comprehensive sets of MetS-associated high-throughput transcriptomic data from the GEO database were combined (Supplementary Table 1 ). Gene expression profiles from GSE98895 were used to perform WGCNA. After preprocessing and batch effect removal, 25,148 gene expression data were identified from peripheral blood samples from 20 MetS and 20 control patients. Sample-clustering analysis, based on Pearson's correlation and average linkage approaches, showed no outliers (Figure 2a) . To achieve scale-free topology (scale-free R2 > 0.9), a soft-thresholding power β = 2 was selected (Figure 2b ). Subsequent WGCNA network construction and average linkage hierarchical clustering detected 14 gene modules. Detailed hierarchical clustering information is shown in Figure 2c ,d. The correlation analysis between these modules and MetS showed that the red module (618 genes) and black module (546 genes) were highly associated with MetS ( Figure 2e) . Hence, these two modules were identified as the key modules of MetS for further analysis. Scatter diagrams containing key module GS and MM information are shown in Figure 2f ,g. The GO functional enrichment analysis showed that the MetS-associated genes in red and black modules were mainly enriched in biological processes (BP), such as the receptor guanylyl cyclase signaling pathway, central nervous system neuron differentiation, response to calcium ion, and platelet activation. In addition, these genes were associated with molecular functions (MF), such as tumor necrosis factor receptor and lipid transporter activity. Cellular components (CC), such as cellular junctions and guanyl-nucleotide exchange factor complexes, may correlate with the development of MetS. The KEGG signaling pathway enrichment analysis indicated that these genes were significantly enriched in signaling pathways, such as cell adhesion molecules, leukocyte transendothelial migration, and the calcium signaling pathway (Figure 3a,b) . In addition, GSEA further revealed the function and signaling pathways of these genes, and showed a similar result to CC GO and KEGG pathway enrichment analysis. BP GSEA and MF GSEA suggested that BP, such as regulation of lymphocyte activation, drug metabolic processes, and MF, such as lyase activity, hydrolase activity, molecular transducer activity, and G-protein coupled receptor activity, might be involved in the development of MetS (Figure 3c-f ). PPI networks were established using the STRING database, based on genes in the red and black modules. The degree of connectivity was calculated, and the top 5% of genes (51 genes) with the highest connectivity were selected as hub genes associated with MetS. Hub genes with a high degree of connectivity, such as MYC, UBE2E2, MIB2, ANAPC1, TCEB1, CTLA4, and SPI1, might play important roles in the development of MetS, and could serve as potential biomarkers and therapeutic targets. The visualization of the hub gene PPI network is shown in Figure 4 . These 51 hub genes (Supplementary Table 2) were used for further feature reduction analysis and model construction. 12,954 genes were identified in plasma samples from five patients in the MetS group and five patients in the control group, and 45 upregulated and 186 downregulated DEGs were identified in the MetS group compared to the control (Supplementary Table 3 ). The plasma expression of the 51 hub genes did not differ significantly between MetS patients and healthy controls (Supplementary Table 4) . These results indicate that the potential function and diagnostic value of these 51 hub genes in peripheral blood should be determined, instead of in plasma components. This outcome defines the sampling type for further noninvasive MetS screening or diagnostic tools. In this study, LASSO regression analysis and RF were used for feature selection. The expression data of the 51 hub genes were entered into LASSO regression models, and a 10-fold cross-validation was performed to detect the optimal classification accuracy (Figure 5a,b) . Hence, 15 hub gene features were obtained based on LASSO regression analysis, including ADRA2A, CXCR5, FZD1, HLA.DPA1, HSPA5, KCTD7, KLHL9, P2RY14, P2RY2, PRKACG, PSMD1, PTTG1 , REEP4, SPTAN1, and TSPAN14. In addition, an RF model was constructed using the expression profiles of the 51 hub genes, and the classification importance of hub gene features was measured by the decrease in the Gini coefficient (MeanDecreaseGini). Fifteen hub gene features were chosen using an RF approach, comprising SPTAN1, KCTD7, IL2RG, ITPR3, PSMD1, ITGB7, FZD1, DCTN4, KLHL9, PTTG1, TSPAN14, RNF19B, XCR1, P2RY2, and CXCR5 ( Figure 5c ). Finally, the results of these two gene feature selection methods were combined by taking the intersection, and nine-hub-gene features (SPTAN1, KCTD7, PSMD1, FZD1, KLHL9, PTTG1, TSPAN14, P2RY2, and CXCR5) were selected for further analysis. Based on Human Protein Atlas database, the mRNA blood cell type distribution and protein concentration in plasma of these nine hug genes were showed in Supplementary Table 5 . The expression profiles of the nine selected gene features were entered into a logistic regression, and then, to validate the diagnostic value of this nine-hub-gene signature, a web nomogram calculator for MetS risk was established based on the training set (https://xjtulgz.shinyapps.io/ DynNomapp/). The ROC curve analysis (Figure 6a ) showed that this MetS diagnostic nomogram had excellent classification ability (AUC = 0.968 in training set, AUC = 0.883 in internal validation set, AUC = 0.861 in external validation set). The ROC curves of every hub gene are shown in Supplementary Figure 1 . In addition, a calibration curve analysis was performed, and the Hosmer-Lemeshow goodness-of-fit test (P= 0.915) showed good calibration of this nomogram (Figure 6b) . Furthermore, the decision curve plotted the standardized net benefit of the MetS diagnostic nomogram for different decision thresholds (Figure 6c) . These results indicate that the application of this MetS diagnostic nomogram can lead to ideal diagnostic outcomes. Recently, considerable amount of research has been conducted on MetS; however, early diagnosis and intervention remains difficult because of a lack of effective biomarkers and targeted treatment [32] . To the best of our knowledge, this is the first study to identify a key gene module and 51 MetS-associated hub genes by combining a WGCNA bioinformatics approach and PPI network analysis. Genes in this key module were mainly enriched in signaling pathways, such as cell adhesion, leukocyte transendothelial migration signaling, nuclear factor kappa B (NF-κB), and functions such as lymphocyte activation. These 51 hub genes may play important roles in the development of MetS. Cheung et al. suggested that MYC (MYC proto-oncogene) can serve as an important mediator of impaired insulin secretion and β-cell apoptosis [33] . Some studies have indicated that the SNPs in UBE2E2 (ubiquitin conjugating enzyme E2) are associated with the development of T2DM [34, 35] . Additionally, mindbomb E3 ubiquitin protein ligase 2 (MIB2), anaphase promoting complex subunit 1 (ANAPC1), and ELOC, Elongin C (TCEB1), are also involved in ubiquitination, which can affect the development of insulin resistance and MetS [36, 37] . Cytotoxic T-lymphocyte associated protein 4 (CTLA4) is involved in T-cell immune responses, and thus, it regulates the pathogenesis of insulin resistance and insulindependent diabetes mellitus [38, 39] . Moreover, the upregulation of Spi-1 proto-oncogene (SPI1, or PU.1) in adipocytes can cause insulin resistance by stimulating reactive oxygen species production and inflammatory cytokine gene expression [40, 41] . These hub genes could serve as biomarkers for MetS and many of their contributing components. Through ML feature selection methods, a ninehub-gene signature with high diagnostic value and clinical implications for MetS was obtained. Dhana et al. found that the proteasome 26S subunit, non-ATPase (PSMB1) gene was associated with both body mass index and waist circumference, and could serve as a biomarker for obesity-related diseases [42] . Some studies have shown that frizzled class receptor 1 (FZD1) is related to insulin resistance [43, 44] . In addition, Kelch-like family member 9 (KLHL9) can induce insulin resistance by regulating insulin receptor substrate-1 (IRS1) degradation [45] . Pituitary tumor-transforming gene 1 (PTTG1) is a crucial factor in the development and physiological responses of pancreatic beta-cells, and its dysregulation can result in diabetes [46] . Tetraspanin 14 (TSPAN14) can interact with ADAM metallopeptidase domain 10 (ADAM10) and then regulate leukocyte development and inflammatory immunity function [47] . Previous studies have demonstrated that purinergic receptor (P2Y2) contributes to the development of chronic high-fat diet-induced metabolic dysfunction and insulin resistance [48, 49] . Furthermore, P2RY2 is involved in the process of immune cell infiltration in MetS [50] . Follicular helper T-cells (Tfh) of diabetic patients express elevated levels of C-X-C motif chemokine receptor 5 (CXCR5), and there is a dysregulation of circulating CD4+ CXCR5 + T-cells in diabetes patients [51, 52] . These results indicated nine-hub-gene signature is highly associated with MetS. Finally, the classification ability, calibration, and potential clinical benefit of the bloodbased, nine-hub-gene signature was verified in internal and external validation sets. Previous studies have not investigated the diagnostic value of these nine hub genes for MetS; an early screening or diagnostic tool for MetS has not been developed [53] . However, in this study, the blood-based, nine-hub-gene signature combined with logistic regression and visualized as a nomogram produced an excellent classification and calibration performance. The AUC of the ROC curves reached 0.883 in the internal validation set and 0.861 in the external validation set. The Hosmer-Lemeshow goodness-of-fit test (P = 0.915) showed good calibration. A further decision curve analysis showed that this nomogram has a better net benefit than any single gene signature in almost all decision threshold ranges. Overall, these results indicate that this nine-hub-gene signature is useful for MetSassociated, blood-based risk assessments in clinical applications. In this study, the largest MetS-associated peripheral blood high-throughput transcriptomics dataset was collected. However, further large, independent patient cohort validation studies are still needed to establish a diagnostic model for clinical applications. Because of its excellent classification ability, calibration, and potential clinical benefits, the nine-hub-gene signature identified in the present study can be used to accurately assess MetS risk. In addition, a novel risk assessment biomarker selection method is proposed by combining WGCNA approaches, PPI network analysis, LASSO regression, and RF feature selection algorithms. In addition, high-throughput sequencing was performed to detect the plasma cell-free mRNA expression level in MetS patients compared with healthy controls, which can provide a reliable basis for sampling type in MetS risk assessment. (1) Combining bioinformatics analysis and machine learning algorithms (2) Providing a novel strategy for biomarker identification (3) A nine-hub-gene signature with high diagnostic value for MetS Rho kinase, a potential target in the treatment of metabolic syndrome Extracellular vesicles in metabolic syndrome Tissue-specific insulin signaling, metabolic syndrome, and cardiovascular disease The insulin-like growth factor i system: physiological and pathophysiological implication in cardiovascular diseases associated with metabolic syndrome Identification of hub genes associated with obesity-induced hepatocellular carcinoma risk based on integrated bioinformatics analysis Discovery of metabolite profiles of metabolic syndrome using untargeted and targeted LC-MS based lipidomics approach Insulin resistance and insulin hypersecretion in the metabolic syndrome and type 2 diabetes: time for a conceptual framework shift A correlation-based network for biomarker discovery in obesity with metabolic syndrome Water T2 as an early, global and practical biomarker for metabolic syndrome: an observational cross-sectional study Identification of female-specific genetic variants for metabolic syndrome and its component traits to improve the prediction of metabolic syndrome in females Multiple genotype-phenotype association study reveals intronic variant pair on SIDT2 associated with metabolic syndrome in a Korean population Evaluation of transcriptomic regulations behind metabolic syndrome in obese and lean subjects Serum miRNA-based prediction of axillary lymph node metastasis in breast cancer A bioinformatics approach for identifying potential molecular mechanisms and key genes involved in COVID-19 associated cardiac remodeling A data mining paradigm for identifying key factors in biological processes using gene expression data Identification of potential miRNA biomarkers for traumatic osteonecrosis of femoral head Identification of candidate biomarkers for salt sensitivity of blood pressure by integrated bioinformatics analysis Whole-transcriptome sequencing uncovers core regulatory modules and gene signatures of human fetal growth restriction Machine learning in medicine Applications of deep learning in biomedicine Establishment and validation of a mathematical diagnosis model to distinguish benign pulmonary nodules from early non-small cell lung cancer in Chinese people Machine learning guided adjuvant treatment of head and neck cancer Overexpression of SNTG2, TRAF3IP2, and ITGA6 transcripts is associated with osteoporotic vertebral fracture in elderly women from community Identification of miR-9-5p as direct regulator of ABCA1 and HDL-driven reverse cholesterol transport in circulating CD14 + cells of patients with metabolic syndrome Integrated analys of high-fat challenge-induced changes in blood cell whole-genome gene expression HS. An R package for weighted correlation network analysis. BMC Bioinf ClusterProfiler: an R package for comparing biological themes among gene clusters. Omi A Comment on the provisional report from the WHO consultation edgeR: a Bioconductor package for differential expression analysis of digital gene expression data Regularization paths for generalized linear models via coordinate descent pROC: an open-source package for R and S+ to analyze and compare ROC curves Metabolic syndrome and oxidative stress: a complex relationship c-Myc directly induces both impaired insulin secretion and loss of βcell mass, independently of hyperglycemia in vivo Association between UBE2E2 variant rs7612463 and type 2 diabetes mellitus in a Chinese han population A genome-wide association study in the Japanese population identifies susceptibility loci for type 2 diabetes at UBE2E2 and C2CD4A-C2CD4B Role of E3 ubiquitin ligases in insulin resistance The E3 Ubiquitin ligase pellino3 protects against obesity-induced inflammation and insulin resistance Antigen presentation and T-cell activation are critical for RBP4-induced insulin resistance Innate immune activity as a predictor of persistent insulin secretion and association with responsiveness to CTLA4-Ig treatment in recent-onset type 1 diabetes Inhibition of PU.1 ameliorates metabolic dysfunction and non-alcoholic steatohepatitis Adipocyte expression of PU.1 transcription factor causes insulin resistance through upregulation of inflammatory cytokine gene expression and ROS production An epigenome-wide association study of obesity-related traits Evidence of impaired adipogenesis in insulin resistance Wnt signaling genes in adipose tissue and skeletal muscle of humans with different degrees of insulin sensitivity Deficiency of the autophagy gene ATG16L1 induces insulin resistance through KLHL9/KLHL13/ CUL3-mediated IRS1 degradation Transcriptional profiling reveals functional links between RasGrf1 and Pttg1 in pancreatic beta cells Regulation of leukocytes by TspanC8 tetraspanins and the "molecular scissor receptor promotes acute inflammation but is dispensable for chronic high-fat diet-induced metabolic dysfunction P2Y2 receptor promotes high-fat diet-induced obesity Purinergic receptor Y2 (P2Y2)-dependent VCAM-1 expression promotes immune cell infiltration in metabolic syndrome Follicular helper T cell signature in type 1 diabetes Find the latest version : follicular helper T cell signature in type 1 diabetes Dysregulation of circulating CD4+CXCR5+ T cells in type 2 diabetes mellitus Blood-based biomarkers for metabolic syndrome We would like to thanks all participants for their commitment and cooperation. Conception and design: Xin Huang, Pei Yang, Kunzheng Wang, Guanzhi Liu; collection and assemble of data: Yutian Lei, Zhuo Huang, Sen Luo; analysis and interpretation of the data: Guanzhi Liu,Jianhua Wu; draft of the article: Guanzhi Liu; All authors read, critically revised and approved the final manuscript. The data that support the findings of the this study are available from the corresponding author on reasonable request. The datasets for this study can be found in the Gene Expression Omnibus (GEO) database [https://www. ncbi.nlm.nih.gov/geo/]. No potential conflict of interest was reported by the author(s).