CogNet: classification of gene expression data based on ranked active-subnetwork-oriented KEGG pathway enrichment analysis


Malik Yousef1,2, Ege Ülgen3 and Osman Uğur Sezerman3

1 Galilee Digital Health Research Center (GDH), Zefat Academic College, Zefat, Israel
2 Department of Information Systems, Zefat Academic College, Zefat, Israel
3 Department of Biostatistics and Medical Informatics, School of Medicine, Acibadem Mehmet Ali
Aydinlar University, Istanbul, Turkey

ABSTRACT
Most of the traditional gene selection approaches are borrowed from other fields such
as statistics and computer science. However, they do not prioritize biologically
relevant genes, since their ultimate goal is to determine features that optimize model
performance metrics, not to build a biologically meaningful model. Therefore, there is
an imminent need for new computational tools that integrate the biological
knowledge about the data in the process of gene selection and machine learning.
Integrative gene selection enables incorporation of biological domain knowledge
from external biological resources. In this study, we propose a new computational
approach named CogNet that is an integrative gene selection tool that exploits
biological knowledge for grouping the genes for the computational modeling tasks
of ranking and classification. In CogNet, pathfindR serves as the biological
grouping tool, allowing the main algorithm to rank active-subnetwork-oriented
KEGG pathway enrichment analysis results to build a biologically relevant model.
CogNet provides a list of significant KEGG pathways that can classify the data with
very high accuracy. The list also provides the differentially expressed genes belonging
to these pathways, which are used as features in the classification problem.
The list facilitates deep analysis and better interpretability of the role of KEGG
pathways in classification of the data, thus better establishing the biological relevance
of these differentially expressed genes. Even though the main aim of our study is not
to improve the accuracy of any existing tool, CogNet outperforms a similar approach
called maTE while obtaining performance similar to that of other tools, including
SVM-RCE. CogNet was tested on 13 gene expression datasets concerning a variety of
diseases.

Subjects Bioinformatics, Data Science
Keywords Classification, Gene expression, Enrichment analysis, KEGG pathway, Rank,
Machine learning, Bioinformatics, Data science, Data mining, Genomics

How to cite this article Yousef M, Ülgen E, Uğur Sezerman O. 2021. CogNet: classification of gene expression data based on ranked
active-subnetwork-oriented KEGG pathway enrichment analysis. PeerJ Comput. Sci. 7:e336 DOI 10.7717/peerj-cs.336

Submitted 17 September 2020
Accepted 23 November 2020
Published 22 February 2021

Corresponding author
Malik Yousef, malik.yousef@zefat.ac.il

Academic editor
Faizal Khan

Copyright
2021 Yousef et al.

Distributed under
Creative Commons CC-BY 4.0

INTRODUCTION
Due to recent advances in DNA gene expression technology, it is now feasible to obtain
gene expression profiles of tissue samples at relatively low costs. Data from genome-wide
gene expression analyses are helping scientists and physicians understand the disease
mechanisms and use that information to design platforms to assist in diagnosis, to
assess prognosis, and to inform treatment plans. For instance, a study by Van ’t Veer et al.
(2002) collected gene expression data on primary breast tumors of 117 young patients.
Machine learning with feature selection was used to identify a gene expression signature
strongly predictive of a short interval to distant metastases (“poor prognosis” signature),
even in patients that were lymph node negative.

Gene expression technologies are now producing large datasets associated with a
variety of diseases. Due to the high dimensionality of the data and relatively small sample
sizes, reliable interpretation of the data is complicated and often overwhelming, and
this is an important problem in bioinformatics research. Although sample sizes have
continued to grow in recent years, new and efficient feature selection algorithms are still
needed to overcome challenges in the existing methods (Vanjimalar, Ramyachitra &
Manikandan, 2018), in order to achieve the full potential of this data in the development of
gene-based diagnostic tests, drug discovery and therapeutic strategies for improving public
health.

Most of the traditional gene selection approaches are borrowed from other fields such
as statistics and computer science. There is a need for new computational tools that
integrate the biological knowledge about the data in the process of gene selection and
classification. Integrative gene selection incorporates biological domain knowledge from
external biological resources into the selection process. The main aim of integrative gene
selection is to generate a ranked list of features that provides high model performance
and takes into consideration both statistical metrics applied on the gene expression data
and the biological background information provided as external datasets. For example,
the biological background information may come from Gene Ontology (GO), which annotates
each gene product with Cellular Components (CC), Molecular Functions (MF), and
Biological Processes (BP). GO is a way to capture biological knowledge in a computable
form that consists of a set of concepts and their relationships with each other.

The various methods that have been applied to the process of selecting disease-specific
features from large gene expression datasets were reviewed recently (Pan, 2002; Lazar
et al., 2012) and fall into three major categories: “filters”, “wrappers”, and “embedded
approaches”. Briefly, the filter approach, not based on any machine learning algorithm,
uses a statistic (ANOVA, t-test, etc.), wrappers use learning techniques to evaluate
which features are useful, and embedded techniques combine the feature selection step
and classifier construction. Pan (2002) compared different filtering methods,
highlighting similarities and differences between three main methods: the t-test, a
regression modeling approach, and a mixture model approach. Additional comparisons
of filtering techniques are available in Lazar et al. (2012). Inza et al. (2004) also carried out
a comparison between filter metrics and a wrapper sequential search procedure applied
to gene expression datasets.

Integrative approaches have become important topics (Bellazzi & Zupan, 2007; Fang,
Mustapha & Sulaiman, 2014) in the emerging field of gene expression. GO (Ashburner
et al., 2000) was used by Qi & Tang (2007) to rank genes based not only on their
individual discriminative powers but also on the powers of biological information contained

Yousef et al. (2021), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.336 2/20



in GO annotations. The algorithm is iterative and starts by applying
Information Gain (IG) to compute a discriminative score for each gene. Genes with a score
of zero are removed from the analysis. The second step is to integrate the biological
knowledge by annotating the surviving genes with GO terms. The third step is to
score each GO term as the mean IG score of its associated genes. The gene with the
highest IG in the highest-scoring GO term is then moved to the final list. This procedure is
repeated until the final goal is reached.

SoFoCles (Papachristoudis, Diplaris & Mitkas, 2010) is an interactive tool that
enables semantic feature filtering in microarray classification problems with the use of
external biological knowledge retrieved from the Gene Ontology. SoFoCles involves the
calculation of semantic similarities between two feature sets in order to derive an enriched,
semantically-aware final feature set. The GO terms are used in order to give a similarity
score for each annotated gene.

Fang, Mustapha & Sulaiman (2014) proposed an integrative gene selection approach based on
a filter method and association analysis for selecting genes that are not only differentially
expressed but also informative for classification. Association analysis was employed to
integrate microarray data with Gene Ontology (GO) and KEGG Pathways (KEGG)
simultaneously. The performance of the integrative models verified the efficiency and
scalability of association analysis in mining microarray data.

An additional study that integrated KEGG with genetic meta information (DisGeNET
(Piñero et al., 2019)) was proposed by Raghu et al. (2017). Their approach was a two-step
analytical workflow that incorporates a new feature selection paradigm as the first step
and that utilizes graphical causal modeling as the second step to handle the automatic
extraction of causal relationships.

Quanz, Park & Huan (2008) applied the pathways-as-features method, using the
KEGG pathway database for the pathway extraction component and the global test method
for the pathway selection component. The genes in each pathway are then transformed
into one single feature by mean normalization or logistic regression. The number of
features of the transformed data is the number of pathways. For instance, for the diabetes
data for which 17 pathways are selected, the dimensionality is reduced from 22,283 to
17 for the classification task.
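
As a rough illustration of the pathways-as-features idea, the sketch below collapses the genes of each pathway into a single feature by averaging z-scored expression values. The z-scoring step, the function name pathway_features and the toy data are assumptions made for illustration only; the exact mean-normalization and logistic-regression transforms are those defined by Quanz, Park & Huan (2008) and are not reproduced here.

```python
from statistics import mean, pstdev


def pathway_features(expr, pathways):
    """Collapse genes into one feature per pathway (illustrative only).

    expr: dict gene -> list of expression values, one per sample.
    pathways: dict pathway id -> list of gene names.
    Returns dict pathway id -> list of per-sample pathway scores.
    """
    # z-score each gene across samples so genes are on a comparable scale
    z = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        z[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]

    features = {}
    for pid, genes in pathways.items():
        present = [g for g in genes if g in z]
        if not present:
            continue  # none of the pathway genes were measured
        n_samples = len(z[present[0]])
        # one value per sample: the mean z-score over the pathway genes
        features[pid] = [mean(z[g][s] for g in present) for s in range(n_samples)]
    return features


# toy data: 4 genes, 3 samples, 2 hypothetical pathways
expr = {"g1": [1.0, 2.0, 3.0], "g2": [2.0, 4.0, 6.0],
        "g3": [5.0, 5.0, 5.0], "g4": [3.0, 1.0, 2.0]}
pathways = {"pwA": ["g1", "g2"], "pwB": ["g3", "g4", "not_measured"]}
feats = pathway_features(expr, pathways)
```

With this transform the number of columns of the dataset equals the number of selected pathways, which is the dimensionality reduction described above.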

Unsupervised gene selection using biological knowledge-based GO terms was suggested
by Acharya, Saha & Nikhil (2017). They utilized gene annotation data, where each
gene is represented as a structural information content (IC) based gene-GO term
annotation vector; together these vectors form a gene-GO term annotation matrix for a selected
data set. The IC of a GO term is related to how often the term is
applied to genes in the database.

A very interesting study that emphasizes the need for an integrative approach
was conducted by Perscheid, Grasnick & Uflacker (2019). Their work compared the
performance of traditional and integrative gene selection approaches. Moreover,
they propose a straightforward approach to integrate external knowledge with traditional
gene selection approaches. The framework enables automatic external knowledge
integration, gene selection, and evaluation. The study shows that the integration of external
knowledge improves overall analysis results.

Feature selection and discovering the molecular explanation of disease describe the
same process, where the first one is a computer science term and the second one is used in
the biomedical sciences.

Several tools are now available that allow users to break the fixed set paradigm in
assessing the statistical enrichment of sets of genes. In this regard, the gene set enrichment
analysis is a very important method. Recently, different approaches have been developed and
have become useful tools in gene expression analysis (Cohn-Alperovich et al., 2016; Ulgen,
Ozisik & Sezerman, 2019). pathfindR (Ulgen, Ozisik & Sezerman, 2019) is a tool for
pathway enrichment analysis utilizing active subnetworks (an active subnetwork can be
defined as a group of interconnected genes in a protein-protein interaction network
(PIN) that predominantly consists of significantly altered genes). It identifies gene sets that
form active subnetworks in a protein-protein interaction network using a list of genes
provided by the user. It then performs pathway enrichment analyses on the identified gene
sets. In most enrichment approaches, relational information captured in the graph
structure of a PIN is overlooked as genes in the network neighborhood of significant genes
are not taken into account. The approach pathfindR uses for exploiting interaction
information to enhance pathway enrichment analysis is active subnetwork search. Briefly,
active subnetwork search enables inclusion of genes that are not significant genes
themselves but connect significant genes. This results in the identification of phenotype-
associated connected significant subnetworks. By first identifying active subnetworks in
a list of significant genes and then performing pathway enrichment analysis on these active
subnetworks, pathfindR efficiently exploits interaction information between the genes.
This, in turn, helps pathfindR uncover relevant mechanisms underlying the disease.

Support Vector Machines-Recursive Cluster Elimination (SVM-RCE) is a machine
learning algorithm based on grouping/clustering genes by their expression and scoring each
cluster of genes (Yousef et al., 2007). Interest in this approach has grown over time, and a
number of publications have successfully applied SVM-RCE
to identify features directly associated with a disease/condition.
This increasing interest is based on a reconsideration of how feature selection in
biological datasets can benefit from considering the biomedical relationships of the
features in the selection process. The usefulness of SVM-RCE then led to the development
of additional computational tools. Similar studies were carried out
(Harris & Niekerk, 2018; Lazzarini & Bacardit, 2017), indicating the
merit of the SVM-RCE approach. The study of Deshpande et al. (2010) is a derivative of
the SVM-RCE algorithm with small modifications for disease state prediction. Additionally,
they adopted the term “recursive cluster elimination”, which we introduced.

Most interestingly, Zhao, Wang & Chen (2017) used the SVM-RCE tool
as a comparison method when analyzing expression profiles to identify
microRNAs related to venous metastasis in hepatocellular carcinoma.

SVM-RNE (Yousef et al., 2009) is a similar approach to SVM-RCE, and uses the GXNA
(Nacu et al., 2007) tool to extract gene networks from the gene expression data.
Those networks serve as groups/clusters of genes that are subject to the ranking
procedure. A similar study to SVM-RNE was carried out by Johannes et al. (2010) for the
integration of pathway knowledge into a reweighted recursive feature elimination
approach for risk stratification of cancer patients.

The term knowledge-driven variable selection (KDVS) similarly refers to the integration of
biological knowledge in the process of feature selection. In Zycinski et al. (2013), the
authors proposed a KDVS framework, which uses a priori biological knowledge in
high-throughput data analysis, and applied this framework to SVM-RNE.

The most recent tool that integrates biological knowledge for grouping the genes is
maTE (Yousef, Abdallah & Allmer, 2019), which uses the same approach based on the
interactions of microRNAs (miRNA) and their gene targets. The maTE approach
differs from SVM-RCE and SVM-RNE in that it takes an additional input,
namely the information about miRNAs and their target sets.

The benefit of integrating biological knowledge led us to suggest a new tool called
CogNet, which incorporates the pathfindR tool into
an integrative approach. In CogNet, the pathfindR tool serves as the biological grouping
function, allowing the main algorithm to rank active-subnetwork-oriented KEGG pathway
enrichment analysis results. The details of the tool are described in the following sections.

MATERIALS AND METHODS
The computational tool CogNet that we developed is based on the concept of integrating
biological knowledge with machine learning in order to perform two tasks: the first task is
ranking the groups of genes (in this case, pathway genes), and the second is using the top groups
(significant groups) to build a machine learning model. Figure 1 displays the general

Figure 1 General workflow for integrating biological information for grouping the genes by bioF()
function. BioF() could be microRNA targets association, KEGG pathway association or other association.

Full-size DOI: 10.7717/peerj-cs.336/fig-1



workflow of integrating biological information for grouping the genes by a biological
grouping function. The CogNet components follow the general approach of integrating
biological knowledge with machine learning as described in Fig. 1. In fact, the tools
SVM-RCE, SVM-RNE, and maTE also fit the general approach described in Fig. 1.

Let us assume that we are given a two-class dataset D, which consists of k samples and
n genes. Let us assume that the biological grouping function groups the n genes into m
groups as follows: bioF(g1,g2,…,gn) = {grp1,grp2,…,grpm}. bioF() can also assign
genes about which there is no biological knowledge to a dedicated group.

The bioF() function is used in order to group/cluster the genes using biological
information that could be associated with a specific biological concept. For example, bioF()
could group genes according to their miRNA targets (as the tool maTE did), or might
be according to disease, meaning that groups of genes are associated with specific diseases.
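
A minimal sketch of such a grouping function is given below, assuming the biological knowledge is available as a mapping from group names to member genes. The function name bio_f, the "unknown" label and the gene XYZ1 are hypothetical; the pathway memberships are taken from the example in Table 2.

```python
def bio_f(genes, annotation, unknown_label="unknown"):
    """Group genes by an external annotation (a stand-in for bioF()).

    genes: list of gene names present in the expression data.
    annotation: dict group name -> iterable of member genes
                (hypothetical input, e.g. KEGG pathways or miRNA target sets).
    Genes without any annotation are collected in one extra group.
    """
    gene_set = set(genes)
    groups = {}
    annotated = set()
    for name, members in annotation.items():
        hits = sorted(gene_set.intersection(members))
        if hits:  # keep only groups that contain measured genes
            groups[name] = hits
            annotated.update(hits)
    leftover = sorted(gene_set - annotated)
    if leftover:
        groups[unknown_label] = leftover
    return groups


groups = bio_f(
    ["PTK2", "IKBKB", "GNB1", "RPS6", "XYZ1"],  # XYZ1 is a made-up gene name
    {"hsa04062": ["PTK2", "IKBKB", "GNB1"], "hsa03010": ["RPS6", "RPS21"]},
)
```

A gene may belong to several groups at once, which matches pathway data where genes are shared between pathways.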

The ranking step is based on the machine learning algorithm used. In order to estimate
the significance of each group grpi of genes, the following algorithm is applied:

1. Create a new dataset D* that contains only the genes from grpi.

2. Apply cross validation using the ML algorithm.

3. Assign a score to grpi. The score is the average of the performance metric (could be
accuracy, the area under the curve, f-measure, etc.)
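
The three steps above can be sketched as follows. The nearest-centroid classifier and the interleaved k-fold split are stand-ins chosen to keep the example free of external dependencies; they are assumptions of this sketch, not the learner used by CogNet, and all function and variable names are hypothetical.

```python
from statistics import mean


def nearest_centroid_accuracy(train, test):
    """Fit class centroids on train and return the accuracy on test.
    Each item is (feature_list, label). Stand-in for the ML algorithm."""
    centroids = {}
    for label in {y for _, y in train}:
        rows = [x for x, y in train if y == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def predict(x):
        return min(centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

    return mean(1.0 if predict(x) == y else 0.0 for x, y in test)


def score_group(data, labels, gene_names, group, k=3):
    """Restrict D to the genes of one group, cross-validate, average the metric."""
    cols = [gene_names.index(g) for g in group]           # step 1: sub-dataset
    sub = [([row[c] for c in cols], y) for row, y in zip(data, labels)]
    fold_scores = []
    for f in range(k):                                     # step 2: k-fold CV
        test = sub[f::k]
        train = [s for i, s in enumerate(sub) if i % k != f]
        fold_scores.append(nearest_centroid_accuracy(train, test))
    return mean(fold_scores)                               # step 3: group score


# toy two-class data: gA separates the classes, gC is constant
gene_names = ["gA", "gB", "gC"]
data = [[1.0, 0.2, 2.0], [5.0, 0.1, 2.0], [1.2, 0.3, 2.0],
        [4.8, 0.2, 2.0], [0.9, 0.1, 2.0], [5.1, 0.3, 2.0]]
labels = [0, 1, 0, 1, 0, 1]
score_informative = score_group(data, labels, gene_names, ["gA"])
score_flat = score_group(data, labels, gene_names, ["gC"])
```

Any other performance metric (area under the curve, F-measure) could replace accuracy inside the scoring function without changing the overall procedure.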

pathfindR
In this section, we describe the pathfindR tool that serves as the bioF() function (Fig. 1) in
the CogNet tool.

Active-subnetwork-oriented KEGG pathway enrichment analysis of the proteins was
conducted using the R package pathfindR (Ulgen, Ozisik & Sezerman, 2019). Active
subnetworks are subnetworks within the PIN (BioGRID, by default) that have a locally
maximal score (based on the provided significance values). Active subnetworks define
distinct disease-associated sets of interacting genes, whether discovered through the
original analysis or discovered because of being in interaction with a significant gene.

The workflow of pathfindR is presented in Fig. 2. After processing the input (to filter the
differential expression results, the p value threshold was chosen as 0.2), pathfindR maps
the input genes onto a PIN. Using the mapped genes, an active subnetwork search
(with the greedy approach as default) is performed. The resulting active subnetworks are
then filtered based on their scores and the number of significant genes they contain.
This filtered list of active subnetworks is then used for enrichment analyses (over-
representation analysis via hypergeometric-distribution-based tests), that is, using the
genes in each of the active subnetworks, the significantly enriched pathways are identified.
Enriched pathways with adjusted p values larger than the given threshold (0.2 was used)
are discarded and the lowest adjusted p value (over all active subnetworks) for each
term is kept. This process of “active subnetwork search + enrichment analyses” is
repeated for a selected number of iterations (default is 10), performed in parallel. Over
all iterations, the lowest and the highest adjusted p values, as well as the number of
occurrences over all iterations, are reported for each significantly enriched pathway.
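
To make the enrichment and aggregation steps concrete, the sketch below shows a hypergeometric over-representation p value and the per-pathway summary over iterations. It is a simplified illustration, not pathfindR code: the function names are hypothetical, the p values passed to the summary are assumed to be already adjusted for multiple testing, and the active subnetwork search itself is not shown.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when n genes are drawn from a universe of N genes,
    K of which belong to the pathway (over-representation test)."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def summarize_iterations(iterations, threshold=0.2):
    """iterations: list of dicts pathway id -> adjusted p value (one per run).
    Returns pathway id -> (lowest p, highest p, number of occurrences)."""
    summary = {}
    for result in iterations:
        for pid, p in result.items():
            if p > threshold:
                continue  # pathways above the threshold are discarded
            lowest, highest, count = summary.get(pid, (p, p, 0))
            summary[pid] = (min(lowest, p), max(highest, p), count + 1)
    return summary


# 3 of the 5 subnetwork genes fall in a 10-gene pathway, universe of 100 genes
p = hypergeom_pvalue(k=3, n=5, K=10, N=100)

# two hypothetical iterations with made-up adjusted p values
summary = summarize_iterations([
    {"hsa04151": 0.01, "hsa05165": 0.30},
    {"hsa04151": 0.05},
])
```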



The CogNet tool
The CogNet algorithm is based on the general approach of integrating biological
information for grouping the genes as described in Fig. 1. Overall, the tool is composed of
two components. The first component is the pathfindR step, which generates the
groups of genes, namely enriched KEGG pathways. Then, the second component is
applied to rank those groups in terms of their contribution to separating the two-class data D.
The workflow of the tool is presented in Fig. 3.

CogNet starts by splitting the data into two parts. The first part is the training
part that is used in order to rank the pathways produced by the tool pathfindR. The test
part is utilized at the end of the algorithm in order to estimate the performance of
CogNet.

A table of genes and their p-values is created to serve as input to
pathfindR (Table 1). The table is computed using Student’s t-test to assign each gene its
differential expression significance (p-value). The pathfindR tool is invoked to create n

Figure 2 Flow diagram of the pathfindR active-subnetwork-oriented enrichment analysis approach.
Image credit: Coort S, Hanspers K, Waagmeester A, Defay A et al. (https://www.wikipathways.org/index.
php/Pathway:WP1403), CC0 1.0 Universal (CC0 1.0). Full-size DOI: 10.7717/peerj-cs.336/fig-2

Figure 3 CogNet workflow. Full-size DOI: 10.7717/peerj-cs.336/fig-3



significant KEGG pathways; from now on, we refer to these as n groups of genes grp1,grp2,…,
grpn. As a preprocessing step for the ranking, each KEGG pathway produces one dataset
out of the data D that contains only the genes of that specific pathway. Thus, n
datasets D*i, i = 1,2,…,n (with |grpi| as the number of genes), are created to be subject to the
Rank step. Table 2 presents an example input table to the Rank step.

The Rank step is computed as a Monte Carlo cross-validation (MCCV), where
each D*i is assigned a score equal to the mean accuracy over r iterations of splitting
the data into two parts, one for training and the second for testing (for example, 80% training
and 20% testing), and applying random forest (or another ML algorithm such as SVM).
The output of the stage is a list of KEGG pathways sorted according to the score assigned in
the Rank step. Let us refer to this list as grp*1, grp*2, …, grp*n. Table 3 presents an example of the
output of the Rank step.
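
The Rank step can be sketched as below. A 1-nearest-neighbour classifier replaces random forest so that the example runs without external libraries; this substitution, the fixed random seed, and all names are assumptions of the sketch rather than details of the Knime implementation.

```python
import random
from statistics import mean


def one_nn_accuracy(train, test):
    """Stand-in classifier: 1-nearest-neighbour accuracy on the test part."""
    def predict(x):
        nearest = min(train,
                      key=lambda item: sum((a - b) ** 2 for a, b in zip(x, item[0])))
        return nearest[1]
    return mean(1.0 if predict(x) == y else 0.0 for x, y in test)


def mccv_score(samples, r=10, train_fraction=0.8, seed=0):
    """Mean accuracy over r random train/test splits of (features, label) pairs."""
    rng = random.Random(seed)
    n_train = int(round(train_fraction * len(samples)))
    scores = []
    for _ in range(r):
        shuffled = samples[:]
        rng.shuffle(shuffled)  # a fresh random split in every iteration
        scores.append(one_nn_accuracy(shuffled[:n_train], shuffled[n_train:]))
    return mean(scores)


def rank_groups(group_datasets, r=10):
    """group_datasets: dict pathway id -> list of (features, label).
    Returns (pathway id, score) pairs sorted by decreasing MCCV score."""
    scored = {pid: mccv_score(samples, r=r) for pid, samples in group_datasets.items()}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)


# toy per-pathway datasets: pwA separates the classes, pwB does not
informative = [([0.01 * i], 0) for i in range(10)] + [([5.0 + 0.01 * i], 1) for i in range(10)]
noisy = [([float(i % 2)], 0 if i < 10 else 1) for i in range(20)]
ranking = rank_groups({"pwA": informative, "pwB": noisy}, r=10)
```

Unlike k-fold cross-validation, the random splits of MCCV may overlap between iterations, so r can be chosen independently of the split ratio.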

The final step is to compute the performance of CogNet by training the RF on the top
pathways and testing on the main test data (from the first stage of the algorithm).

• gene_list = {}
• for i = 1 to n:

a. gene_list = gene_list ∪ grp*i
b. create Tr to be the data DTrain restricted to the genes in gene_list.

Table 1 Example of input table to the tool pathfindR.

Gene        p-value
SCARNA15    0.015915
CACNA1D     0.015915
IP6K3       0.015915
MIR1304     0.015915
DLX2        0.015915
SNORD77     0.015915
SGCG        0.015915
TCF21       0.015915
RPS21       0.021748

Note:
The table consists of a list of genes and their p-values.

Table 2 Example of input table to the Rank step.

ID         p-value     Pathway genes
hsa05165   8.79E−06    FZD7, THBS4, COL4A1, COL9A2, TNC, ITGA8, PTK2, PPP2CA, CCND2, IKBKB, ATP6V1G2, ATP6V0A2, JAG1, MAML3
hsa04151   1.40E−05    FGF2, IGF1, FLT3LG, ANGPT1, COL4A1, COL9A2, THBS4, TNC, ITGA8, PTK2, LPAR1, GNB1, MLST8, RPS6, PPP2CA, CCND2, IKBKB
hsa04062   2.14E−05    CCL18, ADCY4, ADCY5, IKBKB, ROCK2, GNB1, PTK2, PRKCB, GRK5
hsa03010   1.82E−04    RPS6, RPS12, RPS21, RPS29, RPL17-C18orf32, RPL18, RPL22L1, RPL31
hsa05170   4.60E−05    PTK2, PRKCB, GNB1, IKBKB, CCNB3

Note:
The table consists of three columns: ID is the KEGG pathway ID, p-value is the p-value of the pathway computed by pathfindR, and
the last column is the list of genes belonging to the KEGG pathway.



c. train RF on Tr and create an RF model.

d. create Tst to be the data DTest restricted to the genes in gene_list.

e. res{i} = performance of applying the RF model on Tst.
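
The loop in steps a to e can be sketched in Python as follows. The scoring callable fit_and_score is a placeholder for training and testing the RF model and is an assumption of this sketch, as are the function name and the toy inputs; the demo passes a stub that only reports how many features were used.

```python
def evaluate_top_pathways(ranked_groups, train, test, gene_names, fit_and_score, top=10):
    """For i = 1..top, pool the genes of the i best pathways, train, and test.

    ranked_groups: list of gene lists, best pathway first (output of the Rank step).
    train/test: lists of (expression_row, label); gene_names gives the column order.
    fit_and_score: callable(train_subset, test_subset) -> performance value.
    """
    gene_list = []
    results = []
    for group in ranked_groups[:top]:
        # a. union of the pooled genes with the genes of the i-th ranked pathway
        for g in group:
            if g not in gene_list:
                gene_list.append(g)
        cols = [gene_names.index(g) for g in gene_list]
        # b. and d. restrict the training and test data to the pooled genes
        tr = [([row[c] for c in cols], y) for row, y in train]
        tst = [([row[c] for c in cols], y) for row, y in test]
        # c. and e. train on Tr and record the performance on Tst
        results.append((len(gene_list), fit_and_score(tr, tst)))
    return results


gene_names = ["PTK2", "IKBKB", "GNB1", "RPS6"]
ranked_groups = [["PTK2", "IKBKB"], ["IKBKB", "GNB1"]]
train = [([1, 2, 3, 4], 1), ([0, 1, 0, 1], 0)]
test = [([1, 1, 1, 1], 1)]
# stub scorer for the demo: returns the number of features instead of an accuracy
results = evaluate_top_pathways(ranked_groups, train, test, gene_names,
                                lambda tr, tst: len(tst[0][0]))
```

Because pathways share genes, the pooled gene list grows by fewer genes than the size of each added pathway.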

One of the outputs of CogNet is a table that reports the performance on the top 1, 2,…,
n pathways. However, we have created a table only for the top 10. We should state that n is a variable
and is dependent on the data. Sometimes, n may not reach 10 because pathfindR
reports only the significant KEGG pathways. In the worst case, the output of pathfindR
could be an empty list.

Another output is the list of significant KEGG pathways and also the significant genes.
We run the CogNet algorithm multiple times (k times), where each time the data is
split into 90% for training and 10% for testing. CogNet keeps track of each KEGG
pathway and the number of times it appears at the top of the list. Additionally, CogNet
keeps track of the genes that belong to the pathway and how many times they were at the
top. An example of the output of top pathways and genes produced by CogNet is presented in
Table 4.
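
One possible way to keep such counts is sketched below. The cut-off parameter top (how many leading pathways count as "at the top") and the function name are assumptions, since the text does not state them; the pathway and gene names are taken from Table 4, but the runs themselves are invented for the example.

```python
from collections import Counter


def tally_top_pathways(runs, top=1):
    """runs: list of ranked results, one per CogNet run; each is a list of
    (pathway id, gene list) pairs, best first. Counts how often each pathway
    and each of its genes appears among the `top` leading entries."""
    pathway_counts = Counter()
    gene_counts = Counter()
    for ranked in runs:
        for pid, genes in ranked[:top]:
            pathway_counts[pid] += 1
            gene_counts.update((pid, g) for g in genes)
    return pathway_counts, gene_counts


# three invented runs (k = 3)
runs = [
    [("hsa03460", ["RMI2", "UBE2T"]), ("hsa04151", ["COL1A1"])],
    [("hsa03460", ["UBE2T"]), ("hsa04151", ["COL1A1", "PRLR"])],
    [("hsa04151", ["COL1A1", "PRLR"]), ("hsa03460", ["RMI2"])],
]
pathway_counts, gene_counts = tally_top_pathways(runs, top=1)
```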

Implementation
We have decided to use the free and open-source platform Knime (Berthold et al., 2009)
due to its simplicity and very useful graphical representations. Additionally, Knime is a
highly integrative tool. A script for performing analysis with pathfindR was imported as an
R node to the Knime workflow.

The Knime workflow consists mainly of nodes, where each node has its own
functionality. A meta-node is a collection of nodes that performs a specific
task.

The workflow Multi-File CogNet is presented in Fig. 4. It starts by loading a list
of the names of the datasets (URLs) with the “List Files” node. Then a loop over those datasets
reads each dataset with the “Table Reader” node and sends it as input to the CogNet tool
(CogNet meta-node).

While Fig. 3 presents the flowchart of the CogNet algorithm, Fig. 5 presents the
implementation of CogNet as a Knime workflow with its meta-nodes. The input has two
ports, where the first port is for the test data and the second port is for the training data.
The training data is passed to the tool pathfindR (one of the meta-nodes), which processes the
data and outputs a sorted table of significant pathways. The “RankPathWays”
meta-node performs the ranking task, while the flow between “Counting Loop start”
and “Loop End” performs testing on the top i pathways, where i ranges from 1 to 10.

An additional task of the “Loop End” node is to collect all the results and send them out to
be processed and saved.

Gene expression data
A total of 13 human gene expression datasets were downloaded from the Gene Expression
Omnibus (GEO) (Clough & Barrett, 2016) at NCBI. For all datasets, disease (positive) and control
(negative) data were available (Table 5). Those 13 datasets served to test CogNet and
to compare it with two other tools, maTE and SVM-RCE.

Model performance evaluation
For each established model, we calculated a number of statistical measures such as sensitivity,
specificity, and accuracy to evaluate model performance. The following formulations were

Table 3 Example output of Rank step.

ID         Acc    Sen    Spe    Rec    Prec   Fm
hsa05165   1.00   1.00   1.00   1.00   1.00   1.00
hsa04151   0.93   0.90   1.00   0.90   1.00   0.93
hsa03010   0.87   0.80   1.00   0.80   1.00   1.00
hsa05170   0.93   0.90   1.00   0.90   1.00   0.93
hsa05163   0.87   1.00   0.60   1.00   0.87   0.92
hsa04530   1.00   1.00   1.00   1.00   1.00   1.00
hsa04064   0.87   1.00   0.60   1.00   0.87   0.92
hsa05418   0.73   0.70   0.80   0.70   0.92   0.87
hsa05200   1.00   1.00   1.00   1.00   1.00   1.00
hsa04110   1.00   1.00   1.00   1.00   1.00   1.00

Note:
The values are the mean of each performance metric named in the column title. Acc, accuracy; Sen, sensitivity; Spe,
specificity; Rec, recall; Prec, precision; Fm, F-measure.

Table 4 Example of output of top pathways and top genes.

KEGG pathway   Count   Genes
hsa03460       77      BRIP, RMI2, UBE2T
hsa05034       70      CREB3L4, GNB4, HIST1H2AK, HIST1H2BC, HIST1H2BJ, HIST1H4H, HIST2H2AA4, PPP1R1B
hsa05322       53      COL4A3, HIST1H2AK, HIST1H2BC, HIST1H2BJ, HIST1H4H, HIST2H2AA4
hsa04151       50      COL1A1, PRLR

Figure 4 Multi-File CogNet workflow that is applied to multiple datasets.
Full-size DOI: 10.7717/peerj-cs.336/fig-4



used to calculate the statistics (TP, true positive; FP, false positive; TN, true negative;
FN, false negative classifications):

Sensitivity (SE, Recall) = TP/(TP + FN)
Specificity (SP) = TN/(TN + FP)
Accuracy (ACC) = (TP + TN)/(TP + TN + FP + FN)

Additionally, the Area Under the Receiver Operating Characteristic (ROC) Curve
measure (AUC) (Bradley, 1997) is an estimate of the probability that a classifier will rank a
randomly chosen positive instance higher than a randomly chosen negative instance.
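
The measures above can be computed directly from labels and predictions, as in the sketch below. The AUC is computed from its probabilistic definition by comparing every positive with every negative and counting ties as one half, which is a common convention assumed here; labels are coded as 1 (positive) and 0 (negative), and both classes are assumed to be present.

```python
def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy from binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


def auc(y_true, scores):
    """Probability that a random positive is scored higher than a random
    negative (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# toy example with made-up predictions and scores
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6]
metrics = confusion_metrics(y_true, y_pred)
auc_value = auc(y_true, scores)
```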

All reported performance measures refer to the average of 10-fold MCCV.
Some of the datasets used by the classifier are imbalanced, which can bias the
classifier toward the class with more samples. This is known as the problem of
imbalanced class distribution. We applied an under-sampling approach that
reduces the number of samples of the majority class toward that of the minority class, thus reducing the
bias in the size distribution of the data subsets. We chose to apply under-sampling with a
ratio of 1:2.
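
A minimal sketch of such under-sampling is shown below. It reads the ratio 1:2 as minority:majority, which is an assumption because the text does not spell out the direction; the random selection, the seed and the function name are likewise illustrative. The toy class sizes follow GDS3646 in Table 5 (110 positive, 22 negative).

```python
import random


def undersample(samples, ratio=2, seed=0):
    """Randomly drop majority-class samples until
    majority <= ratio * minority. samples: list of (features, label)
    with exactly two classes."""
    by_class = {}
    for item in samples:
        by_class.setdefault(item[1], []).append(item)
    # order the two classes by size: minority first, majority second
    (_, small), (_, big) = sorted(by_class.items(), key=lambda kv: len(kv[1]))
    keep = min(len(big), ratio * len(small))
    rng = random.Random(seed)
    return small + rng.sample(big, keep)


samples = [([i], 1) for i in range(110)] + [([i], 0) for i in range(22)]
balanced = undersample(samples, ratio=2)
```

Datasets whose majority class is already at most twice the minority class are returned unchanged in size.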

Availability and implementation
The Knime workflow, implementing CogNet, is available at https://malikyousef.com ->
Bioinformatics Tools and GitHub at https://github.com/malikyousef/miRcorrNet. The
DOI of the tool is https://doi.org/10.5281/zenodo.4273942.

RESULTS
We considered 13 gene expression datasets to test CogNet and to compare it with
other similar tools. To our knowledge, no tools similar to CogNet exist. Nonetheless, we
compare CogNet with tools that have the similar merit of grouping and ranking, maTE
and SVM-RCE. Although the purpose of the comparison is not to prove a higher
performance, CogNet outperforms maTE and achieves performance similar to SVM-RCE with the

Figure 5 The main workflow (as meta-nodes) for CogNet.
Full-size DOI: 10.7717/peerj-cs.336/fig-5



advantage of using a much smaller number of genes than SVM-RCE. Additionally, we have
run two versions of CogNet, where CogNet_SVM uses SVM for the scoring
and classification while CogNet_RF uses Random Forest (RF) for scoring and
classification. The results for CogNet_SVM are provided in the Supplemental Data.

Table 5 Description of the 13 data sets used in our study. The data sets are obtained from GEO. Each entry has the GEO code, the name of the
data, the number of samples and the classes of the data.

GEO accession       Title                                             Sample count                Classes

GDS1962             Glioma-derived stem cell factor effect on         180 (pos = 157, neg = 23)   non-tumor = 23 (neg)
                    angiogenesis in the brain                                                     astrocytomas = 26 (pos)
                                                                                                  glioblastomas = 131 (pos)

GDS2519             Early-stage Parkinson’s disease: whole blood      105 (pos = 50, neg = 55)    healthy control = 22 (neg)
                                                                                                  neurodegenerative disease control = 33 (neg)
                                                                                                  Parkinson disease = 50 (pos)

GDS3268             Colon epithelial biopsies of ulcerative colitis   202 (pos = 73, neg = 129)   normal = 73
                    patients                                                                      ulcerative colitis = 129

GDS2547             Metastatic prostate cancer (HG-U95C)              164 (pos = 75, neg = 89)    normal = 75
                                                                                                  tumor = 89

GDS5499             Pulmonary hypertensions: PBMCs                    140 (pos = 99, neg = 41)    control = 41 (neg)
                                                                                                  idiopathic pulmonary arterial hypertension = 30 (pos)
                                                                                                  scleroderma-associated pulm. arterial hypert. = 42 (pos)
                                                                                                  systemic sclerosis (SSc) without pulm. hypert. = 19 (pos)
                                                                                                  SSc, interstitial lung disease & pulm. hypert. = 8 (pos)

GDS3646             Celiac disease: primary leukocytes                132 (pos = 110, neg = 22)   healthy control = 22
                                                                                                  celiac disease = 110

GDS3874             Diabetic children: peripheral blood mononuclear   117 (pos = 93, neg = 24)    healthy = 24
                    cells (U133A)                                                                 type 1,2 diabetes = 93

GDS3837             Non-small cell lung carcinoma in female           120 (pos = 60, neg = 60)    Lung Cancer = 60
                    nonsmokers                                                                    Control = 60

GDS5037             Severe asthma: bronchial epithelial cell          108 (pos = 88, neg = 20)    mild asthma = 50
                                                                                                  control = 20
                                                                                                  severe asthma = 38

GDS4516_4718        Colorectal cancer: laser microdissected tumor     148 (pos = 104, neg = 44)   laser microdissected tumor tissues = 104
                    tissues; Colorectal cancer: homogenized tumor                                 homogenized tumor tissues = 44
                    tissues

GSE4107 (GDS2609)   Colonic mucosa                                    22 (pos = 12, neg = 10)     colonic mucosa of healthy control = 12
                                                                                                  colonic mucosa patients = 10

GSE15573 (GDS3794)  Rheumatoid Arthritis (RA) Patients                33 (pos = 18, neg = 15)     Rheumatoid Arthritis (RA) = 18
                                                                                                  Control = 15

GSE5594 (GDS4824)   Prostate cancer; Analysis of malignant and benign 21 (pos = 13, neg = 8)      prostate cancer = 13
                    prostate tissues                                                              normal = 8



Table 6 (A–C) A summary table presenting the AUC of each tool over the 13 data sets. #Clusters/#Groups is the number of clusters
for SVM-RCE and the number of groups for maTE and CogNet. AUC is the average area under the curve, while #G is the average number of
genes at each level.

SVM-RCE: AUC

| #Clusters/#Groups | GDS1962 | GDS2519 | GDS2547 | GDS3268 | GDS3646 | GDS3837 | GDS3874 | GDS4516_4718 | GDS5037 | GDS5499 | GDS2609 | GDS3794 | GDS4824 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 1.00 | 0.50 | 0.84 | 0.90 | 0.90 | 0.97 | 0.75 | 1.00 | 0.48 | 0.95 | 1.00 | 1.00 | 1.00 |
| 9 | 1.00 | 0.49 | 0.84 | 0.90 | 0.82 | 0.98 | 0.81 | 1.00 | 0.57 | 0.93 | 0.99 | 0.98 | 1.00 |
| 8 | 1.00 | 0.49 | 0.85 | 0.90 | 0.87 | 0.98 | 0.79 | 1.00 | 0.53 | 0.94 | 0.96 | 0.98 | 1.00 |
| 7 | 1.00 | 0.49 | 0.85 | 0.89 | 0.86 | 0.98 | 0.78 | 1.00 | 0.53 | 0.93 | 0.97 | 0.98 | 1.00 |
| 6 | 1.00 | 0.48 | 0.85 | 0.89 | 0.87 | 0.98 | 0.81 | 1.00 | 0.53 | 0.95 | 0.96 | 0.98 | 1.00 |
| 5 | 1.00 | 0.47 | 0.85 | 0.88 | 0.87 | 0.98 | 0.79 | 1.00 | 0.50 | 0.94 | 0.96 | 0.98 | 1.00 |
| 4 | 1.00 | 0.44 | 0.84 | 0.88 | 0.87 | 0.98 | 0.78 | 1.00 | 0.44 | 0.95 | 0.96 | 0.98 | 1.00 |
| 3 | 1.00 | 0.46 | 0.84 | 0.89 | 0.87 | 0.98 | 0.74 | 1.00 | 0.48 | 0.95 | 0.93 | 0.98 | 1.00 |
| 2 | 1.00 | 0.44 | 0.81 | 0.87 | 0.84 | 0.98 | 0.79 | 1.00 | 0.43 | 0.93 | 0.91 | 0.98 | 1.00 |
| 1 | 1.00 | 0.45 | 0.80 | 0.84 | 0.76 | 0.97 | 0.66 | 1.00 | 0.50 | 0.90 | 0.90 | 0.95 | 1.00 |
| Avg | 1.00 | 0.47 | 0.84 | 0.88 | 0.85 | 0.98 | 0.77 | 1.00 | 0.50 | 0.94 | 0.95 | 0.98 | 1.00 |

SVM-RCE: #G

| #Clusters/#Groups | GDS1962 | GDS2519 | GDS2547 | GDS3268 | GDS3646 | GDS3837 | GDS3874 | GDS4516_4718 | GDS5037 | GDS5499 | GDS2609 | GDS3794 | GDS4824 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 845 | 801 | 896 | 852 | 341 | 904 | 569 | 734 | 505 | 507 | 676 | 442 | 611 |
| 9 | 827 | 791 | 875 | 818 | 311 | 868 | 586 | 712 | 492 | 493 | 652 | 437 | 536 |
| 8 | 806 | 752 | 804 | 761 | 272 | 807 | 572 | 675 | 483 | 460 | 638 | 421 | 515 |
| 7 | 760 | 715 | 747 | 702 | 245 | 772 | 542 | 633 | 436 | 437 | 625 | 413 | 491 |
| 6 | 720 | 678 | 686 | 644 | 224 | 694 | 499 | 580 | 428 | 394 | 584 | 397 | 468 |
| 5 | 678 | 638 | 636 | 595 | 209 | 604 | 464 | 555 | 399 | 329 | 558 | 361 | 453 |
| 4 | 593 | 575 | 544 | 568 | 169 | 486 | 432 | 444 | 348 | 265 | 511 | 340 | 422 |
| 3 | 429 | 504 | 466 | 470 | 142 | 419 | 375 | 319 | 307 | 225 | 470 | 310 | 381 |
| 2 | 363 | 444 | 449 | 377 | 74 | 283 | 318 | 188 | 238 | 171 | 390 | 227 | 310 |
| 1 | 237 | 266 | 373 | 211 | 47 | 137 | 224 | 91 | 131 | 62 | 264 | 120 | 211 |

CogNet_RF: AUC

| #Clusters/#Groups | GDS1962 | GDS2519 | GDS2547 | GDS3268 | GDS3646 | GDS3837 | GDS3874 | GDS4516_4718 | GDS5037 | GDS5499 | GDS2609 | GDS3794 | GDS4824 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 1.00 | 0.65 | 0.84 | 0.72 | 0.76 | 0.98 | 0.90 | 1.00 | 0.68 | 0.95 | 1.00 | 0.93 | 0.92 |
| 9 | 0.99 | 0.65 | 0.85 | 0.76 | 0.76 | 0.97 | 0.86 | 1.00 | 0.66 | 0.94 | 1.00 | 0.95 | 1.00 |
| 8 | 0.99 | 0.63 | 0.86 | 0.75 | 0.76 | 0.99 | 0.87 | 1.00 | 0.64 | 0.93 | 1.00 | 0.98 | 1.00 |
| 7 | 0.99 | 0.68 | 0.86 | 0.74 | 0.76 | 0.98 | 0.87 | 1.00 | 0.70 | 0.93 | 1.00 | 1.00 | 1.00 |
| 6 | 0.98 | 0.55 | 0.84 | 0.74 | 0.68 | 0.98 | 0.85 | 1.00 | 0.71 | 0.93 | 1.00 | 1.00 | 1.00 |
| 5 | 0.99 | 0.74 | 0.83 | 0.75 | 0.68 | 0.98 | 0.89 | 1.00 | 0.72 | 0.94 | 1.00 | 1.00 | 0.80 |
| 4 | 0.98 | 0.68 | 0.83 | 0.72 | 0.66 | 0.97 | 0.89 | 1.00 | 0.69 | 0.94 | 1.00 | 0.93 | 0.80 |
| 3 | 0.98 | 0.66 | 0.83 | 0.73 | 0.68 | 0.96 | 0.91 | 1.00 | 0.71 | 0.94 | 1.00 | 0.88 | 0.90 |
| 2 | 0.98 | 0.65 | 0.78 | 0.73 | 0.64 | 0.97 | 0.89 | 1.00 | 0.72 | 0.93 | 1.00 | 0.85 | 0.90 |
| 1 | 0.99 | 0.59 | 0.74 | 0.75 | 0.72 | 0.91 | 0.86 | 1.00 | 0.74 | 0.93 | 1.00 | 0.88 | 0.85 |
| Avg | 0.99 | 0.65 | 0.83 | 0.74 | 0.71 | 0.97 | 0.88 | 1.00 | 0.70 | 0.94 | 1.00 | 0.94 | 0.92 |

CogNet_RF: #G

| #Clusters/#Groups | GDS1962 | GDS2519 | GDS2547 | GDS3268 | GDS3646 | GDS3837 | GDS3874 | GDS4516_4718 | GDS5037 | GDS5499 | GDS2609 | GDS3794 | GDS4824 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 123 | 65 | 62 | 65 | 15 | 34 | 23 | 45 | 16 | 52 | 95 | 27 | 51 |
| 9 | 110 | 56 | 55 | 59 | 14 | 33 | 22 | 41 | 15 | 50 | 84 | 26 | 45 |
| 8 | 98 | 53 | 50 | 53 | 13 | 31 | 22 | 39 | 14 | 48 | 77 | 23 | 35 |

For each tool other than SVM-RCE, we obtained the performance over the top
1–10 groups ranked by the scoring stage. For SVM-RCE, we obtained the
performance starting with 1,000 genes and 100 clusters, then reducing by 10% at each iteration.
For comparison purposes, we consider the last 10 clusters. Table 6 presents in detail all
the results obtained by applying the CogNet_RF tool. The AUC measure is presented.
The #G columns present the number of genes averaged over the 10 iterations of
cross-validation.
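
As an illustration of how the gene count (#G) can grow with the number of top-ranked groups, the following minimal sketch builds the feature set for each level k = 1..10. It assumes that "top k groups" means the union of the genes in the k best-ranked groups; the function name `cumulative_gene_sets` and the gene identifiers are hypothetical and do not come from the CogNet workflow.

```python
def cumulative_gene_sets(ranked_groups, max_groups=10):
    """For k = 1..max_groups, return the union of genes in the top-k groups.

    ranked_groups is a list of gene lists, best-ranked group first.
    """
    selected = set()
    levels = {}
    for k, genes in enumerate(ranked_groups[:max_groups], start=1):
        selected |= set(genes)          # genes shared between groups count once
        levels[k] = sorted(selected)
    return levels


# Toy ranked groups with placeholder gene names.
ranked_groups = [
    ["G1", "G2", "G3"],
    ["G2", "G4"],
    ["G5"],
]
levels = cumulative_gene_sets(ranked_groups)
gene_counts = {k: len(genes) for k, genes in levels.items()}
print(gene_counts)
```

Because overlapping genes are counted once, the number of genes at level k can be smaller than the sum of the individual group sizes.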

Figure 6 presents, for each tool, the average of all AUC results over the 13 data sets and
the 10 clusters/groups (Avg AUC bar), while the Avg#Genes bar represents the average
number of genes over the 13 data sets and the 10 clusters/groups.

The results presented in Tables 6A–6C and Fig. 6 indicate that, on average, CogNet
outperforms maTE by 3% while achieving results similar to those of SVM-RCE. In terms of
the number of genes, SVM-RCE uses about 37-fold more genes than CogNet. An additional
observation is that both SVM-RCE and maTE failed to reach reasonable results on the
GDS2519 data set, while CogNet reached a performance of about 65–70%.
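
The similarity between SVM-RCE and CogNet_RF can be rechecked with a small calculation. The sketch below averages the per-data-set values in the Avg rows of Table 6 for the two tools; it is only a sanity check on the tabulated numbers, and the values plotted in Fig. 6 may have been computed from unrounded results.

```python
from statistics import mean

# "Avg" rows of Table 6, one value per data set (13 data sets each).
avg_auc = {
    "SVM-RCE": [1.00, 0.47, 0.84, 0.88, 0.85, 0.98, 0.77,
                1.00, 0.50, 0.94, 0.95, 0.98, 1.00],
    "CogNet_RF": [0.99, 0.65, 0.83, 0.74, 0.71, 0.97, 0.88,
                  1.00, 0.70, 0.94, 1.00, 0.94, 0.92],
}

# Overall average AUC per tool, rounded to three decimals.
overall = {tool: round(mean(values), 3) for tool, values in avg_auc.items()}
print(overall)
```

The two overall averages come out at roughly 0.86 and 0.87, consistent with the statement that the two tools perform similarly on average.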

Figure 6 The average of the results for the three tools. The upper part is the performance AUC
measurement while the lower part is the number of genes (#G).

Full-size DOI: 10.7717/peerj-cs.336/fig-6

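Table 6 (continued)

CogNet_RF: #G

| #Clusters/#Groups | GDS1962 | GDS2519 | GDS2547 | GDS3268 | GDS3646 | GDS3837 | GDS3874 | GDS4516_4718 | GDS5037 | GDS5499 | GDS2609 | GDS3794 | GDS4824 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 7 | 88 | 42 | 42 | 47 | 13 | 29 | 19 | 35 | 14 | 46 | 68 | 22 | 26 |
| 6 | 80 | 34 | 36 | 41 | 11 | 26 | 19 | 30 | 13 | 44 | 62 | 20 | 26 |
| 5 | 70 | 26 | 30 | 35 | 10 | 24 | 17 | 26 | 9 | 41 | 52 | 18 | 22 |
| 4 | 62 | 27 | 25 | 28 | 8 | 20 | 16 | 22 | 7 | 37 | 41 | 15 | 17 |
| 3 | 52 | 22 | 19 | 21 | 7 | 17 | 13 | 17 | 6 | 29 | 32 | 13 | 13 |
| 2 | 38 | 16 | 13 | 13 | 6 | 13 | 10 | 13 | 3 | 21 | 22 | 9 | 9 |
| 1 | 8 | 8 | 7 | 6 | 5 | 6 | 6 | 8 | 4 | 8 | 11 | 5 | 5 |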



Table 7 Association of the top pathways with the disease under study. “Dataset” indicates the dataset GEO ID.

| Dataset | Investigated disease | ID | Pathway | Literature support | PMID |
|---|---|---|---|---|---|
| GSE15573 | Rheumatoid Arthritis | hsa04932 | Non-alcoholic fatty liver disease (NAFLD) | None | None |
| GSE15573 | Rheumatoid Arthritis | hsa04120 | Ubiquitin mediated proteolysis | Aberration of this system leads to the dysregulation of cellular homeostasis and the development of multiple inflammatory and autoimmune diseases, including rheumatoid arthritis. | 16978533 |
| GSE15573 | Rheumatoid Arthritis | hsa05016 | Huntington disease | None | None |
| GSE15573 | Rheumatoid Arthritis | hsa04714 | Thermogenesis | None | None |
| GSE15573 | Rheumatoid Arthritis | hsa04723 | Retrograde endocannabinoid signaling | Endocannabinoids, a group of endogenous bioactive lipids, have immunomodulatory effects able to influence both inflammation and pain in rheumatic disease, including rheumatoid arthritis. | 29164003, 28857069 |
| GSE15573 | Rheumatoid Arthritis | hsa05010 | Alzheimer disease | None | None |
| GSE15573 | Rheumatoid Arthritis | hsa04140 | Autophagy | Deregulation of autophagic pathway has recently been implicated in the pathogenesis of several autoimmune diseases, including rheumatoid arthritis. | 30072986 |
| GSE15573 | Rheumatoid Arthritis | hsa04621 | NOD-like receptor signaling pathway | NOD-like receptors are being implicated in the pathology of RA and other rheumatic diseases. | 19835640 |
| GSE15573 | Rheumatoid Arthritis | hsa05012 | Parkinson disease | None | None |
| GSE15573 | Rheumatoid Arthritis | hsa05203 | Viral carcinogenesis | None | None |
| GSE4107 | Colorectal Cancer | hsa04915 | Estrogen signaling pathway | None | None |
| GSE4107 | Colorectal Cancer | hsa04662 | B cell receptor signaling pathway | None | None |
| GSE4107 | Colorectal Cancer | hsa04012 | ErbB signaling pathway | ERBB pathway may have a role in both normal colon epithelial cell differentiation and malignant transformation. | 27270421 |
| GSE4107 | Colorectal Cancer | hsa05140 | Leishmaniasis | None | None |
| GSE4107 | Colorectal Cancer | hsa05224 | Breast cancer | None | None |
| GSE4107 | Colorectal Cancer | hsa04510 | Focal adhesion | Cancer cells exhibit highly altered focal adhesion dynamics. | 28476046 |
| GSE4107 | Colorectal Cancer | hsa04210 | Apoptosis | Abnormalities in apoptotic function contribute to both the pathogenesis of colorectal cancer and its resistance to chemotherapeutic drugs and radiotherapy. | 15479695 |
| GSE4107 | Colorectal Cancer | hsa04010 | MAPK signaling pathway | MAPK signaling plays an important part in progression of colorectal cancer. | 15863380 |
| GSE4107 | Colorectal Cancer | hsa05166 | Human T-cell leukemia virus 1 infection | None | None |
| GSE4107 | Colorectal Cancer | hsa05200 | Pathways in cancer | "Meta"-pathway of cancer pathways. | None |
| GSE55945 | Prostate Cancer | hsa04110 | Cell cycle | Dysregulation of the cell cycle is implicated in the biology of many cancers, including PCa. | 7997877, 9096291, 18301781 |



Validation of the results
We have conducted further analysis using the datasets (GSE15573, GSE4107 and
GSE55945) that were considered in the pathfindR study. GSE15573 consists of 33 samples
corresponding to 18 Rheumatoid Arthritis (RA) patients and 15 controls. The aim of
that study is to identify peripheral blood gene expression profiles for RA patients.
The study uses standard statistical approaches to detect the significant
genes by ANOVA with a False Discovery Rate (FDR) below 5%. The Gene Ontology (GO) in the
PANTHER database is applied to identify biological processes.

GSE4107 consists of 10 colonic mucosa samples from healthy controls and 12 samples from
patients. Patients and controls were age- (50 or less), ethnicity- (Chinese) and tissue-
matched. The analysis to detect significant genes was based on the t-test, hierarchical
clustering, mean fold-change and principal component analysis.

GSE55945 consists of 13 prostate cancer samples and 8 normal samples. The aim of the study
is to compare the expression levels between malignant and benign prostate tissues.

We have run the CogNet tool on those three datasets and obtained the performance on the top
significant pathways. The AUC values are presented in Table 6.

Association of the top pathways with the disease under study
For each dataset, the top 10 groups (pathways) identified by CogNet were
manually examined in the literature for a possible association with the disease
under study. Literature support for the top 10 pathways of each dataset is presented in
Table 7.

Table 7 (continued)

| Dataset | Investigated disease | ID | Pathway | Literature support | PMID |
|---|---|---|---|---|---|
| GSE55945 | Prostate Cancer | hsa04360 | Axon guidance | None | None |
| GSE55945 | Prostate Cancer | hsa04390 | Hippo signaling pathway | The hippo pathway effector YAP regulates motility, invasion, and castration-resistant growth of prostate cancer cells. | 25645929 |
| GSE55945 | Prostate Cancer | hsa05166 | Human T-cell leukemia virus 1 infection | None | None |
| GSE55945 | Prostate Cancer | hsa04014 | Ras signaling pathway | Ras signaling plays an important role in prostate cancer progression and is a possible mediator of hormone resistance. | 14689577, 20718703 |
| GSE55945 | Prostate Cancer | hsa01040 | Biosynthesis of unsaturated fatty acids | Alterations in lipid metabolism, and specifically the uptake and synthesis of fatty acids (FAs), comprise a well-documented aspect of metabolic reprograming in cancer. | 31598388 |
| GSE55945 | Prostate Cancer | hsa05163 | Human cytomegalovirus infection | None | None |
| GSE55945 | Prostate Cancer | hsa04350 | TGF-beta signaling pathway | TGF-beta signaling has pivotal roles in tumorigenesis and tumor progression. | 26774024, 29115550 |
| GSE55945 | Prostate Cancer | hsa04010 | MAPK signaling pathway | MAPK signaling pathways act through their effects on apoptosis, survival, metastatic potential, and androgen-independent growth in prostate cancer. | 22046506 |
| GSE55945 | Prostate Cancer | hsa04933 | AGE-RAGE signaling pathway in diabetic complications | None | None |

Note:
“Investigated Disease” indicates the disease investigated by the study. “ID” and “Pathway” indicate the KEGG ID and pathway name of the top pathway, respectively.
“Literature support” provides a brief summary of literature support for the pathway-disease association. “PMID” indicates the PubMed ID(s) of the supporting study/
studies.

For GSE15573, the rheumatoid arthritis dataset, 4 out of the top 10 pathways were found
to be supported by the literature as associated with rheumatoid arthritis biology. For
GSE4107, comparing colorectal cancer patients to healthy controls, 5 out of the top
10 pathways were supported by the literature as associated with colorectal cancer.
Finally, for GSE55945, the dataset comparing human prostate benign and malignant
tissue, 6 out of the top 10 pathways were found to be associated with prostate cancer.

The CogNet results highlighted several new pathways that could contribute to the
identification of innovative clinical biomarkers for diagnostic procedures and therapeutic
intervention.

DISCUSSION AND CONCLUSIONS
We have presented a novel tool called CogNet that is based on ranking, classification
and the integration of biological knowledge. CogNet is built on top of the tool pathfindR
and extends its functionality with the ability to perform classification using the enriched
KEGG pathways as features. CogNet reports the classification performance and a list of
significant KEGG pathways, which we believe will contribute to a better and deeper
understanding of the data under investigation.

CogNet and maTE are similar in that both rank groups of genes for
classification tasks. CogNet uses the biological information of the KEGG
pathways, processed by an enrichment procedure, to suggest a list of significant
pathways, and then ranks those pathways by their contribution to
separating the two classes of the given data (the classification task). In contrast, maTE uses prior
information about microRNAs and their target genes to group the genes; this grouping is not
related to the expression of the genes, as the grouping in CogNet is. SVM-RCE, in turn,
uses the k-means clustering algorithm to group the genes into clusters, which
means that its groups are related to the expression of the genes.

As future work, we will develop CogNet to explore the effectiveness of different
combinations of the KEGG pathways in the data; that is, instead of ranking each
pathway’s genes individually, we will use a different approach that ranks those groups
simultaneously. In the current version, CogNet ranks each KEGG pathway individually by
performing internal cross-validation.
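
The idea of ranking each pathway individually by an internal cross-validation score can be sketched as follows. This is only an illustration under stated assumptions: it swaps in a nearest-centroid classifier with leave-one-out validation in place of the RF or SVM scoring used by CogNet, and the names `loo_accuracy`, `rank_groups`, the pathway labels and the toy expression values are all hypothetical.

```python
from statistics import mean


def loo_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier.

    Assumes every class has at least two samples, so that each class still
    has a centroid after one sample is held out.
    """
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for label in set(y):
            rows = [x for j, (x, lab) in enumerate(zip(X, y))
                    if j != i and lab == label]
            centroids[label] = [mean(col) for col in zip(*rows)]
        # Predict the class whose centroid is closest (squared Euclidean distance).
        pred = min(centroids, key=lambda lab: sum(
            (a - b) ** 2 for a, b in zip(X[i], centroids[lab])))
        correct += pred == y[i]
    return correct / len(X)


def rank_groups(expr, y, groups):
    """Score each group using only its own genes; return (name, score), best first."""
    scores = {}
    for name, genes in groups.items():
        X = [[sample[g] for g in genes] for sample in expr]
        scores[name] = loo_accuracy(X, y)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy expression data: gene A separates the classes, gene B does not.
expr = [
    {"A": 1.0, "B": 5.0, "C": 0.2},
    {"A": 1.2, "B": 1.0, "C": 0.1},
    {"A": 0.9, "B": 4.0, "C": 0.3},
    {"A": 3.0, "B": 4.5, "C": 0.2},
    {"A": 3.2, "B": 1.5, "C": 0.1},
    {"A": 2.9, "B": 5.5, "C": 0.3},
]
y = ["neg", "neg", "neg", "pos", "pos", "pos"]
groups = {"pathway_1": ["A", "C"], "pathway_2": ["B"]}

ranking = rank_groups(expr, y, groups)
print(ranking)
```

Each group is scored in isolation, which is what distinguishes this per-pathway ranking from the simultaneous ranking of groups proposed as future work.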

Additionally, we are working to integrate CogNet and maTE, so that the ranking
process is applied to the groups generated by the biological functions used in each
tool. As a result, the discovery of significant pathways and microRNA target genes will
allow biology researchers to explore the role of pathways and microRNAs together
in the same data.

The success of CogNet, maTE and SVM-RCE suggests that more computational
approaches should be developed that integrate biological
information into the machine learning algorithm.



ADDITIONAL INFORMATION AND DECLARATIONS

Funding
The authors received no funding for this work.

Competing Interests
The authors declare that they have no competing interests.

Author Contributions
• Malik Yousef conceived and designed the experiments, performed the experiments,
analyzed the data, performed the computation work, prepared figures and/or tables,
authored or reviewed drafts of the paper, and approved the final draft.

• Ege Ülgen conceived and designed the experiments, performed the experiments,
analyzed the data, performed the computation work, prepared figures and/or tables,
authored or reviewed drafts of the paper, and approved the final draft.

• Osman Uğur Sezerman conceived and designed the experiments, performed the
experiments, analyzed the data, authored or reviewed drafts of the paper, and approved
the final draft.

Data Availability
The following information was supplied regarding data availability:

Data is available at NCBI GEO: GDS1962, GDS2519, GDS3268, GDS2547, GDS5499,
GDS3646, GDS3874, GDS3837, GDS5037, GDS4516, GDS4517, GDS4718, GSE4107,
GDS2609, GSE15573, GDS3794, GSE55945, GDS4824.

The code is available at GitHub: https://github.com/malikyousef/miRcorrNet.
The DOI of the tool is available at Zenodo: malikyousef. (2020, November 14).
malikyousef/miRcorrNet: miRcorrNet (Version v1.0). Zenodo. DOI 10.5281/zenodo.4273942.

REFERENCES
Acharya S, Saha S, Nikhil N. 2017. Unsupervised gene selection using biological knowledge:
application in sample clustering. BMC Bioinformatics 18(1):513
DOI 10.1186/s12859-017-1933-0.

Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K,
Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC,
Richardson JE, Ringwald M, Rubin GM, Sherlock G. 2000. Gene ontology: tool for the
unification of biology. Nature Genetics 25(1):25–29 DOI 10.1038/75556.

Bellazzi R, Zupan B. 2007. Towards knowledge-based gene expression data mining. Journal of
Biomedical Informatics 40(6):787–802 DOI 10.1016/j.jbi.2007.06.005.

Berthold MR, Cebron N, Dill F, Gabriel TR, Kötter T, Meinl T, Ohl P, Thiel K, Wiswedel B.
2009. KNIME: the Konstanz information miner: version 2.0 and beyond. ACM SIGKDD
Explorations Newsletter 11(1):26–31 DOI 10.1145/1656274.1656280.



Bradley AP. 1997. The use of the area under the ROC curve in the evaluation of machine learning
algorithms. Pattern Recognition 30(7):1145–1159 DOI 10.1016/S0031-3203(96)00142-2.

Clough E, Barrett T. 2016. The gene expression omnibus database. Methods in Molecular Biology
1418(5235):93–110 DOI 10.1007/978-1-4939-3578-9_5.

Cohn-Alperovich D, Rabner A, Kifer I, Mandel-Gutfreund Y, Yakhini Z. 2016. Mutual
enrichment in aggregated ranked lists with applications to gene expression regulation.
Bioinformatics 32(17):i464–i472 DOI 10.1093/bioinformatics/btw435.

Deshpande G, Li Z, Santhanam P, Coles CD, Lynch ME, Hamann S, Hu X. 2010. Recursive
cluster elimination based support vector machine for disease state prediction using resting state
functional and effective brain connectivity. PLOS ONE 5(12):e14277
DOI 10.1371/journal.pone.0014277.

Fang OH, Mustapha N, Sulaiman MN. 2014. An integrative gene selection with association
analysis for microarray data classification. Intelligent Data Analysis 18(4):739–758
DOI 10.3233/IDA-140666.

Harris D, Niekerk AV. 2018. Feature clustering and ranking for selecting stable features from high
dimensional remotely sensed data. International Journal of Remote Sensing 39(23):8934–8949
DOI 10.1080/01431161.2018.1500730.

Inza I, Larrañaga P, Blanco R, Cerrolaza AJ. 2004. Filter versus wrapper gene selection
approaches in DNA microarray domains. Artificial Intelligence in Medicine 31(2):91–103
DOI 10.1016/j.artmed.2004.01.007.

Johannes M, Brase JC, Fröhlich H, Gade S, Gehrmann M, Fälth M, Sültmann H, Beissbarth T.
2010. Integration of pathway knowledge into a reweighted recursive feature elimination
approach for risk stratification of cancer patients. Bioinformatics 26(17):2136–2144
DOI 10.1093/bioinformatics/btq345.

Lazar C, Taminau J, Meganck S, Steenhoff D, Coletta A, Molter C, De Schaetzen V, Duque R,
Bersini H, Nowé A. 2012. A survey on filter techniques for feature selection in gene expression
microarray analysis. IEEE/ACM Transactions on Computational Biology and Bioinformatics
9(4):1106–1119 DOI 10.1109/TCBB.2012.33.

Lazzarini N, Bacardit J. 2017. RGIFE: a ranked guided iterative feature elimination heuristic for
the identification of biomarkers. BMC Bioinformatics 18(1):89 DOI 10.1186/s12859-017-1729-2.

Nacu S, Critchley-Thorne R, Lee P, Holmes S. 2007. Gene expression network analysis and
applications to immunology. Bioinformatics 23(7):850–858
DOI 10.1093/bioinformatics/btm019.

Pan W. 2002. A comparative review of statistical methods for discovering differentially expressed
genes in replicated microarray experiments. Bioinformatics 18(4):546–554
DOI 10.1093/bioinformatics/18.4.546.

Papachristoudis G, Diplaris S, Mitkas PA. 2010. SoFoCles: feature filtering for microarray
classification based on gene ontology. Journal of Biomedical Informatics 43(1):1–14
DOI 10.1016/j.jbi.2009.06.002.

Perscheid C, Grasnick B, Uflacker M. 2019. Integrative gene selection on gene expression data:
providing biological context to traditional approaches. Journal of Integrative Bioinformatics
16(1):27 DOI 10.1515/jib-2018-0064.

Piñero J, Ramírez-Anguita JM, Saüch-Pitarch J, Ronzano F, Centeno E, Sanz F, Furlong LI.
2019. The DisGeNET knowledge platform for disease genomics: 2019 update. Nucleic Acids
Research 18:gkz1021 DOI 10.1093/nar/gkz1021.



Qi J, Tang J. 2007. Integrating gene ontology into discriminative powers of genes for feature
selection in microarray data. In: Proceedings of the 2007 ACM Symposium on Applied
Computing—SAC ’07, Seoul, Korea, 430.

Quanz B, Park M, Huan J. 2008. Biological pathways as features for microarray data classification.
In: Proceeding of the 2nd International Workshop on Data and Text Mining in Bioinformatics—
DTMBIO ’08. Napa Valley, California, USA, 5.

Raghu VK, Ge X, Chrysanthis PK, Benos PV. 2017. Integrated theory-and data-driven feature
selection in gene expression data analysis. In: 2017 IEEE 33rd International Conference on Data
Engineering (ICDE). San Diego, CA, USA, 1525–1532.

Ulgen E, Ozisik O, Sezerman OU. 2019. PathfindR: an R package for comprehensive identification
of enriched pathways in omics data through active subnetworks. Frontiers in Genetics 10:490
DOI 10.3389/fgene.2019.00858.

Van ’t Veer LJ, Dai H, Van de Vijver MJ, He YD, Hart AAM, Mao M, Peterse HL, Van der Kooy
K, Marton MJ, Witteveen AT, Schreiber GJ, Kerkhoven RM, Roberts C, Linsley PS, Bernards
R, Friend SH. 2002. Gene expression profiling predicts clinical outcome of breast cancer.
Nature 415(6871):530–536 DOI 10.1038/415530a.

Vanjimalar S, Ramyachitra D, Manikandan P. 2018. A review on feature selection techniques for
gene expression data. In: 2018 IEEE International Conference on Computational Intelligence and
Computing Research (ICCIC). 1–4.

Yousef M, Abdallah L, Allmer J. 2019. maTE: discovering expressed interactions between
microRNAs and their targets. Bioinformatics 35(20):4020–4028
DOI 10.1093/bioinformatics/btz204.

Yousef M, Jung S, Showe L, Showe M. 2007. Recursive cluster elimination (RCE) for classification
and feature selection from gene expression data. BMC Bioinformatics 8(1):144
DOI 10.1186/1471-2105-8-144.

Yousef M, Ketany M, Manevitz L, Showe LC, Showe MK. 2009. Classification and biomarker
identification using gene network modules and support vector machines. BMC Bioinformatics
10(1):337 DOI 10.1186/1471-2105-10-337.

Zhao X, Wang L, Chen G. 2017. Joint covariate detection on expression profiles for identifying
microRNAs related to venous metastasis in hepatocellular carcinoma. Scientific Reports 7(1):5349
DOI 10.1038/s41598-017-05776-1.

Zycinski G, Barla A, Squillario M, Sanavia T, Di Camillo B, Verri A. 2013. Knowledge driven
variable selection (KDVS)—a new approach to enrichment analysis of gene signatures obtained
from high-throughput data. Source Code for Biology and Medicine 8(1):2
DOI 10.1186/1751-0473-8-2.


