Expert Systems With Applications 105 (2018) 23–33

Robust Semi-Supervised Growing Self-Organizing Map

Ali Mehrizi, Hadi Sadoghi Yazdi*, Amir Hossein Taherinia
Department of Computer Engineering, Ferdowsi University of Mashhad, Mashhad, Iran
* Corresponding author. E-mail addresses: mehrizi@um.ac.ir (A. Mehrizi), h-sadoghi@um.ac.ir (H. Sadoghi Yazdi), taherinia@um.ac.ir (A.H. Taherinia).

Article history: Received 14 October 2017; Revised 22 March 2018; Accepted 23 March 2018; Available online 26 March 2018.

Keywords: Semi-supervised learning; Online learning; Dynamic self-organization network; Adaptive learning; Half quadratic.

Abstract

The Semi-Supervised Growing Self-Organizing Map (SSGSOM) is one of the best methods for online classification with partially labeled data. Many parameters affect the performance of this method; the structure of the GSOM network, the activation degree, and the learning approach are the most important factors in SSGSOM. In this paper, a comprehensive robust mathematical formulation of the problem is proposed and half-quadratic (HQ) optimization is then used to solve it. Furthermore, an adaptive method is proposed to adjust the activation degree optimally and improve the performance of SSGSOM. Results are reported on a variety of synthetic and UCI datasets, including noisy conditions, and show the superiority and robustness of the proposed method compared with state-of-the-art approaches.

© 2018 Elsevier Ltd. All rights reserved. https://doi.org/10.1016/j.eswa.2018.03.046

1. Introduction

Extracting knowledge from unlabeled data, also known as unsupervised learning, improves the performance of systems. However, as the volume of imported data increases, unsupervised methods require a large amount of memory and computation time and therefore become unusable or very inefficient. If a small portion of the data carries side information such as labels, it can increase the accuracy and speed of learning. The use of labeled data together with clustering methods is called semi-supervised learning. In other words, learning algorithms that combine unsupervised and supervised learning are called semi-supervised learning algorithms (Zhu & Goldberg, 2009). Semi-supervised learning has been applied to many problems, such as those reported in (Cong, Liu, Yuan, & Luo, 2013; Lu, Wang, Xue, & Pan, 2014; Chen, Li, Su, Cao, & Ji, 2014).

Considering the data importing method, learning algorithms are divided into two categories: online learning and batch learning. In batch learning, all the training data are available at the beginning of the process. In contrast, in online learning, none of the data exists at the beginning and data are logged gradually. Although online learning adapts better to a varying environment, it lacks the accuracy of batch learning; one of the major reasons for the decreased accuracy of online algorithms is their dependency on the sequence of data entry. In recent years, attention to online learning has been growing, because as the volume of data increases, learning methods with online capability and rapid response become inevitable. Accordingly, many researchers have focused on eliminating the defects of online learning methods to achieve better accuracy in a reasonable time.
Some promising online clustering methods are online k-means (Alpaydin, 2004), the neural gas network (Martinetz, Berkovich, & Schulten, 1993), and the self-organizing network (Kohonen, 1990). Among these methods, self-organizing networks and neural gas networks have been the most useful because they are inherently online and have properties such as topology preservation (Alahakoon, Halgamuge, & Srinivasan, 2000). Since the majority of studies on semi-supervised learning have addressed offline methods, in the following we first review the learning methods and then discuss studies on online semi-supervised learning.

Farid et al. (2013) presented a semi-supervised algorithm based on ensemble learning, an offline approach that determines the label of a datum via voting among several classifiers. They used decision trees for data classification. An online algorithm was then obtained by modifying the structure of the decision tree: in the learning phase, a random weight is allocated to every datum and the decision tree is built for them; afterward, the allocated weights are modified based on the training data.

Leite and colleagues reported that the features and the underlying distribution of data change over time in non-stationary environments (Leite, Costa, & Gomide, 2010). In machine learning terminology, this change is called "concept drift". For example, in classification a concept can be either a class of data or the boundary of the data, and the drift is a data boundary that changes over time. Drifts can be gradual or sudden, contracting or expanding, and definite/random or periodic changes of concept. The main requirement of online learning is to identify and deal with the drift. They proposed a semi-supervised method based on a granular neural network to deal with data streams. This model has five layers: the first and fifth layers of the neural network are the input and output layers, respectively; the second layer performs the data-clustering task; the third layer performs the merge operation; and the fourth layer makes the decisions about the data class.

Macario and de Carvalho (2012) proposed a semi-supervised FCM algorithm that adds a supervised ability to the original FCM algorithm. The main idea of their paper is that data of the same class have similar degrees of membership. Therefore, if data belong to a certain class, the differences between their cluster membership degrees are low and the data have stable degrees of membership in all clusters. They also proposed an adaptive method to prevent uncontrolled growth of a cluster.

For the first time, an online constrained clustering algorithm was presented by Halkidi, Spiliopoulou, and Pavlou (2012) in order to find clusters that are centralized and have constraints entering as a stream. The Halkidi algorithm was based on the MPCK-Means (metric pairwise constrained K-means) clustering algorithm. MPCK-Means is an offline algorithm and cannot be used for online data; therefore, Halkidi considered the data as chunks that enter the algorithm over time, and MPCK-Means clustering is performed offline on each small chunk of data. This strategy can also be used to reduce the computational memory.

Cong et al. (2013) introduced a self-supervised online metric learning method with a low-rank constraint for scene categorization.
Their work used a two-sided graph to evaluate the similarity between data and metric learning. The labeled and unlabeled data with high scores were used to update the metric learning. The algorithm proposed by Cong et al. is an online algorithm.

Li and colleagues introduced a semi-supervised learning algorithm based on extreme learning combined with cooperative learning (Li, Zhang, Xu, Luo, & Li, 2013). The main idea of cooperative learning is to use data from one learner's testing phase as training data for other learners. Looping in the learning process is the disadvantage of cooperative learning, which leads to a decrease in learning speed and to propagation of learning errors to the other learners. In their paper, extreme learning was proposed to deal with these problems because it is very fast and can cover the shortcomings of cooperative learning (Huang, Zhu, & Siew, 2004; Huang, Li, Chen, & Siew, 2008).

Zhang and colleagues proposed a semi-supervised algorithm based on co-training (Zhang, Wen, Wang, & Jiang, 2014). This algorithm applies co-training to select the most reliable instances, according to the two criteria of high confidence and nearest neighbor, for boosting the classifier, and it also exploits the most informative instances with human annotation to improve the classification performance (Zhang et al., 2014).

Dornaika and El Traboulsi (2016) proposed a graph-based semi-supervised method and its kernelized version for generic classification. Although this method is flexible, it is not online.

Beyer and Cimiano (2012) proposed a semi-supervised method based on neural gas. Their work extends the offline Growing Neural Gas classifier to an online method that predicts labels for unlabeled examples and incorporates these labeled examples into the model on the fly (Beyer & Cimiano, 2012).

Maximo, Quiles, and Nascimento (2014) proposed a new semi-supervised growing neural gas (GNG) model in which, instead of assigning a single scalar label to each neuron, a vector containing the representativeness level of every class is associated with each neuron. To propagate the labels among the neurons, the Consensus-Based Semi-Supervised GNG employs a consensus approach (Maximo et al., 2014).

Fritzke (1994) proposed the growing self-organizing map (GSOM), which is able to find a suitable network structure and model size automatically. This researcher then proposed a supervised learning method that results from combining the above-mentioned self-organizing network with the radial basis function (RBF) approach.

Hsu and Halgamuge (2008) improved Fritzke's method and presented a semi-supervised algorithm based on GSOM. Their algorithm provides a two-layer model for online data classification with partial labels: in the first layer, GSOM is used for data clustering, and when labeled data are logged, the labels are used for classification. In this method, as in k-means and fuzzy C-means, there are repetitive operations, with the difference that in each step only one datum is randomly selected and entered into the GSOM.

Shen, Yu, Kamiya, and Hasegawa (2010) offered a three-layer architecture that represents the topological structure of the training data (with a SOM), learns node labels, and constructs the classifier.
In our previous work, we proposed an online constrained semi-supervised learning method based on GSOM (Allahyar, Yazdi, & Harati, 2015), built on a two-layer model that developed the Hsu model further (Hsu & Halgamuge, 2008).

The Self-Organizing Map (SOM) is a suitable learning method for data clustering (Kohonen, 1990). This method is based on artificial neural networks (ANNs) and its structure is formed online; accordingly, the network is suitable for online learning. The SOM was first proposed in 1990 by Kohonen (1990). The dimensions of the SOM are a parameter that strongly influences the final result of clustering (Alahakoon et al., 2000). In Kohonen's model, the user sets this parameter manually. To resolve this problem, Alahakoon presented the Growing Self-Organizing Map (GSOM) (Alahakoon et al., 2000), in which the map dimensions are expanded automatically depending on the type and structure of the data (Alahakoon et al., 2000).

Hsu and Halgamuge (2008) presented the Semi-Supervised Growing Self-Organizing Map (SSGSOM) by adding a supervised layer to GSOM. The Hsu model is an online semi-supervised model that can perform clustering and classification of data very well (Hsu & Halgamuge, 2008). Their model uses labeled data to control the expansion of nodes in the SOM, and it employs a heuristic method to determine the effective parameters in the objective function (Hsu & Halgamuge, 2008). In the present paper, we redefine the objective function and then discuss the determination of the optimum effective parameters.

Table 1 summarizes the significant semi-supervised methods. For each algorithm, the table lists its type, advantages, and unresolved problems.

The main contributions of the proposed method are:
• redefining a robust cost function for online semi-supervised GSOM through half-quadratic optimization;
• proposing an adaptive online semi-supervised GSOM algorithm.

The remainder of this paper is organized as follows. In Section 2, the primary concepts behind the proposed algorithm are explained. In Section 3, the proposed algorithm is presented. In Section 4, the results of the experiments performed on different data are shown. In the final section, the conclusion and guidelines for future work are given.

Table 1. Summary of semi-supervised algorithms.

Proposed algorithm | Advantages | Unsolved problems
Farid et al. | Semi-supervised learning | Slow execution speed because of creating the decision tree
Leite et al. | Fuzzy; resistant to different data types | Slow execution speed; does not improve clustering using the feedback of labeled data
Macario et al. | Fuzzy clustering; adaptive distance | Offline method
Shen et al. | Online; based on SOM | Not adaptive; not robust
Halkidi et al. | Data binding | Receives data in chunk format
Cong et al. | Metric learning | Receives data in chunk format; slow execution speed
Li et al. | Cooperative extreme learning | Not suitable for online data; slow execution speed because of cooperative learning
Zhang et al. | Co-training | Offline; slow (uses active learning to determine appropriate data for labeling)
Dornaika et al. | Flexible method; graph-based | Offline; many parameters to adjust
Beyer et al. | Based on neural gas; online | Not adaptive; not robust
Maximo et al. | Based on neural gas; online | Not adaptive; not robust
Allahyar et al. | Online; based on GSOM | Not adaptive; not robust
Hsu et al. | Online; based on GSOM; improves clustering using the feedback of labeled data | Determines the parameters of the GSOM heuristically; not robust
Proposed method of this paper | Online; based on GSOM; adaptive; robust | -

2. Preliminary concepts

In this section, the preliminary concepts of semi-supervised GSOM are presented. GSOM is a method that is able to reveal the structure of data (Fritzke, 1994). Hsu and colleagues added the semi-supervised ability by placing a supervised layer on top of this network (Hsu & Halgamuge, 2008). Fig. 1 shows the structure of this model.

Fig. 1. Semi-supervised GSOM structure (Hsu & Halgamuge, 2008).

In this semi-supervised model, the number of nodes in the upper layer equals the number of classes (M) and the number of nodes in the bottom layer represents the number of clusters (C). There is a full connection of weights between the two layers, denoted w_{c,m}. The goal is to train these weights so that when a datum enters the cluster layer, the nearest node is activated and the suitable class is then determined for it. The weights between the two layers are trained with the labeled data according to Eq. (1) for each data sample (Hsu & Halgamuge, 2008):

\Delta w_k^{c,m} = \beta \, (\zeta_k^m - PS_k^m) \, o_k^c    (1)

In Eq. (1), \zeta_k^m is the actual label of datum k with respect to class m and PS_k^m is the value estimated by the semi-supervised GSOM for datum k and class m; \zeta_k^m and PS_k^m are defined by Eqs. (2) and (3), respectively (Hsu & Halgamuge, 2008):

\zeta_k^m = \begin{cases} 0, & l_k \neq \hat{m}_k \ \text{or missing label} \\ 1, & l_k = \hat{m}_k \end{cases}    (2)

PS_k^m = \sum_{c=1}^{C} w_k^{c,m} \, o_k^c, \qquad m \in \{1, 2, \ldots, M\}    (3)

In Eqs. (1) and (3), o_k^c determines the similarity of the entered datum to cluster node c and is known as the activation degree. The activation degree of each node c is calculated with the Gaussian function given in Eq. (4) (Hsu & Halgamuge, 2008):

o_k^c = \exp\left(-\frac{\|x_k - v_c\|^2}{(\delta_k^c)^2}\right), \qquad c \in \{1, \ldots, C\}    (4)

where \delta_k^c is the average distance from node c to all of its neighboring nodes for the k-th sample and is calculated by Eq. (5), in which |Ne(c)| is the number of neighbors of node c (Hsu & Halgamuge, 2008):

\delta_k^c = \frac{\sum_{j \in Ne(c)} \|v_j - v_c\|}{|Ne(c)|}    (5)

More information about how the GSOM grows can be found in (Hsu & Halgamuge, 2008).
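For reference, below is a minimal NumPy sketch of the SSGSOM forward pass and supervised update of Eqs. (1)-(4). It assumes the GSOM cluster layer (node positions V and activation widths delta, obtained for instance via Eq. (5)) has already been grown; the array names and the function decomposition are ours and are not taken from the original implementation.

```python
import numpy as np

def activation_degrees(x, V, delta):
    """Eq. (4): Gaussian activation degree of sample x for every cluster node.
    V is a (C, d) array of node positions, delta a (C,) array of widths."""
    return np.exp(-np.sum((x - V) ** 2, axis=1) / delta ** 2)

def class_scores(o, W):
    """Eq. (3): estimated class values PS^m from activation degrees o (C,) and weights W (C, M)."""
    return o @ W

def supervised_update(x, label, V, delta, W, beta=0.1):
    """Eq. (1): update the cluster-to-class weights for one labeled sample."""
    o = activation_degrees(x, V, delta)             # o_k^c
    ps = class_scores(o, W)                         # PS_k^m
    zeta = np.zeros(W.shape[1]); zeta[label] = 1.0  # Eq. (2): one-hot actual label
    W += beta * np.outer(o, zeta - ps)              # Delta w_{c,m} = beta (zeta_m - PS_m) o_c
    return W
```

Classification of an unlabeled sample then amounts to taking the argmax over the scores returned by class_scores.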
3. Proposed method

Since the number of labeled data may be small, optimal operation of semi-supervised algorithms is very important. Although the method proposed by Hsu is semi-supervised and online, it has the following problems:

• The objective function is optimized only with respect to the weights of the classifier layer, whereas it also depends on the shape and location of the nodes in the clustering layer and on the activation degree.
• The growth of the GSOM is not optimized. The activation function plays an important role in how similar a datum is to each node, but the Hsu algorithm adjusts this parameter with the average distance to the neighboring nodes. This heuristic solution is local and is not optimal when the data change.
• It is unstable in the presence of noisy data.

Accordingly, to overcome these problems, the objective function is redefined and solved in the present work.

3.1. Definition of the problem

According to Section 2, the semi-supervised GSOM error for each datum k is defined as

e_k = \sum_{m=1}^{M} (l_k^m - ps_k^m)    (6)

and the cost function, based on the expectation of the error function, is defined as

g(e_k) = \exp(-\eta e_k^2)    (7)

J = \max_{w, \delta, v} \sum_{k=1}^{N} \exp(-\eta e_k^2)    (8)

The half-quadratic (HQ) technique (Rockafellar, 1970; He, Hu, Zheng, & Guo, 2010; Zhang et al., 2014) is often used to solve this kind of nonlinear optimization problem (He, Hu, Zheng, & Kong, 2011). Therefore, we derive an algorithm that solves (8) based on the half-quadratic approach. Based on the theory of convex conjugate functions, (8) can be solved using the following proposition.

Proposition 1. There exists a convex conjugate function \varphi of g(e) such that

g(e) = \sup_{p} \big( \eta e^2 p - \varphi(p) \big)    (9)

where the convex function \varphi(p) is defined as \varphi(p) = -p \log(-p) + p.

Therefore, J is rewritten as the augmented objective

\hat{J}(w, \delta, v, p) = \sum_{k=1}^{N} \big( \eta e_k^2 p_k - \varphi(p_k) \big)    (10)

According to Proposition 1, when [w, \delta, v] is fixed the following equation holds:

\max_{w,\delta,v} J(w, \delta, v) = \max_{w,\delta,v,p} \hat{J}(w, \delta, v, p)    (11)

An alternating solution is therefore used:

step 1: \max_{p} \hat{J}(w, \delta, v, p) \Rightarrow p_k = -\exp(-\eta e_k^2)

step 2: \max_{w,\delta,v} \hat{J}(w, \delta, v, p) \Rightarrow \max_{w,\delta,v} J = \max_{w,\delta,v} \sum_{k=1}^{N} \eta e_k^2 p_k

By defining q_k = -p_k = \exp(-\eta e_k^2), step 2 becomes the weighted minimization

\min_{w,\delta,v} J = \min_{w,\delta,v} \sum_{k=1}^{N} \eta e_k^2 q_k    (12)

Assuming \eta is constant and expanding (12) with Eqs. (6), (3), and (4) gives

J = \sum_{k=1}^{N} \left( \sum_{m=1}^{M} (l_k^m - ps_k^m) \right)^2 q_k = \sum_{k=1}^{N} \left( \sum_{m=1}^{M} \sum_{c=1}^{C} \left( l_k^m - w_k^{c,m} \exp\left(-\frac{\|x_k - v_c\|^2}{\delta_c^2}\right) \right) \right)^2 q_k    (13)

Therefore,

\min_{w,\delta,v} J(w, \delta, v) = \min_{w,\delta,v} \sum_{k=1}^{N} \left( \sum_{m=1}^{M} \sum_{c=1}^{C} \left( l_k^m - w_k^{c,m} \exp\left(-\frac{\|x_k - v_c\|^2}{\delta_c^2}\right) \right) \right)^2 q_k    (14)

Based on (14), the effective parameters are the weights of the classifier layer (w), the shape and location of the nodes in the clustering layer (v), and the width of the activation-degree function (δ). The parameter δ is clearly important because it affects w directly. Moreover, computing the optimal value of the parameter v from the objective function is practically impossible and would be very time consuming. Therefore, if the optimal value of the parameter δ can be determined, the other parameters will also be near-optimal. In conclusion, we use the approximation

J(w, \delta, v) \cong J(w, \delta)    (15)

Next, we use the gradient search method in the following form:

(w_d)_{k+1} = (w_d)_k - \mu \, \hat{\nabla}(w_d)_k    (16)

Eq. (16) is applied in two steps: one step adjusts the clustering layer and the other adjusts the classification layer. For the clustering layer,

\delta_{k+1} = \delta_k - \mu_\delta \, \hat{\nabla}\delta_k    (17)

and for the classification layer,

w_{k+1} = w_k - \mu_w \, \hat{\nabla} w_k    (18)

By differentiating the cost function with respect to each variable, it is minimized with respect to that variable using the step sizes \mu_\delta and \mu_w.
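To make the alternation concrete, the following is a minimal NumPy sketch (our own illustration, not code from the paper) of the two HQ steps: with the model fixed, the optimal auxiliary variable yields the sample weights q_k = exp(-η e_k²) of Eq. (12); with q fixed, a weighted cost is minimized. The toy loop estimates a location parameter under the same loss, where step 2 has a closed form, to show how samples with large errors are suppressed; in the paper, step 2 is instead carried out by the gradient updates (16)-(18).

```python
import numpy as np

def hq_weights(errors, eta=1.0):
    """HQ step 1 (Eqs. 9-12): for fixed model parameters the optimal auxiliary
    variable is p_k = -exp(-eta*e_k^2), i.e. q_k = -p_k = exp(-eta*e_k^2).
    Samples with large errors receive weights near zero, which gives robustness."""
    return np.exp(-eta * np.asarray(errors, dtype=float) ** 2)

# Toy illustration (assumed example): robust estimate of a location parameter mu.
data = np.array([0.9, 1.1, 1.0, 0.95, 8.0])   # the last sample is an outlier
mu = data.mean()                              # plain mean, badly biased by the outlier
for _ in range(20):
    q = hq_weights(data - mu)                 # step 1: fix mu, update the weights q
    mu = np.sum(q * data) / np.sum(q)         # step 2: fix q, minimize sum_k q_k e_k^2
print(round(mu, 2))                           # about 0.99: the outlier is effectively ignored
```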
3.2. Parameter adaptation

To obtain the adaptive value of δ, we use a least-mean-type update for Eq. (17):

\hat{\nabla}\delta_k = \frac{\partial J_k}{\partial \delta_k} = 2 e_k \frac{\partial e_k}{\partial \delta_k}    (19)

The required derivative follows from differentiating the objective function (14) with respect to δ:

\frac{\partial e_k}{\partial \delta_k} = \frac{\partial}{\partial \delta_k} \left( \sum_{k=1}^{N} \sum_{m=1}^{M} \sum_{c=1}^{C} \left( l_k^m - w_k^{c,m} \exp\left(-\frac{\|x_k - v_c\|^2}{\delta_c^2}\right) \right) q_k \right) = - \frac{\partial}{\partial \delta_k} \left( \sum_{k=1}^{N} \sum_{m=1}^{M} \sum_{c=1}^{C} w_k^{c,m} \exp\left(-\frac{\|x_k - v_c\|^2}{\delta_c^2}\right) q_k \right)    (20)

If c is replaced with the best matching node (BMN, the winning node), (20) is rewritten as

\frac{\partial e_k}{\partial \delta_k} = - \frac{\partial}{\partial \delta_k} \left( \sum_{k=1}^{N} \sum_{m=1}^{M} w_k^{BMN,m} \exp\left(-\frac{\|x_k - v_{BMN}\|^2}{\delta_{BMN}^2}\right) q_k \right) = - \sum_{k=1}^{N} \sum_{m=1}^{M} 2\, w_k^{BMN,m} \exp\left(-\frac{\|x_k - v_{BMN}\|^2}{\delta_{BMN}^2}\right) \frac{\|x_k - v_{BMN}\|^2}{\delta_{BMN}^3}\, q_k    (21)

So, according to Eq. (19), δ is updated as

\delta_{k+1} = \delta_k + 4 \mu_\delta \sum_{k=1}^{N} \sum_{m=1}^{M} (l_k^m - PS_k^m)\, w_k^{BMN,m} \exp\left(-\frac{\|x_k - v_{BMN}\|^2}{\delta_{BMN}^2}\right) \frac{\|x_k - v_{BMN}\|^2}{\delta_{BMN}^3}\, q_k    (22)

The coefficient \mu_\delta affects the convergence speed of δ. For simplicity, this learning rate can be taken as a constant positive value that is reduced over time and whenever the confidence boundary is exceeded.

To obtain the adaptive w, we proceed in the same way:

\hat{\nabla} w_k = \frac{\partial J_k}{\partial w_k} = 2 e_k \frac{\partial e_k}{\partial w_k}    (23)

where

\frac{\partial e_k}{\partial w_k} = \frac{\partial}{\partial w_k} \left( \sum_{k=1}^{N} \sum_{m=1}^{M} \sum_{c=1}^{C} \left( l_k^m - w_k^{c,m} \exp\left(-\frac{\|x_k - v_c\|^2}{\delta_c^2}\right) \right) q_k \right) = - \frac{\partial}{\partial w_k} \left( \sum_{k=1}^{N} \sum_{m=1}^{M} \sum_{c=1}^{C} w_k^{c,m} \exp\left(-\frac{\|x_k - v_c\|^2}{\delta_c^2}\right) q_k \right)    (24)

If c is replaced with the best matching node, (24) is rewritten as

\frac{\partial e_k}{\partial w_k} = - \sum_{k=1}^{N} \exp\left(-\frac{\|x_k - v_{BMN}\|^2}{\delta_{BMN}^2}\right) q_k    (25)

Therefore, w is updated as

w_{k+1} = w_k + 2 \mu_w \sum_{k=1}^{N} \sum_{m=1}^{M} (l_k^m - ps_k^m) \exp\left(-\frac{\|x_k - v_{BMN}\|^2}{\delta_{BMN}^2}\right) q_k    (26)

3.3. Parameter adaptation through the back-propagation method

In addition to the approach of Section 3.2, the back-propagation method can also be used for error estimation. In the back-propagation method, the error is defined as in Eq. (27):

e_k = \sum_{m=1}^{M} (l_k^m - ps_k^m)\, w_k^m    (27)

Hence, (21) is rewritten as

\frac{\partial e_k}{\partial \delta_k} = \frac{\partial}{\partial \delta_k} \left( \sum_{k=1}^{N} \sum_{m=1}^{M} \sum_{c=1}^{C} \left( l_k^m - w_k^{c,m} \exp\left(-\frac{\|x_k - v_c\|^2}{\delta_c^2}\right) \right) w_k^{c,m}\, q_k \right)

and after simplification

\frac{\partial e_k}{\partial \delta_k} = - \sum_{k=1}^{N} \sum_{m=1}^{M} 2\, (w_k^{BMN,m})^2 \exp\left(-\frac{\|x_k - v_{BMN}\|^2}{\delta_{BMN}^2}\right) \frac{\|x_k - v_{BMN}\|^2}{\delta_{BMN}^3}\, q_k \;\Rightarrow\; \nabla\delta = 4 \mu_\delta \sum_{k=1}^{N} \sum_{m=1}^{M} (w_k^{BMN,m})^2 (l_k^m - ps_k^m) \exp\left(-\frac{\|x_k - v_{BMN}\|^2}{\delta_{BMN}^2}\right) \frac{\|x_k - v_{BMN}\|^2}{\delta_{BMN}^3}\, q_k    (28)

Also, for w,

\frac{\partial e_k}{\partial w_k} = \frac{\partial}{\partial w_k} \left( \sum_{k=1}^{N} \sum_{m=1}^{M} \sum_{c=1}^{C} (l_k^m - PS_k^m)\, w_k^m\, q_k \right)    (29)

and after simplification

\frac{\partial e_k}{\partial w_k} = -2 \sum_{k=1}^{N} \sum_{m=1}^{M} w_k^{BMN,m} \exp\left(-\frac{\|x_k - v_{BMN}\|^2}{\delta_{BMN}^2}\right) q_k \;\Rightarrow\; \nabla w = 4 \mu_w \sum_{k=1}^{N} \sum_{m=1}^{M} (l_k^m - ps_k^m)\, (w_k^{BMN,m})^2 \exp\left(-\frac{\|x_k - v_{BMN}\|^2}{\delta_{BMN}^2}\right) q_k    (30)

3.4. η tuning

In this section, we discuss the adaptation of the η parameter. We define a constraint on η through the minimization problem

\min_{\eta_i} \sum_{i=1}^{N} \eta_i^2 e_i^2 q_i, \qquad \text{s.t.} \ \sum_i \eta_i = p, \quad \eta_i \in [0, 1]    (31)

Forming the Lagrangian by adding the penalty term,

l = \sum_{i=1}^{N} \eta_i^2 e_i^2 q_i - \lambda \left( \sum_i \eta_i - p \right)    (32)

Then \eta_i is obtained as

\frac{\partial l}{\partial \eta_i} = 0 \Rightarrow \eta_i = \frac{\lambda}{2 e_i^2 q_i}, \qquad \sum_i \eta_i = p \Rightarrow \eta_i = \frac{p}{\sum_{j=1}^{N} \frac{e_i^2 q_i}{e_j^2 q_j}}    (33)

To avoid division by zero, a small parameter ξ is added to the denominator:

\eta_i = \frac{p}{\sum_{j=1}^{N} \frac{e_i^2 q_i}{e_j^2 q_j} + \xi}    (34)
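As a compact summary of the update rules in Sections 3.2 and 3.4, the sketch below applies the δ and w updates restricted to the best matching node, in the spirit of Eqs. (22) and (26), for a single incoming labeled sample, together with the per-sample η of Eq. (34). It is an illustrative single-sample form with our own variable names and step sizes; the paper states the rules as sums over samples.

```python
import numpy as np

def adaptive_step(x, zeta, V, delta, W, q, mu_w=0.05, mu_d=0.01):
    """One adaptive update restricted to the best matching node (BMN),
    following the form of Eqs. (22) and (26) for a single labeled sample.
    x: sample (d,), zeta: one-hot label (M,), V: node positions (C, d),
    delta: activation widths (C,), W: cluster-to-class weights (C, M), q: HQ weight."""
    dist2 = np.sum((x - V) ** 2, axis=1)
    o = np.exp(-dist2 / delta ** 2)                # activation degrees, Eq. (4)
    err = zeta - o @ W                             # per-class error l^m - ps^m
    b = int(np.argmin(dist2))                      # BMN (winning node)
    W[b] += 2 * mu_w * q * err * o[b]                                          # Eq. (26)
    delta[b] += 4 * mu_d * q * (err @ W[b]) * o[b] * dist2[b] / delta[b] ** 3  # Eq. (22)
    return W, delta

def eta_weights(e, q, p=1.0, xi=1e-8):
    """Per-sample eta from Eq. (34); xi guards against division by zero
    (it is also added inside the sum here, an extra safeguard of this sketch)."""
    s = np.asarray(e, dtype=float) ** 2 * np.asarray(q, dtype=float)
    return p / (s * np.sum(1.0 / (s + xi)) + xi)
```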
3.5. Structure of the proposed method

As shown in Fig. 2, the proposed algorithm receives labeled and unlabeled data continuously. Depending on the type of the received datum, one of two events happens:

• An unlabeled datum is received: GSOM performs clustering and updates the weights of the nodes in the clustering layer. In this study, we assume that most of the data are unlabeled.
• A datum with a known label is received: First, the winner node in the clustering layer is identified and the label of the datum is then estimated by the system based on this node. The system error is calculated from the difference between the actual label and the label estimated by the system. This error is applied to update the weights between the clustering and classification layers and to determine the width of the activation degree of each node.

Fig. 2. The proposed model for creating the adaptive semi-supervised GSOM.

4. Performance evaluation and tests

In this section, the efficiency of the proposed algorithm is evaluated. The datasets used for the evaluation include synthetic datasets and datasets from the UCI repository (http://archive.ics.uci.edu/ml/), which are detailed in the next sections. Three online algorithms were selected from among the state-of-the-art methods mentioned in Section 1. Table 2 shows the titles and abbreviations of the two proposed methods and the state-of-the-art methods.

Fig. 3. Synthetic evaluated datasets: (a) four cluster, (b) two moon, (c) spiral, (d) two ring.

Table 2. Abbreviations and titles of the proposed and state-of-the-art methods.

Abbreviated name | Title of method
ASSGSOM | Adaptive Semi-Supervised GSOM
BASSGSOM | Back-propagation Adaptive Semi-Supervised GSOM
SSGSOM | Hsu's Semi-Supervised GSOM (Hsu & Halgamuge, 2008)
CS2GS | Constrained Semi-Supervised GSOM (Allahyar et al., 2015)
HSS | Halkidi's semi-supervised stream algorithm (Halkidi et al., 2012)

The measures considered for assessing the proposed method and the methods from the literature are Accuracy and F-measure. Accuracy is calculated based on (35), in which TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, respectively:

Acc_q = \frac{TP + TN}{TP + TN + FP + FN}    (35)

Precision and Recall are defined as

PR_q = \frac{TP}{TP + FP}    (36)

RE_q = \frac{TP}{TP + FN}    (37)

and the F-measure, which is based on precision and recall, is defined as

FM = \frac{2 \times PR_q \times RE_q}{PR_q + RE_q}    (38)
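The following small helper computes these four measures from binary confusion-matrix counts (a sketch with our own naming; Eq. (35) is implemented with the standard accuracy definition (TP + TN) / (TP + TN + FP + FN)).

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F-measure as in Eqs. (35)-(38),
    computed from binary confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)                 # Eq. (35)
    pr = tp / (tp + fp) if (tp + fp) else 0.0             # Eq. (36)
    re = tp / (tp + fn) if (tp + fn) else 0.0             # Eq. (37)
    fm = 2 * pr * re / (pr + re) if (pr + re) else 0.0    # Eq. (38)
    return acc, pr, re, fm

# example: 90 TP, 80 TN, 10 FP, 20 FN  ->  accuracy 0.85, F-measure about 0.857
print(classification_metrics(90, 80, 10, 20))
```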
For all algorithms that are based on GSOM (i.e., SSGSOM, CS2GS, and both proposed methods), the scale factor (SF) is taken as a constant value and the growing threshold (GT) of the GSOM is calculated from it as in (39), where d is the data dimension:

GT = -d \times \log(SF)    (39)

Furthermore, the regularization parameters of the GSOM layer and the classification layer strongly influence the results, so in all of the experiments constant values of 0.4 and 0.6 were used for the regularization parameters of the GSOM and classification layers, respectively. The initial value of w (the weights of the classification layer), v (the weights of the GSOM nodes), and the other settings related to GSOM are selected randomly and are the same for all algorithms. The order of data entry, which affects the performance of the algorithms, is also the same for all of them.

The test scenario is as follows. First, each dataset is divided into train and test subsets using 10-fold cross-validation. Then, 5% of the training data is labeled randomly and all the methods are trained using them. Next, the performance of each algorithm is evaluated on the test data. This process is replicated 5 times and the average result is reported. Subsequently, the proportion of labeled training data is increased by 5% and the test scenario is repeated. This procedure continues until the proportion of labeled data reaches 30%.

4.1. The evaluation results on synthetic datasets

In this section, the performances of the proposed methods and the state-of-the-art methods are shown on synthetic datasets. The structures of the synthetic datasets are shown in Fig. 3. All these datasets have two classes, and the numbers of data points in the two moon, four cluster, two ring, and spiral sets are 258, 160, 237, and 3000, respectively.

Figs. 4-7 present the results obtained on the synthetic datasets in terms of the F-measure.

Fig. 4. Result on the Two Moon dataset. Fig. 5. Result on the Four Cluster dataset. Fig. 6. Result on the Two Ring dataset. Fig. 7. Result on the Spiral dataset.

As shown in Figs. 4-7, the ASSGSOM method is superior to the other methods, and the results closest to it are those obtained by the SSGSOM method. Moreover, a statistical t-test was conducted to determine whether the difference between these methods is statistically significant. Table 3 presents the t-test comparison between ASSGSOM and SSGSOM; in this test, 15% of the data is labeled and the number of iterations is 35.

Table 3. The t-test results between ASSGSOM and SSGSOM.

Dataset | Method | Accuracy (%) | F-Measure (%) | Accuracy t-test | F-Measure t-test
Two Moon | ASSGSOM | 96 | 95 | 1 | 1
Two Moon | SSGSOM | 88 | 87 | |
Four Cluster | ASSGSOM | 94 | 94 | 1 | 1
Four Cluster | SSGSOM | 86 | 85 | |
Two Ring | ASSGSOM | 98 | 98 | 1 | 1
Two Ring | SSGSOM | 93 | 92 | |
Spiral | ASSGSOM | 88 | 87 | 1 | 1
Spiral | SSGSOM | 86 | 85 | |

In Table 3, a t-test value of 1 means that the method with the higher average is statistically superior. For example, on the Two Ring dataset the Accuracy t-test value for ASSGSOM is 1: the accuracy of ASSGSOM is statistically superior to that of SSGSOM because the average accuracy of the ASSGSOM algorithm (98%) is higher than that of SSGSOM (93%), so ASSGSOM is the superior algorithm.
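The paper does not state which t-test variant or significance level is used, so the sketch below is only one plausible reading: a two-sample t-test over the per-iteration scores at the 5% level, with the 0/1 indicator reported as in Table 3.

```python
import numpy as np
from scipy import stats

def ttest_indicator(scores_a, scores_b, alpha=0.05):
    """Return 1 if the difference between two sets of per-run scores is significant
    at level alpha, else 0 (two-sample t-test; the variant and alpha are assumptions)."""
    _, p_value = stats.ttest_ind(scores_a, scores_b)
    return int(p_value < alpha)

# hypothetical per-iteration accuracies (35 runs) of ASSGSOM vs. SSGSOM on one dataset
rng = np.random.default_rng(0)
assgsom = rng.normal(0.98, 0.01, 35)
ssgsom = rng.normal(0.93, 0.01, 35)
print(ttest_indicator(assgsom, ssgsom))   # -> 1 when the gap is significant
```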
4.2. The evaluation results on UCI datasets

In this section, the performance of the proposed methods on UCI datasets is evaluated. Several datasets from the UCI repository (Blake & Merz, 1998) were selected to evaluate the proposed methods; Table 4 shows the details of each dataset.

Table 4. Details of the UCI datasets used.

Dataset name | Number of classes | Number of features | Number of data
Soybean | 4 | 35 | 47
Iris | 3 | 4 | 150
Wine | 3 | 13 | 178
Sonar | 2 | 60 | 208
E.coli | 6 | 6 | 332
Ionosphere | 2 | 34 | 351
WDBC | 2 | 30 | 569
Scale | 3 | 4 | 625
Vehicle | 4 | 18 | 846
Vowel | 11 | 10 | 990

In Figs. 8-13, results on a subset of the UCI datasets are shown, and detailed results are reported in Table 5.

Fig. 8. Result on the Spiral dataset. Fig. 9. Result on the Iris dataset. Fig. 10. Result on the Wine dataset. Fig. 11. Result on the E.coli dataset. Fig. 12. Result on the Vehicle dataset. Fig. 13. Result on the Vowel dataset.

The results in Figs. 8-13 indicate the relative superiority of ASSGSOM over the state-of-the-art methods. As can be seen, when the number of labeled data increases, the proposed method fluctuates less and becomes more stable.

The results on the different UCI datasets are summarized in Table 5. The evaluation is done under the condition that the labeled data amount to 30% of the total data in each dataset. The results in Table 5 show that when the number of attributes is high (as in Soybean and Sonar), BASSGSOM is better. Except for the results on the Iris dataset, ASSGSOM and SSGSOM are superior to the other methods; the results of these two methods are very close to each other, but ASSGSOM is superior overall.

Table 5. The results of the proposed methods and the state-of-the-art methods on UCI datasets.

Dataset | Measure | ASSGSOM (%) | BASSGSOM (%) | SSGSOM (%) | CS2GS (%) | HSS (%)
Soybean | F-Measure | 94 | 96 | 93 | 85 | 82
Soybean | Accuracy | 87 | 89 | 85 | 70 | 80
Iris | F-Measure | 94 | 90 | 97 | 90 | 90
Iris | Accuracy | 94 | 90 | 97 | 90 | 89
Wine | F-Measure | 70 | 63 | 68 | 70 | 67
Wine | Accuracy | 71 | 63 | 70 | 71 | 67
Sonar | F-Measure | 60 | 67 | 57 | 57 | 51
Sonar | Accuracy | 61 | 67 | 58 | 58 | 51
E.coli | F-Measure | 83 | 73 | 81 | 70 | 64
E.coli | Accuracy | 84 | 75 | 82 | 72 | 67
Ionosphere | F-Measure | 90 | 85 | 88 | 86 | 80
Ionosphere | Accuracy | 91 | 87 | 89 | 87 | 81
WDBC | F-Measure | 90 | 87 | 88 | 87 | 80
WDBC | Accuracy | 90 | 87 | 88 | 87 | 80
Scale | F-Measure | 75 | 70 | 71 | 68 | 55
Scale | Accuracy | 84 | 76 | 84 | 69 | 56
Vehicle | F-Measure | 51 | 50 | 52 | 45 | 42
Vehicle | Accuracy | 52 | 50 | 51 | 45 | 42
Vowel | F-Measure | 88 | 71 | 88 | 64 | 60
Vowel | Accuracy | 88 | 72 | 87 | 65 | 61

4.3. The effect of noise

In this section, the effect of noise on the proposed methods is evaluated. Fig. 14 presents the results for the datasets contaminated with noise. Eq. (40) is used for calculating the signal-to-noise ratio (SNR); in this formula, D is the domain (amplitude range) of the signal:

SNR_{dB} = 20 \log_{10} \left( \frac{D_{signal}}{D_{noise}} \right)    (40)

Fig. 14. Effect of the noise on the four selected datasets.

4.4. Resistance analysis

Resistance analysis shows the stability of an algorithm under different conditions (Geng, Zhan, & Zhou, 2005). Eq. (41) shows how the resistance is calculated over the Q compared methods, and Fig. 15 exhibits the resistance analysis for the evaluated datasets:

RA_q = \frac{Acc_q}{\max_{j \in \{1, 2, \ldots, Q\}} Acc_j}    (41)

Fig. 15. Resistance analysis of the accuracy on all evaluated datasets.
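As an illustration of how such an experiment can be set up, the sketch below adds noise at a target SNR following Eq. (40) and computes the resistance score of Eq. (41). The zero-mean Gaussian noise model scaled from the signal's amplitude range is an assumption of this sketch; the paper only specifies the SNR definition.

```python
import numpy as np

def add_noise(X, snr_db, rng=np.random.default_rng()):
    """Contaminate data so that 20*log10(D_signal / D_noise) = snr_db (Eq. 40),
    where D is the per-feature amplitude range. Gaussian noise is an assumption."""
    d_signal = X.max(axis=0) - X.min(axis=0)                 # domain of the signal
    d_noise = d_signal / (10.0 ** (snr_db / 20.0))           # required noise domain
    noise = rng.standard_normal(X.shape) * (d_noise / 6.0)   # ~99.7% of draws stay inside the range
    return X + noise

def resistance(accuracies):
    """Eq. (41): RA_q = Acc_q / max_j Acc_j over the Q compared methods."""
    accuracies = np.asarray(accuracies, dtype=float)
    return accuracies / accuracies.max()

print(resistance([0.98, 0.93, 0.88]))   # -> [1.0, 0.949, 0.898]
```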
5. Conclusion and future works

In this paper, a comprehensive robust mathematical formulation of the semi-supervised GSOM was proposed, and half-quadratic (HQ) optimization was used to solve it. Furthermore, an adaptive method was proposed to adjust the activation degree optimally in order to achieve a better performance of the semi-supervised GSOM. Evaluation in noisy conditions and on different datasets suggests that the proposed method is superior to and more stable than the state-of-the-art methods.

In the future, we will add a regularization term to the cost function to avoid over-fitting on complex datasets. We also plan to develop a robust active version of SSGSOM; using active learning, we can determine which data are most important for increasing the accuracy, so a higher degree of accuracy can be achieved with a smaller number of labeled data.

References

Alahakoon, D., Halgamuge, S. K., & Srinivasan, B. (2000). Dynamic self-organizing maps with controlled growth for knowledge discovery. IEEE Transactions on Neural Networks, 11(3), 601-614.
Allahyar, A., Yazdi, H. S., & Harati, A. (2015). Constrained semi-supervised growing self-organizing map. Neurocomputing, 147, 456-471.
Alpaydin, E. (2004). Introduction to machine learning. MIT Press.
Beyer, O., & Cimiano, P. (2012). Online semi-supervised growing neural gas. International Journal of Neural Systems, 22(05), 1250023.
Blake, C. L., & Merz, C. J. (1998). UCI repository of machine learning databases. Irvine, CA: Department of Information and Computer Science, University of California. http://www.ics.uci.edu/~mlearn/MLRepository.html
Chen, S., Li, S., Su, S., Cao, D., & Ji, R. (2014). Online semi-supervised compressive coding for robust visual tracking. Journal of Visual Communication and Image Representation, 25(5), 793-804.
Cong, Y., Liu, J., Yuan, J., & Luo, J. (2013). Self-supervised online metric learning with low rank constraint for scene categorization. IEEE Transactions on Image Processing, 22(8), 3179-3191.
Dornaika, F., & El Traboulsi, Y. (2016). Learning flexible graph-based semi-supervised embedding. IEEE Transactions on Cybernetics, 46(1), 206-218.
Farid, D. M., Zhang, L., Hossain, M. A., Rahman, C. M., Strachan, R., Sexton, G., et al. (2013). An adaptive ensemble classifier for mining concept drifting data streams. Expert Systems with Applications, 40(15), 5895-5906.
Fritzke, B. (1994). Growing cell structures: A self-organizing network for unsupervised and supervised learning. Neural Networks, 7(9), 1441-1460.
Geng, X., Zhan, D.-C., & Zhou, Z.-H. (2005). Supervised nonlinear dimensionality reduction for visualization and classification. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 35(6), 1098-1107.
Halkidi, M., Spiliopoulou, M., & Pavlou, A. (2012). A semi-supervised incremental clustering algorithm for streaming data. In Advances in knowledge discovery and data mining (pp. 578-590). Springer.
He, R., Hu, B.-G., Zheng, W.-S., & Guo, Y. (2010). Two-stage sparse representation for robust recognition on large-scale database. In AAAI: Vol. 10 (p. 1).
He, R., Hu, B.-G., Zheng, W.-S., & Kong, X.-W. (2011). Robust principal component analysis based on maximum correntropy criterion. IEEE Transactions on Image Processing, 20(6), 1485-1494.
Hsu, A., & Halgamuge, S. K. (2008). Class structure visualization with semi-supervised growing self-organizing maps. Neurocomputing, 71(16), 3124-3130.
Huang, G.-B., Li, M.-B., Chen, L., & Siew, C.-K. (2008). Incremental extreme learning machine with fully complex hidden nodes. Neurocomputing, 71(4), 576-583.
Huang, G.-B., Zhu, Q.-Y., & Siew, C.-K. (2004). Extreme learning machine: A new learning scheme of feedforward neural networks. In Proceedings of the IEEE international joint conference on neural networks.
Kohonen, T. (1990). The self-organizing map. Proceedings of the IEEE, 78(9), 1464-1480.
Leite, D., Costa, P., & Gomide, F. (2010). Evolving granular neural network for semi-supervised data stream classification. In Proceedings of the international joint conference on neural networks (IJCNN).
Li, K., Zhang, J., Xu, H., Luo, S., & Li, H. (2013). A semi-supervised extreme learning machine method based on co-training. Journal of Computational Information Systems, 9(1), 207-214.
Lu, K., Wang, Q., Xue, J., & Pan, W. (2014). 3D model retrieval and classification by semi-supervised learning with content-based similarity. Information Sciences, 281, 703-713.
Macario, V., & de Carvalho, F. d. A. (2012). An adaptive semi-supervised fuzzy clustering algorithm based on objective function optimization. In Proceedings of the IEEE international conference on fuzzy systems (FUZZ-IEEE).
Martinetz, T. M., Berkovich, S. G., & Schulten, K. J. (1993). 'Neural-gas' network for vector quantization and its application to time-series prediction. IEEE Transactions on Neural Networks, 4(4), 558-569.
Maximo, V. R., Quiles, M. G., & Nascimento, M. C. (2014). A consensus-based semi-supervised growing neural gas. In Proceedings of the international joint conference on neural networks (IJCNN).
Rockafellar, R. (1970). Convex analysis. Princeton, NJ: Princeton University Press.
Shen, F., Yu, H., Kamiya, Y., & Hasegawa, O. (2010). An online incremental semi-supervised learning method. Journal of Advanced Computational Intelligence and Intelligent Informatics, 14(6), 593-605.
Zhang, Y., Wen, J., Wang, X., & Jiang, Z. (2014). Semi-supervised learning combining co-training with active learning. Expert Systems with Applications, 41(5), 2372-2378.
Zhu, X., & Goldberg, A. B. (2009). Introduction to semi-supervised learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, 3(1), 1-130.