Distance difference and linear programming nonparallel plane classifier
Qiaolin Ye a,⇑, Chunxia Zhao a, Haofeng Zhang a, Ning Ye b
a School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing, China
b School of Information Technology, Nanjing Forestry University, Nanjing, China
Article info
Keywords:
GEPSVM
LSTSVM
TWSVM
Standard eigenvalues
Feature selection
Input features
Kernel functions
Abstract
We first propose Distance Difference GEPSVM (DGEPSVM), a binary classifier that obtains two nonparallel planes by solving two standard eigenvalue problems. Compared with GEPSVM, this algorithm need not care about the singularity occurring in GEPSVM, yet it achieves better classification accuracy. The formulation can deal with XOR problems with different distributions because it keeps the genuine geometrical interpretation of the primal GEPSVM. Moreover, the proposed algorithm gives classification accuracy comparable to that of LSTSVM and TWSVM, but with fewer unknown parameters. Then, regularization techniques are incorporated into TWSVM. With the help of the regularized formulation, a linear programming formulation for TWSVM, called FETSVM, is proposed to improve the sparsity of TWSVM and thereby suppress input features. In the linear case, this means FETSVM can reduce the number of input features; in the nonlinear case, it means that only a few kernel functions determine the classifier. Lastly, the algorithms are compared on artificial and public datasets. To further illustrate the effectiveness of our proposed algorithms, we also apply them to USPS handwritten digits.
© 2011 Elsevier Ltd. All rights reserved.
1. Introduction
Eigenvalue-based techniques are attractive for the classification of very large sparse datasets (Guarracino, Cifarelli, Seref, & Pardalos, 2007), one example being the generalized proximal SVM (GEPSVM for short) (Mangasarian & Wild, 2006). GEPSVM obtains each of its nonparallel planes from the eigenvector corresponding to the smallest eigenvalue of a generalized eigenvalue problem, so that each plane is as close as possible to the samples of its own class and at the same time as far as possible from the samples of the other class (Mangasarian & Wild, 2006). The advantages of two-class GEPSVM lie in its lower computational complexity and its better classification performance on XOR problems relative to the standard SVM, which finds a single plane separating the two classes. In Mangasarian and Wild (2006), Mangasarian et al. presented a simple "cross planes" example, a generalization of the XOR example, which indicated the effectiveness of GEPSVM over PSVM and SVM. Fig. 1 in Mangasarian and Wild (2006) demonstrates that GEPSVM achieves 100% classification accuracy in the XOR case. Recently, a number of GEPSVM-based algorithms have been proposed. To
improve the generalization of GEPSVM, Jayadeva et al. proposed Fuzzy GEPSVM (FGEPSVM) along with its multi-category formulation. In 2007, Guarracino et al. (2007) introduced a new regularization technique to GEPSVM that reduces its time complexity, but at the cost of two unknown parameters in the linear case. These algorithms obtain two planes by solving generalized eigenvalue problems, as GEPSVM does. However, if the symmetric matrices occurring in these algorithms, such as H and M in formulations (5) and (6), are both positive semi-definite and singular, the generalized eigenvalue problem becomes ill-defined. Moreover, these algorithms weaken the genuine geometrical interpretation of the nonparallel plane classifier because of the regularization term adopted to improve their generalization.
Recently, a twin SVM algorithm (TWSVM for short), proposed by Jayadeva et al., was published in TPAMI (Jayadeva & Chandra, 2007). This algorithm, which is in the spirit of GEPSVM, obtains two planes by solving two quadratic programming problems (QPPs) that are smaller than the single QPP of the standard SVM. Experimental results show the effectiveness of TWSVM over SVM and GEPSVM (Arun Kumar & Gopal, 2009; Jayadeva & Chandra, 2007). TWSVM takes O(m^3/4) operations, a quarter of those of the standard SVM, whereas GEPSVM takes O(n^3/4). Here, m is the number of training samples, n is the dimensionality, and m ≫ n (Arun Kumar & Gopal, 2009; Jayadeva & Chandra, 2007). Obviously, GEPSVM is by far the faster of the two. To reduce the time complexity while keeping the effectiveness of the twin SVM classifier, some scholars proposed its least squares version (LSTSVM for short) in 2009 (Arun Kumar & Gopal, 2009; Ghorai, Mukherjee, & Dutta, 2009). In fact, LSTSVM determines two nonparallel planes by solving two PSVM-type (Fung & Mangasarian, 2001) problems. Compared with TWSVM, LSTSVM requires less computational time because it solves two systems of linear equations instead of two QPPs. TWSVM and LSTSVM, however, also lose the genuine geometrical interpretation of the nonparallel plane classifier. GEPSVM is
⇑ Corresponding author. E-mail address: yeqiaolin65620868@163.com (Q. Ye).
Expert Systems with Applications 38 (2011) 9425–9433. doi:10.1016/j.eswa.2011.01.131
proposed to solve complex examples, such as the XOR example, that are difficult classification cases for typical linear classifiers (Mangasarian & Wild, 2006). Each of the planes obtained by GEPSVM is as close as possible to the samples of its own class and at the same time as far as possible from the samples of the other class
(Mangasarian & Wild, 2006). However, TWSVM requires each of the planes obtained to be as close as possible to the samples of its own class and at the same time at a distance of at least 1 from the samples of the other class (Jayadeva & Chandra, 2007). LSTSVM requires each of the planes to be as close as possible to the samples of its own class and at the same time at a distance of exactly 1 from the samples of the other class. Intuitively, when handling XOR examples with different distributions, TWSVM and LSTSVM may yield poor classification performance because their optimization criteria differ from that of GEPSVM, even though they can obtain good classification performance on UCI datasets thanks to the use of the loss function. Moreover, another flaw of TWSVM and LSTSVM is that two penalty parameters are introduced into their objective functions instead of the single regularization parameter of GEPSVM. Undoubtedly, this increases the difficulty of parameter selection. In addition, when there are many noise variables, the 1-norm SVM (Zou, 2007; Zhou, Zhang, & Jiao, 2002) has advantages over the 2-norm SVM, because the former can generate sparse solutions that make the classifier easier to store and faster to compute. However, the GEPSVM-based algorithms, including GEPSVM itself, cannot generate very sparse solutions, even if we formulate their 1-norm versions as in the 1-norm SVM (Zou, 2007). This is because the direction wi and threshold bi that determine the ith separating plane involve all of the input features.
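The sparsity advantage of the 1-norm penalty mentioned above can be seen in a small experiment. The following sketch (the toy data, variable names, and margin constraints are ours, not the paper's) minimizes ||w||_1 subject to linear separation constraints, written as a linear program via the standard split w = p − q with p, q ≥ 0:

```python
import numpy as np
from scipy.optimize import linprog

# Minimize ||w||_1 subject to y_i * x_i^T w >= 1, as a linear program.
# With w = p - q and p, q >= 0, we have ||w||_1 = sum(p) + sum(q).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
y = np.sign(X[:, 0])  # only feature 0 carries the label; the rest is noise

n = X.shape[1]
c = np.ones(2 * n)                       # objective: sum(p) + sum(q)
A_ub = -y[:, None] * np.hstack([X, -X])  # -y_i x_i^T (p - q) <= -1
b_ub = -np.ones(len(y))
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (2 * n))
w = res.x[:n] - res.x[n:]
print("nonzero weights:", int(np.sum(np.abs(w) > 1e-6)), "of", n)
```

Because the 1-norm objective is minimized at a vertex of the feasible polytope, most of the noise coordinates of w are driven exactly to zero, which is precisely the behavior a 2-norm penalty cannot produce.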
In this paper, we first propose a new and fast algorithm, termed Distance Difference GEPSVM (DGEPSVM). DGEPSVM need not consider the singularity occurring in GEPSVM, because it uses a formulation similar to that of the MMC (Jiang & Zhang, 2004). We show that the solution of DGEPSVM reduces to solving two simple eigenvalue problems. This property makes DGEPSVM fast and at least comparable to GEPSVM. Moreover, DGEPSVM can deal with XOR examples with different distributions because it keeps the genuine geometrical interpretation of GEPSVM. Then, we further propose a feature selection algorithm for TWSVM, called FETSVM. This algorithm overcomes the flaw that GEPSVM and other GEPSVM-based algorithms cannot generate very sparse solutions. Lastly, the two algorithms are compared on artificial and UCI datasets. We also go on to illustrate their effectiveness on a USPS handwritten digits application.
This paper establishes four facts: (1) DGEPSVM need not care about the singularity occurring in GEPSVM and achieves better classification accuracy than GEPSVM; (2) DGEPSVM surpasses TWSVM and LSTSVM in solving XOR examples with different distributions and gives comparable classification accuracy on standard datasets; (3) DGEPSVM has fewer unknown parameters than TWSVM and LSTSVM; and (4) FETSVM runs faster than TWSVM and suppresses input features while giving comparable classification accuracy.
2. Related work
2.1. Generalized Proximal Support Vector Machines (GEPSVM)
(Mangasarian & Wild, 2006)
Given m training points in the n-dimensional input space Rn, denote by the m1 × n matrix A the points belonging to class +1 and by the m2 × n matrix B the points belonging to class −1, where m1 + m2 = m.
The main purpose of GEPSVM is to find two nonparallel hyperplanes in the n-dimensional space, i.e.,
$$x^T w_1 - b_1 = 0, \qquad x^T w_2 - b_2 = 0 \qquad (1)$$
where (wi, bi) ∈ (Rn × R) (i = 1, 2). This algorithm requires each plane to be as close as possible to the samples of its own class and at the same time as far as possible from the samples of the other class. Supposing (wi, bi) ≠ 0, the binary GEPSVM classifier can be written as the following optimization problems:
$$\text{(GEPSVM1)} \quad \min_{(w_1, b_1) \neq 0} \; \frac{(A w_1 - e_1 b_1)^T (A w_1 - e_1 b_1) + \delta \begin{bmatrix} w_1 \\ b_1 \end{bmatrix}^T \begin{bmatrix} w_1 \\ b_1 \end{bmatrix}}{(B w_1 - e_2 b_1)^T (B w_1 - e_2 b_1)} \qquad (2)$$

$$\text{(GEPSVM2)} \quad \min_{(w_2, b_2) \neq 0} \; \frac{(B w_2 - e_2 b_2)^T (B w_2 - e_2 b_2) + \delta \begin{bmatrix} w_2 \\ b_2 \end{bmatrix}^T \begin{bmatrix} w_2 \\ b_2 \end{bmatrix}}{(A w_2 - e_1 b_2)^T (A w_2 - e_1 b_2)} \qquad (3)$$
where δ is a regularization constant. Formulation (2) yields the plane closest to the points of class +1 and furthest from those of class −1, and (3) yields the plane closest to the points of class −1 and furthest from those of class +1.
Let
$$G = E^T E + \delta I, \quad H = F^T F, \quad L = F^T F + \delta I, \quad M = E^T E, \quad z_1 = \begin{bmatrix} w_1 \\ b_1 \end{bmatrix}, \quad z_2 = \begin{bmatrix} w_2 \\ b_2 \end{bmatrix} \qquad (4)$$
where E = [A  −e1] and F = [B  −e2], so that E ∈ R^{m1×(n+1)} and F ∈ R^{m2×(n+1)}. The optimization problems
(2) and (3) become:
$$\text{(GEPSVM1)} \quad \min_{z_1 \neq 0} \; \frac{z_1^T G z_1}{z_1^T H z_1} \qquad (5)$$

$$\text{(GEPSVM2)} \quad \min_{z_2 \neq 0} \; \frac{z_2^T L z_2}{z_2^T M z_2} \qquad (6)$$
Using the well-known properties of the Rayleigh quotient, we can obtain the solutions of (5) and (6) by solving the following two generalized eigenvalue problems:

$$G z_1 = \lambda_1 H z_1, \qquad L z_2 = \lambda_2 M z_2, \qquad z_i \neq 0, \; i = 1, 2. \qquad (7)$$
It can be seen from (2) and (3) that Tikhonov regularization (Tikhonov & Arsenin, 1977) is applied to each GEPSVM problem. This improves the generalization of GEPSVM because the regularization term acts as a penalty.
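As a concrete illustration, the generalized eigenvalue problem in (7) can be solved with off-the-shelf routines. The following NumPy/SciPy sketch (the toy data, the value of δ, and the variable names are our own assumptions, not from the paper) builds G and H from a small two-class sample and takes the eigenvector of the smallest generalized eigenvalue as z1 = [w1; b1]:

```python
import numpy as np
from scipy.linalg import eig

# Toy two-class data: A is class +1 (m1 x n), B is class -1 (m2 x n).
A = np.array([[1.0, 1.1], [1.2, 0.9], [0.9, 1.0]])
B = np.array([[-1.0, -0.9], [-1.1, -1.2], [-0.8, -1.0]])
delta = 1e-3  # regularization constant delta from (2)

E = np.hstack([A, -np.ones((A.shape[0], 1))])  # E = [A  -e1]
F = np.hstack([B, -np.ones((B.shape[0], 1))])  # F = [B  -e2]
G = E.T @ E + delta * np.eye(E.shape[1])       # numerator matrix of (5)
H = F.T @ F                                    # denominator matrix of (5)

# Solve G z = lambda H z; the eigenvector of the smallest eigenvalue
# minimizes the Rayleigh quotient in (5) and gives z1 = [w1; b1].
vals, vecs = eig(G, H)
z1 = vecs[:, np.argmin(vals.real)].real
w1, b1 = z1[:-1], z1[-1]

# The plane x^T w1 - b1 = 0 should lie close to A and far from B.
dA = np.mean(np.abs(A @ w1 - b1)) / np.linalg.norm(w1)
dB = np.mean(np.abs(B @ w1 - b1)) / np.linalg.norm(w1)
print(dA < dB)
```

The second plane is obtained the same way from the pair (L, M) in (6); note that the eigenvector is only determined up to scale, which is why the distances above are normalized by ||w1||.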
2.2. Twin Support Vector Machines (TWSVM) (Jayadeva & Chandra, 2007)
In 2007, Jayadeva et al. proposed a stand-alone algorithm for binary classification, termed Twin SVM (TWSVM) (Jayadeva & Chandra, 2007), which is in the spirit of GEPSVM. However, TWSVM has a different formulation from GEPSVM: it obtains two nonparallel planes by solving two SVM-type problems. Experimental results show that TWSVM outperforms GEPSVM and the standard SVM in classification accuracy. TWSVM can be written as follows:
$$\text{(TWSVM1)} \quad \min_{w_1, b_1, \xi} \; \tfrac{1}{2}(A w_1 - e_1 b_1)^T (A w_1 - e_1 b_1) + C_1 e_2^T \xi$$
$$\text{s.t.} \quad -(B w_1 - e_2 b_1) + \xi \geq e_2, \quad \xi \geq 0 \qquad (8)$$

$$\text{(TWSVM2)} \quad \min_{w_2, b_2, \xi} \; \tfrac{1}{2}(B w_2 - e_2 b_2)^T (B w_2 - e_2 b_2) + C_2 e_1^T \xi$$
$$\text{s.t.} \quad (A w_2 - e_1 b_2) + \xi \geq e_1, \quad \xi \geq 0 \qquad (9)$$
where C1 and C2 are two penalty coefficients. From (8) and (9), we find that only the constraints of the other class appear, and the objective function does not sum the error over patterns of both classes. These features suggest that TWSVM is effective on skewed or unbalanced datasets, and this may be one reason why TWSVM achieves better classification performance.