Double linear regressions for single labeled image per person face recognition

Fei Yin a,*, L.C. Jiao a, Fanhua Shang b, Lin Xiong a, Shasha Mao a

a Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education of China, Xidian University, Mailbox 224, No. 2 South TaiBai Road, Xi'an 710071, PR China
b Department of Electrical and Computer Engineering, Duke University, Durham, NC 27708, USA
* Corresponding author. Tel.: +86 29 88202661; fax: +86 29 88201023. E-mail address: yinfei701@163.com (F. Yin).

Article history: Received 12 March 2013; Received in revised form 10 September 2013; Accepted 19 September 2013; Available online 9 October 2013.

Keywords: Semi-supervised dimensionality reduction; Label propagation; Sparse representation; Linear regressions; Linear discriminant analysis; Face recognition

Abstract

Recently, the underlying sparse representation structure in high dimensional data has received considerable attention in pattern recognition and computer vision. In this paper, we propose a novel semi-supervised dimensionality reduction (SDR) method, named Double Linear Regressions (DLR), to tackle the Single Labeled Image per Person (SLIP) face recognition problem. DLR simultaneously seeks the best discriminating subspace and preserves the sparse representation structure. Specifically, a Subspace Assumption based Label Propagation (SALP) method, which is accomplished using Linear Regressions (LR), is first presented to propagate the label information to the unlabeled data. Then, based on the propagated labeled dataset, a sparse representation regularization term is constructed via Linear Regressions (LR). Finally, DLR takes into account both the discriminating efficiency and the sparse representation structure by using the learned sparse representation regularization term as a regularizer of Linear Discriminant Analysis (LDA). Extensive and encouraging experimental results on three publicly available face databases (CMU PIE, Extended Yale B and AR) demonstrate the effectiveness of the proposed method.

© 2013 Elsevier Ltd. All rights reserved.

1. Introduction

In many fields of scientific research such as face recognition [1], bioinformatics [2], and information retrieval [3], the data are usually presented in a very high dimensional form. This confronts researchers with "the curse of dimensionality" [4], which limits the application of many practical technologies due to the heavy computational cost in high dimensional space, and deteriorates the performance of model estimation when the number of training samples is small compared to the number of features. In practice, dimensionality reduction has been employed as an effective way to deal with "the curse of dimensionality". Over the past years, a variety of dimensionality reduction methods have been proposed [5-10]. According to the geometric structure considered, the existing dimensionality reduction methods can be categorized into three types: global structure based methods, local neighborhood structure based methods, and the recently proposed sparse representation structure [11,12] based methods. Two classical dimensionality reduction methods, Principal Component Analysis (PCA) [13] and Linear Discriminant Analysis (LDA) [14], belong to the global structure based methods. In the field of face recognition, they are known as "Eigenfaces" [15] and "Fisherfaces" [16].
Two popular local neighborhood structure based methods are Locality Preserving Projections (LPP) [17] and Neighborhood Preserving Embedding (NPE) [18]. LPP and NPE are named "Laplacianfaces" [19] and "NPEfaces" [18] in face recognition. Representative sparse representation structure based methods include Sparsity Preserving Projections (SPP) [20], Sparsity Preserving Discriminant Analysis (SPDA) [21] and Fast Fisher Sparsity Preserving Projections (FFSPP) [22]. They have also been successfully applied to face recognition. In order to deal with nonlinear structure in the data, most of the above linear dimensionality reduction methods have been extended to kernelized versions which operate in a Reproducing Kernel Hilbert Space (RKHS) [23]. Kernel PCA (KPCA) [24] and Kernel LDA (KLDA) [25] are the nonlinear dimensionality reduction methods corresponding to PCA and LDA. Kernel LPP (KLPP) [17,26] and Kernel NPE (KNPE) [27] are the kernelized versions of LPP and NPE. The nonlinear version of SPDA is Kernel SPDA [21].

One of the major challenges to appearance-based face recognition is recognition from a single training image [28,29]. This problem is called the "one sample per person" problem: given a stored database of faces, the goal is to identify a person from the database later in time under different and unpredictable poses, lighting, etc., from just one image per person [28]. In many practical scenarios, such as law enforcement, driver license and passport card identification, there is usually only one labeled sample per person available, and the classical appearance-based methods, including Eigenfaces and Fisherfaces, suffer a severe performance drop or fail to work altogether. LDA fails because the within-class scatter matrix degenerates to a zero matrix when only one sample per person is available. Zhao et al. [30] suggested replacing the within-class scatter matrix with an identity matrix to make LDA work in this setting (see the brief sketch below), although the performance of this Remedied LDA (ReLDA) is still not satisfying. Due to its importance and difficulty, the one sample per person problem has attracted considerable interest in the face recognition community. To attack this problem, many ad hoc techniques have been developed, including synthesizing virtual samples [31,32], localizing the single training image [33], probabilistic matching [34] and neural network methods [35]. More details on the single training image problem can be found in a recent survey [28].
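For illustration only, the following minimal NumPy sketch (not the authors' code) shows the failure mode and the ReLDA remedy just described: with one labeled image per class the within-class scatter vanishes, so it is replaced by the identity and the projection is taken from the leading eigenvectors of the between-class scatter. The data layout and the helper name relda are our own assumptions.

import numpy as np

def relda(X, y, d):
    """Illustrative ReLDA sketch (assumes one labeled image per class).

    X : (m, n) data matrix, one column per image; y : (n,) class labels;
    d : number of projection directions. With a single sample per class the
    within-class scatter S_W is zero, so ReLDA uses S_W = I and the solution
    reduces to the top eigenvectors of the between-class scatter S_B.
    """
    m_total = X.mean(axis=1, keepdims=True)        # global mean
    S_B = np.zeros((X.shape[0], X.shape[0]))
    for k in np.unique(y):
        Xk = X[:, y == k]
        mk = Xk.mean(axis=1, keepdims=True)        # class mean (= the single image)
        S_B += Xk.shape[1] * (m_total - mk) @ (m_total - mk).T
    # S_W is the zero matrix here; with S_W = I the problem is S_B w = lambda w
    eigvals, eigvecs = np.linalg.eigh(S_B)
    return eigvecs[:, np.argsort(eigvals)[::-1][:d]]   # leading d eigenvectors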
With the rapid development of the digital photography industry, it is easy to obtain a large set of unlabeled images. Against this background, a more natural and promising way to attack the one labeled sample per person problem is semi-supervised dimensionality reduction (SDR). Semi-supervised Discriminant Analysis (SDA) [29] is one SDR method which has been successfully applied to single labeled image per person face recognition. SDA first learns the local neighborhood structure from the unlabeled data and then uses this structure to regularize LDA, so as to obtain a discriminant function which is as smooth as possible on the data manifold. Laplacian LDA (LapLDA) [36], Semi-supervised LDA (SSLDA) [37], and Semi-supervised Maximum Margin Criterion (SSMMC) [37] are other reported semi-supervised dimensionality reduction methods which can improve the performance of their supervised counterparts such as LDA and Maximum Margin Criterion (MMC) [38]. These methods consider the local neighborhood structure and can be unified under the graph embedding framework [37,39]. Despite the success of these SDR methods, they still have some disadvantages: (1) they rely on the manifold assumption, which requires sufficiently many samples to characterize the data manifold [40]; (2) the adjacency graphs constructed in these methods are artificially defined, which brings the difficulty of selecting the neighborhood size and edge weights. To resolve these issues, Sparsity Preserving Discriminant Analysis (SPDA) [21] was presented. SPDA first learns the sparse representation structure by solving n (the number of training samples) ℓ1-norm optimization problems, and then uses the learned sparse representation structure to regularize LDA. SPDA has achieved good performance on single labeled image per person face recognition, but it still has some shortcomings: (1) it is computationally expensive, since n ℓ1-norm optimization problems need to be solved to learn the sparse representation structure; and (2) the label information is not exploited in learning the sparse representation structure.

To tackle the above problems, we propose a novel SDR method, named Double Linear Regressions (DLR), which simultaneously seeks the best discriminating subspace and preserves the sparse representation structure. More specifically, a Subspace Assumption based Label Propagation (SALP) method, which is accomplished using Linear Regressions (LR), is first presented to propagate the label information to the unlabeled data. Then, based on the propagated labeled dataset, a sparse representation regularization term is constructed via Linear Regressions (LR). Finally, DLR takes into account both the discriminating efficiency and the sparse representation structure by using the learned sparse representation regularization term as a regularizer of linear discriminant analysis. It is worthwhile to highlight some aspects of DLR as follows: (1) DLR is a novel semi-supervised dimensionality reduction method aiming at simultaneously seeking the best discriminating subspace and preserving the sparse representation structure. (2) DLR obtains the sparse representation structure via n small class-specific linear regressions, and is therefore more time efficient than SPDA. (3) In DLR, label information is first propagated to the entire training set and then used to learn a more discriminative sparse representation structure. (4) Unlike SDA, there are no graph construction parameters in DLR, so the difficulty of selecting such parameters is avoided. (5) Our proposed label propagation method SALP is quite general; it can be combined with other graph-based SDR methods to construct a more discriminative graph.
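To make the two regression stages concrete, the following NumPy sketch illustrates the pipeline just described. It is not the authors' reference implementation: the function names salp_propagate and sparse_weights_lr, the plain least-squares formulation of each class-specific regression, and the one-pass propagation are our own assumptions; the precise SALP procedure and the exact form of the resulting regularization term are given in Section 3.

import numpy as np

def class_residual(x, Xk):
    """Least-squares residual of x regressed onto the columns of Xk
    (subspace assumption: samples of one class span a linear subspace)."""
    beta, *_ = np.linalg.lstsq(Xk, x, rcond=None)
    return np.linalg.norm(x - Xk @ beta)

def salp_propagate(X_lab, y_lab, X_unlab):
    """Assign each unlabeled column of X_unlab to the class whose labeled
    samples reconstruct it with the smallest linear-regression residual."""
    classes = np.unique(y_lab)
    y_prop = np.empty(X_unlab.shape[1], dtype=y_lab.dtype)
    for i in range(X_unlab.shape[1]):
        res = [class_residual(X_unlab[:, i], X_lab[:, y_lab == k]) for k in classes]
        y_prop[i] = classes[int(np.argmin(res))]
    return y_prop

def sparse_weights_lr(X, y):
    """For each sample, regress it onto the other samples of its (propagated)
    class; the coefficients form a class-wise representation matrix S that is
    sparse by construction and later regularizes LDA (see Section 3)."""
    n = X.shape[1]
    S = np.zeros((n, n))
    for i in range(n):
        idx = np.where((y == y[i]) & (np.arange(n) != i))[0]
        if idx.size == 0:
            continue
        beta, *_ = np.linalg.lstsq(X[:, idx], X[:, i], rcond=None)
        S[idx, i] = beta
    return S

The coefficient matrix returned by sparse_weights_lr only illustrates the role the two linear regressions play; how it enters the LDA objective as a regularization term is specified in Section 3.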
The rest of the paper is organized as follows. Section 2 gives a brief review of LDA and RDA. DLR is proposed in Section 3. DLR is compared with some related works in Section 4. The experimental results and discussions are presented in Section 5. Finally, Section 6 gives some concluding remarks and future work.

2. A brief review of LDA and RDA

Before we go into the details of our proposed DLR algorithm, a brief review of LDA and RDA is given in the following.

2.1. LDA

Given a set of samples $\{x_i\}_{i=1}^{n}$, where $x_i \in \mathbb{R}^m$, let $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{m \times n}$ be the data matrix consisting of all samples. Suppose the samples come from $K$ classes. LDA aims at simultaneously maximizing the between-class scatter and minimizing the within-class scatter. The objective function of LDA is defined as follows:

\[ \max_{w} \frac{w^{T} S_B w}{w^{T} S_W w}, \tag{1} \]

\[ S_B = \sum_{k=1}^{K} N_k (m - m_k)(m - m_k)^{T}, \tag{2} \]

\[ S_W = \sum_{k=1}^{K} \sum_{i \in C_k} (x_i - m_k)(x_i - m_k)^{T}, \tag{3} \]

where $m_k = \frac{1}{N_k}\sum_{i \in C_k} x_i$, $m = \frac{1}{n}\sum_{i=1}^{n} x_i$, $C_k$ is the index set of samples from class $k$, and $N_k$ is the number of samples in class $k$. $S_B$ is called the between-class scatter matrix and $S_W$ is called the within-class scatter matrix. The optimal $w$ can be computed as the eigenvector of $S_W^{-1} S_B$ corresponding to the largest eigenvalue [14]. When there is only one labeled sample per class, LDA fails to work because $S_W$ is a zero matrix. The Remedied LDA (ReLDA) for the one labeled sample per class scenario was proposed by Zhao et al. [30], in which $S_W$ is replaced by an identity matrix.

2.2. RDA

Despite its simplicity and effectiveness for classification, LDA suffers from the Small Sample Size (SSS) problem [41]. Among the methods designed to attack this problem, Regularized Discriminant Analysis (RDA) [42,43] is a simple and effective one, whose objective function is defined as follows:

\[ \max_{w} \frac{w^{T} S_B w}{w^{T} S_W w + \lambda_1 w^{T} w}, \tag{4} \]

where $\lambda_1$ is a tradeoff parameter. The optimal $w$ can be computed as the eigenvector of $(S_W + \lambda_1 I)^{-1} S_B$ corresponding to the largest eigenvalue.
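As a reference point, the following sketch (our own illustrative code, not from the paper) computes the leading RDA directions of Eq. (4) from scatter matrices $S_B$ and $S_W$ built as in Eqs. (2) and (3). Setting $\lambda_1 = 0$ (with $S_W$ invertible) recovers LDA, and replacing $S_W$ by the identity corresponds to ReLDA.

import numpy as np
from scipy.linalg import eigh

def rda_directions(S_B, S_W, lam1, d):
    """Leading d RDA directions: eigenvectors of (S_W + lam1*I)^{-1} S_B,
    obtained via the equivalent symmetric generalized eigenproblem
    S_B w = eig * (S_W + lam1*I) w, which is well posed for lam1 > 0."""
    reg = S_W + lam1 * np.eye(S_W.shape[0])
    eigvals, eigvecs = eigh(S_B, reg)              # ascending eigenvalues
    return eigvecs[:, np.argsort(eigvals)[::-1][:d]]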