Sliding window-based support vector regression for predicting micrometeorological data

Expert Systems With Applications 59 (2016) 217–225

Yukimasa Kaneda a,*, Hiroshi Mineno b,c

a Graduate School of Integrated Science and Technology, Shizuoka University, 3-5-1 Johoku, Naka-ku, Hamamatsu, Shizuoka 432-8011, Japan
b College of Informatics, Academic Institute, Shizuoka University, 3-5-1 Johoku, Naka-ku, Hamamatsu, Shizuoka 432-8011, Japan
c JST, PRESTO, 4-1-8 Honcho, Kawaguchi, Saitama 332-0012, Japan
* Corresponding author. E-mail address: kaneda@minelab.jp (Y. Kaneda).

Article history: Received 4 February 2016; Revised 29 March 2016; Accepted 13 April 2016; Available online 23 April 2016

Keywords: Predicting micrometeorological data; Data extraction; Dynamic aggregation; Support vector regression; Ensemble learning

Abstract

Sensor network technology is becoming more widespread and sophisticated, and devices with many sensors, such as smartphones and sensor nodes, are used extensively. Because these devices can easily collect various kinds of micrometeorological data, such as temperature, humidity, and wind speed, an enormous amount of such data has been accumulated. In recent years, this kind of data, called big data, has been expected to produce novel knowledge and value, and many applications use data mining or machine learning to exploit it. However, micrometeorological data has complicated correlations among different features, and its characteristics change with time. It is therefore difficult to predict micrometeorological data accurately with low computational complexity, even with state-of-the-art machine learning algorithms. In this paper, we propose a new methodology for predicting micrometeorological data, sliding window-based support vector regression (SW-SVR), which involves a novel combination of support vector regression (SVR) and ensemble learning. To represent complicated micrometeorological data easily, SW-SVR builds several SVRs specialized for representative data groups in various natural environments, such as different seasons and climates, and changes the weights used to aggregate the SVRs dynamically depending on the characteristics of the test data. In our experiments, we predicted the temperature after 1 h and 6 h by using large-scale micrometeorological data observed in Tokyo. Regardless of testing periods, training periods, and prediction horizons, the prediction performance of SW-SVR was always greater than or equal to that of other general methods such as SVR, random forest, and gradient boosting. At the same time, SW-SVR reduced the building time remarkably compared with complicated models that have high prediction performance.

© 2016 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

1. Introduction

Sensor network technology is becoming more widespread and sophisticated, and devices with many sensors are used extensively. These devices can very easily obtain various kinds of micrometeorological data such as temperature, humidity, and wind speed.
Micrometeorological data is strongly affected by the surface of the earth and is closely related to our lives and industrial activity. Accordingly, such data has been used by many applications, for example environmental control systems for greenhouses (Othman & Shazali, 2012; Park & Park, 2011). More advanced applications exploit the data to a greater extent by using machine learning and data mining technology. Furthermore, an enormous amount of micrometeorological data has been accumulated by many devices, and analyzing such an enormous amount of data, called big data, is expected to produce novel knowledge and value.

To predict micrometeorological data effectively, a number of researchers have studied machine learning (Smith, Hoogenboom, & McClendon, 2009). These studies describe prediction methods for micrometeorological data and frequently discuss prediction performance and computational complexity. Micrometeorological data, however, has complex correlations among different features such as temperature and humidity, and its characteristics change with time. Therefore, even if big data is given as training data, it is not easy to predict micrometeorological data accurately. Furthermore, in many cases, models must become complicated to achieve high prediction performance, and their computational complexity increases accordingly; some models probably cannot be built from big data in a practical amount of computing time. In other words, there is a trade-off between high prediction performance and low computational complexity. However, both are required in some practical uses. The higher the prediction performance of an application, the better the quality it can provide. For example, in environmental control systems based on prediction (Kolokotsa, Pouliezos, Stavrakakis, & Lazos, 2009), higher prediction performance enables precise control, precise management, and better environments. On the other hand, models that need a long time for training are of little value in practice. Now that the amount of usable data has increased remarkably, this trade-off has become an even more critical issue.

Recently, one type of machine learning algorithm, the support vector machine (SVM), has been used successfully in various fields. Its underlying theory yields an efficient learning method grounded in probably approximately correct (PAC) learning. Moreover, SVMs can separate non-linear data with low computational complexity.
Since most data observed in the real world is likely to have non-linear relationships, SVMs have also been applied to micrometeorological data prediction (Antonanzas, Urraca, Martinez-de-Pison, & Antonanzas-Torres, 2015; Mohammadi, Shamshirband, Anisi, Alam, & Petković, 2015; Urraca, Antonanzas, Martinez-de-Pison, & Antonanzas-Torres, 2015). Moreover, SVMs have led to better prediction performance than other algorithms such as artificial neural networks (ANNs) and the autoregressive integrated moving average (ARIMA) model (Chevalier, Hoogenboom, McClendon, & Paz, 2011; Maity, Bhagwat, & Bhatnagar, 2010). However, when SVMs learn from big data, the computational complexity remains a matter of concern. Another learning approach, ensemble learning, has also been used increasingly for predicting micrometeorological data (Singh, Gupta, & Rai, 2013). The prediction performance of ensemble learning is greater than or equal to that of SVMs. Its basic methodology is a combination of weak learners built from different kinds of training data; the combination yields a higher generalizing capability than a single model can represent. In particular, some researchers have proposed improved methods that could be applied to micrometeorological data prediction (Wang & Japkowicz, 2009; Xie, Li, Ngai, & Ying, 2009). However, it is difficult to apply these methods to regression, and the resulting models may not be able to follow micrometeorological data whose characteristics always change with time.

In this paper, we propose a new methodology for predicting micrometeorological data, sliding window-based support vector regression (SW-SVR). SW-SVR involves a novel combination of support vector regression (SVR) and ensemble learning. To represent complicated micrometeorological data easily, SW-SVR builds several SVRs specialized for representative data groups in various natural environments, such as different seasons and climates. The specialized SVRs are built with our previously proposed method, dynamic short-distance data collection (D-SDC), which extracts effective data for predicting specific data by taking account of movements: changes in data during the prediction horizon. Each weak learner built from the extracted data specializes in specific data and accurately predicts data similar to the specialized data. SW-SVR then aggregates all the predicted values with weights determined by the similarity between the test data and the data specialized by each weak learner. This new ensemble learning methodology, which changes weights dynamically, makes it possible to follow micrometeorological data whose characteristics continually change with time. Our results demonstrate that the prediction performance of SW-SVR was always greater than or equal to that of other general methods such as SVR, random forest, and gradient boosting. At the same time, SW-SVR reduced the building time remarkably compared with complicated models that have high prediction performance.

2. Related work

As mentioned in the introduction, SVMs and ensemble learning have generally been used to predict micrometeorological data effectively. These algorithms achieve higher prediction performance for micrometeorological data than traditional methods because SVMs use not only a margin-maximizing algorithm, whose performance is supported by PAC learning, but also the kernel trick, which enables non-linear separation.
On the other hand, ensemble learning provides a higher generalizing capability than a single model can represent. In this section, a brief summary of these algorithms and some improved variants is given. Moreover, so that SW-SVR can draw advantages from both SVMs and ensemble learning, several problems of these algorithms in practical use are discussed.

2.1. Support vector regression

SVMs, introduced by Vapnik (1995), have been used successfully in various fields. In the simplest case, binary classification, an SVM obtains a separating hyperplane by maximizing the margin, that is, the distance between the hyperplane and the different classes. PAC learning shows that maximizing the margin produces high generalization ability. Moreover, the kernel trick enables SVMs to separate data non-linearly with low computational complexity. Since various kinds of data observed in the real world are likely to have non-linear relationships, SVMs are used in many applications, including micrometeorological data prediction (Kisi & Cimen, 2012; Maity et al., 2010). The SVM for regression, support vector regression (SVR), uses the same methodology. A brief summary of SVR follows.

First, the linear function for regression is

f(x) = w^T x + b.

As with SVMs, SVR minimizes the norm of the weight vector w; the L_2 norm \|w\|_2 is often used, and minimizing \|w\|_2 corresponds to maximizing the margin. Meanwhile, SVR tolerates a prediction error \varepsilon. Therefore, the primal problem of SVR is

minimize    \|w\|_2^2
subject to  y_i - (w^T x_i + b) \le \varepsilon,
            (w^T x_i + b) - y_i \le \varepsilon.

Moreover, to take further errors into account, the same slack variables \xi as in soft-margin SVMs are introduced. The slack variables act as penalties and increase in proportion to the errors between true values and predicted values. The problem with slack variables is

minimize    \|w\|_2^2 + C \sum_i (\xi_i + \xi_i^*)
subject to  y_i - (w^T x_i + b) \le \varepsilon + \xi_i,
            (w^T x_i + b) - y_i \le \varepsilon + \xi_i^*,
            \xi_i, \xi_i^* \ge 0,

where the constant C balances the effect of maximizing the margin against the penalties. To minimize the above objective, the slack variables must also be minimized. Accordingly, the slack variables depend on the errors as

\xi_i   = 0 if y_i - (w^T x_i + b) \le \varepsilon, and \xi_i = y_i - (w^T x_i + b) - \varepsilon otherwise;
\xi_i^* = 0 if (w^T x_i + b) - y_i \le \varepsilon, and \xi_i^* = (w^T x_i + b) - y_i - \varepsilon otherwise.

These formulas mean that no penalty is given when the error is below \varepsilon, whereas any error above \varepsilon counts as a penalty that cannot be tolerated. In other words, SVR tolerates errors smaller than \varepsilon, and only errors over \varepsilon are taken into account as penalties.
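To make the epsilon-insensitive penalty concrete, the following minimal NumPy sketch (ours, not from the paper; data and parameter values are illustrative) evaluates the soft-margin SVR objective for a given linear model.

```python
import numpy as np

def epsilon_insensitive_objective(w, b, X, y, C=1.0, eps=0.1):
    """Soft-margin SVR objective: ||w||^2 + C * sum of slack variables.

    Errors smaller than eps are ignored; only the part of the error
    exceeding eps is penalized (the slack xi_i or xi_i^*).
    """
    residual = y - (X @ w + b)                       # y_i - (w^T x_i + b)
    slack = np.maximum(np.abs(residual) - eps, 0.0)  # xi_i + xi_i^*
    return np.dot(w, w) + C * slack.sum()

# Toy data: one noisy linear feature.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 1))
y = 2.0 * X[:, 0] + rng.normal(scale=0.05, size=50)

print(epsilon_insensitive_objective(np.array([2.0]), 0.0, X, y))
```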
Finally, the dual problem is derived from the above primal problem by Lagrange multipliers and corresponds to a quadratic programming problem, as with SVMs. As a result, a unique global optimal solution is obtained, so SVR is superior to traditional algorithms that might fall into a local optimum, such as ANNs. The dual problem is

maximize    -\frac{1}{2} \sum_{i,j} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) x_i^T x_j - \varepsilon \sum_i (\alpha_i + \alpha_i^*) + \sum_i y_i (\alpha_i - \alpha_i^*)
subject to  \sum_i (\alpha_i - \alpha_i^*) = 0,  \alpha_i, \alpha_i^* \in [0, C].

Moreover, the above dual problem can easily involve a non-linear map \varphi to consider a higher dimension. To introduce the non-linear map \varphi, a kernel function K(x_i, x_j) = \varphi(x_i)^T \varphi(x_j) is defined and used instead of x_i^T x_j. Then \varphi(x_i)^T \varphi(x_j) is evaluated through K(x_i, x_j) without any calculation in the mapped higher-dimensional space; this technique is called the kernel trick. SVR, based on margin maximization and the kernel trick, yields high prediction performance.

Meanwhile, conventional quadratic programming solvers, such as the steepest descent method, have very high computational complexity, approximately O(N^3) where N is the number of training data. Accordingly, a quadratic programming solver for SVMs, sequential minimal optimization (SMO), has become the de facto standard (Platt, 1998). SMO, specialized for SVMs, reduces the computational complexity to approximately O(N^2). Nevertheless, when an enormous amount of data is input as training data, the computational complexity still increases substantially. To solve this problem, the core vector machine (CVM), which regards the quadratic programming problem as a computational geometry problem, was proposed (Tsang, Kwok, & Cheung, 2005). The prediction performance of CVM is comparable to that of SVMs, and the computational complexity decreases substantially. However, according to Loosli (2007), the prediction performance and computational complexity of CVM strongly depend on its parameter values. Therefore, when the parameter tuning essential for practical use is taken into account, CVM does not always achieve both high prediction performance and low computational complexity.

SVR is one of the best machine learning algorithms from the viewpoint of prediction performance. In particular, the kernel trick used in the dual problem is expected to be effective for predicting micrometeorological data, which has complex correlations among different features. However, the computational cost of solving the dual problem is often still too high for practical use, so it is difficult to apply conventional SVR directly to micrometeorological data prediction.
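As a point of reference, the sketch below (ours; the data and hyperparameter values are illustrative) fits a kernelized SVR with scikit-learn, which solves the dual problem described above.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 2))            # two illustrative features
y = np.sin(X[:, 0]) + 0.1 * X[:, 1] + rng.normal(scale=0.1, size=500)

# RBF-SVR: C balances margin against penalties, epsilon is the tolerated error.
model = SVR(kernel="rbf", C=10.0, epsilon=0.05, gamma=0.5)
model.fit(X, y)

print("support vectors:", len(model.support_))
print("prediction:", model.predict([[0.5, -1.0]]))
```

Because the dual solver scales roughly quadratically or worse with the number of training samples, this kind of exact kernel SVR quickly becomes impractical for big data, which is the limitation discussed above.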
2.2. Ensemble learning

Ensemble learning has been studied intensively in recent years and is used increasingly. Its basic methodology is a combination of weak learners built from different kinds of training data. The combination yields a higher generalizing capability than a single model can represent. As with SVMs, ensemble learning can represent non-linear relationships and has been used for predicting micrometeorological data. In particular, two kinds of approaches, bagging and boosting, are often used; they differ greatly in how the weak learners are built and aggregated.

Bagging uses several training sets generated by bootstrap sampling. The basic bagging algorithm for regression is shown in Algorithm 1. In bagging, different training sets are created by sampling the original training data with replacement, weak learners are built from each sampled training set, and the predicted values are aggregated by majority vote or arithmetic average. In particular, random forest, introduced by Breiman (2001), which also applies randomness to feature selection, often demonstrates better prediction performance than conventional models such as SVMs. Random forest is used in various applications and has been extended in several improved versions. For example, to predict more accurately the imbalanced data frequently observed in the real world, the improved balanced random forest (IBRF) has been proposed (Xie et al., 2009). IBRF involves an efficient sampling method for imbalanced data and cost-sensitive learning that penalizes misclassification of the minority class more strongly. The authors showed that IBRF was more effective at predicting imbalanced data than class-weighted SVMs and a conventional improved random forest for imbalanced data.

Algorithm 1 Bagging for regression.
Input:
  Training data: D = {(x_1, y_1), ..., (x_N, y_N)} where x_i ∈ X, y_i ∈ Y
  Number of weak learners: n
For t = 1 to n do
  1. D_t ← generate a sample from D with replacement
  2. H_t(X) ← build a weak learner from D_t
Output: H(X) = (1/n) Σ_{t=1}^{n} H_t(X)

Boosting builds weak learners repeatedly by using weights based on the error rate. The basic boosting algorithm for regression, such as AdaBoost (Freund & Schapire, 1997), is shown in Algorithm 2. Unlike bagging, almost all boosting algorithms use the same training data, but the training data is re-weighted repeatedly; boosting alternates between building weak learners with the current weights and updating the weights. Finally, the predicted values are aggregated by a weighted average. Various boosting algorithms have been studied and proposed; gradient boosting (Friedman, 2001) in particular has shown the best prediction performance in many competitions. Meanwhile, as with IBRF, a boosting algorithm for imbalanced data, boosting-SVM, has also been proposed (Wang & Japkowicz, 2009). The main characteristic of boosting-SVM is its use of an asymmetric misclassification cost. The authors demonstrated that boosting-SVM enables more accurate prediction of both the majority class and the minority class. A brief comparison of the bagging and boosting families is sketched after Algorithm 2.

Algorithm 2 Boosting for regression.
Input:
  Training data: D = {(x_1, y_1), ..., (x_N, y_N)} where x_i ∈ X, y_i ∈ Y
  Number of weak learners: n
  Weights: w_i = 1/N
For t = 1 to n do
  1. H_t(X) ← build a weak learner from D by using the weights w_t
  2. ε_t ← compute the error rate of H_t(X)
  3. α_t ← compute the reliability of the prediction of H_t(X) based on ε_t
  4. w_{t+1} ← update the weights w_t based on α_t
Output: H(X) = Σ_{t=1}^{n} (α_t H_t(X)) / Σ_{t=1}^{n} α_t
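For orientation, the following sketch (ours, not from the paper; the data and hyperparameters are illustrative) builds the bagging- and boosting-style regressors provided by scikit-learn, which also appear later as comparison methods in the evaluation.

```python
import numpy as np
from sklearn.ensemble import (BaggingRegressor, AdaBoostRegressor,
                              RandomForestRegressor, GradientBoostingRegressor)
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(1000, 3))
y = np.sin(X[:, 0]) + 0.3 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=1000)

models = {
    # Bagging: average of trees trained on bootstrap samples (Algorithm 1).
    "bagging": BaggingRegressor(DecisionTreeRegressor(), n_estimators=100),
    "random forest": RandomForestRegressor(n_estimators=100),
    # Boosting: weak learners built sequentially with re-weighting (Algorithm 2).
    "adaboost": AdaBoostRegressor(DecisionTreeRegressor(max_depth=5), n_estimators=100),
    "gradient boosting": GradientBoostingRegressor(n_estimators=100),
}
for name, model in models.items():
    model.fit(X, y)
    print(name, model.predict(X[:1]))
```

As in the evaluation in Section 4, the base learner for bagging and AdaBoost here is a decision tree.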
When micrometeorological data that includes many unusual natural environments is regarded as imbalanced data, methods such as IBRF and boosting-SVM are likely to classify it more accurately. However, these approaches cannot be applied to regression. Moreover, according to our previous research (Suzuki, Kaneda, & Mineno, 2015), the proper training data depends on the test data; in other words, the weights used to aggregate weak learners built from different kinds of training data should depend on the test data.

3. SW-SVR: Sliding window-based support vector regression

We propose a new methodology for predicting micrometeorological data, sliding window-based support vector regression, which combines the methodologies of SVR and ensemble learning. The basic theories are D-SDC, our previously proposed method for extracting effective data for predicting specific data, and a novel weighted ensemble learning, as shown in Fig. 1. First, to represent complicated micrometeorological data easily, SW-SVR builds several SVRs specialized for representative data groups in various natural environments, such as different seasons and climates. The specialized SVRs are built with D-SDC, which extracts effective data for predicting specific data by taking account of movements: changes of data during the prediction horizon (Fig. 1(a)). Each weak learner built from the extracted data specializes in specific data and accurately predicts data similar to the specialized data. Afterward, the weak learners are aggregated with weights determined dynamically at prediction time so as to maintain prediction performance for micrometeorological data whose characteristics always change with time (Fig. 1(b)). The weights are decided by the similarity between the test data and the data specialized by each weak learner. Even if the characteristics of micrometeorological data change with time, SW-SVR always gives priority to the weak learners that are more suitable for predicting the test data.

[Fig. 1. Processing overview of SW-SVR: (a) extraction of training data by D-SDC; (b) weighted ensemble learning in SW-SVR.]

The details of the SW-SVR algorithm are shown in Algorithm 3. The training procedure consists of two kinds of preprocessing, iterated learning, and dynamic aggregation; each part is described below.

Algorithm 3 Sliding window-based support vector regression.
Input:
  Training data set: S = {(x_1, y_1, x'_1), ..., (x_N, y_N, x'_N)} where x_i ∈ X, y_i ∈ Y, x'_i ∈ X'
  Test data: P
  Number of weak learners: n
  Weight parameters: p, q
Preprocessing:
  1. apply normalization to X and X'
  2. fit kernel approximation and PLS regression to X and X'
  3. M_i = ||x_i − x'_i||, i = 1 ... N
  4. G_t ← each center of kmeans(X), t = 1 ... n
For t = 1 to n do
  1. D_ti = ||G_t − x_i||, i = 1 ... N
  2. r_t = Σ_{i=1}^{N} (w_i M_i) / Σ_{i=1}^{N} w_i, where w_i = 1/D_ti^p
  3. S_t = {(x_i, y_i) | D_ti < r_t}, i = 1 ... N
  4. H_t(X) ← train LinearSVR(S_t)
Output: H(P) = Σ_{t=1}^{n} (w_t H_t(P)) / Σ_{t=1}^{n} w_t, where w_t = 1/||G_t − P||^q

The algorithms used in SW-SVR rely on the L_2 norm, that is, the Euclidean distance, so their performance depends on the feature space. For example, if the feature space includes noisy features or non-linear relationships between features, performance will probably be reduced substantially. In particular, micrometeorological data has complex correlations among different features such as temperature and humidity. Accordingly, the feature space must be mapped into another feature space that accounts for noise and non-linear relationships. In our approach, we use kernel approximation (Rahimi & Recht, 2007) and partial least squares (PLS) regression (Tenenhaus, Vinzi, Chatelin, & Lauro, 2005) to map into a new feature space. Kernel approximation generates a new, higher-dimensional feature space that represents non-linear data as linear data with very low computational complexity; in fact, a combination of kernel approximation and linear SVMs led to much faster learning while achieving prediction performance comparable to that of an exact SVM (Cao, Naito, & Ninomiya, 2008). PLS regression, on the other hand, is a supervised dimension-reduction method that reduces dimensions by extracting latent variables that have a strong relationship with the dependent variable; if the feature space includes noisy features, their effect is reduced by PLS regression. The combination of kernel approximation and PLS regression thus gives SW-SVR an effective feature space for computing the L_2 norm on micrometeorological data.
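As a concrete illustration of this preprocessing step, the sketch below (ours; the component choices and hyperparameters, such as the number of random features and PLS components, are assumptions rather than the paper's exact settings) chains normalization, a random-Fourier-feature kernel approximation, and PLS-based dimension reduction with scikit-learn.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.kernel_approximation import RBFSampler
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))   # e.g., pressure, temperature, humidity, wind speed, irradiance
y = np.sin(X[:, 1]) + 0.2 * X[:, 2] + rng.normal(scale=0.1, size=2000)

scaler = StandardScaler().fit(X)
rbf = RBFSampler(gamma=1e-5, n_components=200, random_state=0)   # approximate RBF kernel map
Z = rbf.fit_transform(scaler.transform(X))

pls = PLSRegression(n_components=10).fit(Z, y)   # supervised dimension reduction
X_mapped = pls.transform(Z)                      # feature space used for L2-norm computations
print(X_mapped.shape)                            # (2000, 10)
```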
According to our previous research, accurately predicting particular specific data in micrometeorological data requires extracting training data that is effective for that prediction (Suzuki et al., 2015). In SW-SVR, several such specific data are selected in advance, and weak learners are built from the training data extracted for predicting each of them. Meanwhile, micrometeorological data involves various natural environments such as different seasons and climates. Therefore, the selected specific data must represent the varied natural environments that are likely to appear, so that a few models can represent the whole micrometeorological data. In SW-SVR, the specific data are selected by a clustering algorithm, k-means (MacQueen, 1967). k-means is one of the most famous non-hierarchical clustering algorithms and classifies data into a given number of clusters faster than other clustering algorithms. In SW-SVR, k-means classifies all training data into the same number of clusters as the number of weak learners given by the user, and each cluster center is used as specific data representing a natural environment.

After selecting the specific data, SW-SVR iterates data extraction and model building. First, SW-SVR extracts effective training data for predicting each specific data by D-SDC (Suzuki, Kaneda, & Mineno, 2014). The theory behind D-SDC is similar to that of the k-nearest neighbor (k-NN) algorithm in that D-SDC also extracts training data similar to a specialized object. In D-SDC, however, the amount of extracted data depends on the movement of the specialized object with time. The movement r is the change of a specialized object during the prediction horizon:

r_t = \|G_t - G'_t\|

where G is a specialized object and G' is the specialized object after the prediction horizon. D-SDC extracts the training data whose distance from the specialized object is shorter than the movement r. Accordingly, the training data S extracted by D-SDC is

S_t = \{(x_i, y_i) \mid \|G_t - x_i\| < \|G_t - G'_t\|\}

where x is the feature vector of a training sample and y is its dependent variable. D-SDC is based on the movement r because r is strongly related to the autocorrelation of the data surrounding a specialized object. In micrometeorological data, movements within a specific natural environment are mutually similar, and the autocorrelation becomes lower when these movements are bigger. For example, in Japan, the weather changes drastically every spring, and one natural environment shifts to various other natural environments with time. Meanwhile, when predicting time series data such as micrometeorological data, autocorrelation corresponds to the correlation between the features and the dependent variable, and more training data is required for highly accurate prediction when the autocorrelation is lower. Since D-SDC extracts an amount of data surrounding a specialized object in proportion to the movement r, the extraction takes this autocorrelation into account. However, the movement r is unknown because G' is not observed. As mentioned above, the movements of data surrounding a specialized object are mutually similar, so D-SDC estimates the movement from the movements of training data similar to the specialized object by a weighted average, where the weights are the reciprocals of the distances between the specialized object and each training sample. The movement of each training sample can be calculated by referring to the time at which it was observed. The estimated movement r is

r_t = \|G_t - G'_t\| \approx \frac{\sum_{i=1}^{N} w_i \|x_i - x'_i\|}{\sum_{i=1}^{N} w_i}, \qquad w_i = \frac{1}{\|G_t - x_i\|^p},

where N is the number of training data and p is a weight parameter.
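The following NumPy sketch (ours; the array shapes and the helper name dsdc_extract are illustrative assumptions) shows one way to implement this extraction rule for a single cluster center, using the estimated movement as the extraction threshold.

```python
import numpy as np

def dsdc_extract(X, y, X_future, center, p=2.0):
    """Extract training data for one specialized object (cluster center) by D-SDC.

    X        : (N, d) mapped features at observation time
    X_future : (N, d) mapped features one prediction horizon later
    center   : (d,)   specialized object G_t
    """
    dist = np.linalg.norm(X - center, axis=1)        # D_ti = ||G_t - x_i||
    move = np.linalg.norm(X - X_future, axis=1)      # M_i  = ||x_i - x'_i||
    w = 1.0 / np.maximum(dist, 1e-12) ** p           # weights 1 / D_ti^p
    r = np.sum(w * move) / np.sum(w)                 # estimated movement r_t
    mask = dist < r                                  # keep data closer than r_t
    return X[mask], y[mask], r

# Illustrative usage with random data.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
X_future = X + rng.normal(scale=0.3, size=X.shape)
y = rng.normal(size=1000)
X_t, y_t, r_t = dsdc_extract(X, y, X_future, center=X.mean(axis=0))
print(len(X_t), "of", len(X), "samples extracted; r_t =", round(float(r_t), 3))
```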
Afterward, SW-SVR builds several linear SVRs as weak learners from the extracted data. As described above, a combination of linear SVR and kernel approximation is comparable to SVR with a kernel method. Moreover, a linear SVR can be built much faster by using liblinear (Fan, Chang, Hsieh, Wang, & Lin, 2008), an optimized implementation for linear SVMs, instead of general SVM implementations such as libSVM (Chang & Lin, 2011). Although the kernel usable in liblinear is restricted to the linear kernel, liblinear can build the model much faster by solving the primal problem instead of the dual problem. Furthermore, since all the training data is divided into smaller extracted sets, each model can be built faster, and the extracted sets can easily be learned in parallel.

The predicted values of SW-SVR take into account the change of natural environments with time. In general ensemble learning, prediction for regression is a weighted average whose weights are determined at training time. SW-SVR, by contrast, determines the weights dynamically at prediction time. The weights are determined by the distance between the test data and the data specialized by each weak learner. The final hypothesis of SW-SVR is

H(P) = \frac{\sum_{t=1}^{n} w_t H_t(P)}{\sum_{t=1}^{n} w_t}, \qquad w_t = \frac{1}{\|G_t - P\|^q},

where P is the test data, n is the number of weak learners, H_t(X) is a hypothesis, and q is a weight parameter. Since the ensemble weights are determined dynamically for every prediction, SW-SVR can follow micrometeorological data whose characteristics always change with time.

Finally, we describe the computational complexity of SW-SVR. To represent complicated micrometeorological data easily, SW-SVR uses several conventional methods besides our proposed D-SDC: kernel approximation, PLS regression, k-means, and linear SVR. The computational complexity of these methods generally increases linearly; in other words, it is approximately O(N) when the number of training data N is much larger than the number of dimensions and the parameters of these methods. Moreover, the computational complexity of D-SDC corresponds to O(nN), because D-SDC simply repeats N distance calculations n + 1 times, where n is the number of weak learners in SW-SVR. Therefore, if N is much larger than n, the computational complexity of D-SDC also increases linearly. The total computational complexity of SW-SVR is thus approximately O(N), which is even less than that of SVR.
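Putting the pieces together, the following sketch (ours, under the assumptions already noted; it reuses the hypothetical dsdc_extract helper from the earlier sketch and omits the feature-mapping step for brevity) trains one LinearSVR per cluster center and aggregates their predictions with the dynamic weights w_t = 1/||G_t - P||^q.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVR

def train_sw_svr(X, y, X_future, n_learners=10, p=2.0, q=2.0):
    # Specific data: cluster centers found by k-means (Algorithm 3, preprocessing step 4).
    km = KMeans(n_clusters=n_learners, n_init=10, random_state=0).fit(X)
    centers, learners = km.cluster_centers_, []
    for center in centers:
        X_t, y_t, _ = dsdc_extract(X, y, X_future, center, p=p)   # D-SDC extraction
        if len(X_t) < 2:            # safeguard (not in the paper) for an empty window
            X_t, y_t = X, y
        learners.append(LinearSVR(C=1.0).fit(X_t, y_t))
    return centers, learners, q

def predict_sw_svr(model, P):
    centers, learners, q = model
    preds = np.array([h.predict(P) for h in learners])                 # (n, m)
    dists = np.linalg.norm(centers[:, None, :] - P[None, :, :], axis=2)
    w = 1.0 / np.maximum(dists, 1e-12) ** q                            # dynamic weights per test point
    return (w * preds).sum(axis=0) / w.sum(axis=0)
```

Because each weak learner sees only its extracted subset, the per-model training cost stays small, and the loop over cluster centers is trivially parallelizable, which matches the complexity argument above.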
4. Evaluation

4.1. Experiment

We compared the performance of SW-SVR with other standard regression methods: k-NN, decision tree (DT), AdaBoost, bagging, random forest (RF), gradient boosting (GB), linear SVR, and SVR with a radial basis function (RBF) kernel, which shows high performance in various fields (RBF-SVR). Note that the kernel used in the kernel approximation of SW-SVR is also the RBF kernel, and the base learner in AdaBoost and bagging is the commonly used decision tree. Moreover, to evaluate SW-SVR in more detail, we evaluated linear SVR with mapping: a standard linear SVR to which the same feature mapping as in SW-SVR is applied ("mapped SVR"). Mapped SVR separates the contribution of the feature-space mapping from that of the ensemble learning based on D-SDC. All parameters of the models were adjusted by grid search. The baseline for this evaluation was the naivest persistent model:

\hat{y}_{i+\Delta t} = y_i

where \hat{y} is the predicted value, y is the true value, and \Delta t is the prediction horizon.

We evaluated the performance in two ways: hold-out validation and 10-fold cross-validation. We predicted the temperature after 1 h and 6 h by using large-scale micrometeorological data observed in Tokyo (Japan Meteorological Agency, n.d.). The data consists of atmospheric pressure, temperature, relative humidity, wind speed, and irradiance. In hold-out validation, the training periods are limited to periods earlier than the testing periods so as to reflect practical use, in which test data is always predicted from past training data. The training periods ranged from 3 months to 5 years before September 1, 2014, and the testing periods ranged from 1 month to 1 year after the same day. By varying the training and testing periods, we evaluated performance under various usage scenarios. The period for 10-fold cross-validation was the 6 years from September 1, 2009 to September 1, 2015. Note that the amount of data per month was approximately 4000 because the data was recorded every 10 minutes.

In this evaluation, we used the mean absolute percentage error (MAPE) as the index of prediction error and the building time, measured by CPU clock time, as the index of computational complexity. MAPE is defined as

\mathrm{MAPE} = \frac{100}{N} \sum_{i=1}^{N} \left| \frac{y_i - \hat{y}_i}{y_i} \right|

where N is the number of test data, y is the true value, and \hat{y} is the predicted value.
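As a small worked example (ours; the temperature values are illustrative), the persistence baseline and MAPE can be computed as follows.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

# Temperature series sampled every 10 minutes; a 1-h horizon is 6 steps ahead.
temp = np.array([20.1, 20.3, 20.6, 21.0, 21.2, 21.5, 21.9,
                 22.1, 22.4, 22.8, 23.0, 23.1, 23.5])
horizon = 6
persistent_pred = temp[:-horizon]      # y_hat_{i+dt} = y_i
actual = temp[horizon:]
print(round(mape(actual, persistent_pred), 2), "% MAPE for the persistence baseline")
```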
Moreover, we evaluated the average extraction rate of D-SDC under each experimental condition so as to analyze the performance of SW-SVR and D-SDC further. All implementations for this evaluation are in Python, and the scikit-learn implementations (Pedregosa et al., 2011) were used for all methods except SW-SVR. The evaluation was performed on a single core of a machine with an Intel Core i5-2500K processor and 12 GB of RAM; although several methods, such as random forest and SW-SVR, can be run in parallel, all methods were run on a single core so as to compare the building times fairly.

4.2. Results and discussion

Figs. 2 and 3 show the prediction error for prediction horizons of 1 h and 6 h, respectively. Note that a log scale is used in Figs. 2(b), (c), 3(b), and (c).

[Fig. 2. MAPE for prediction after 1 h for each algorithm, for testing periods of (a) 1 month, (b) 6 months, and (c) 12 months; (b) and (c) are shown on a log scale.]
[Fig. 3. MAPE for prediction after 6 h for each algorithm, for testing periods of (a) 1 month, (b) 6 months, and (c) 12 months; (b) and (c) are shown on a log scale.]

The results indicate that SW-SVR produced the best average performance of all models across all testing periods, training periods, and prediction horizons. The effect is particularly noticeable when the testing periods are longer than the training periods; in that situation, almost all methods except SW-SVR often perform worse than the naivest persistent model used as the baseline. These results show that the conventional superior methods do not always perform well for micrometeorological data prediction; their performance depends on the difficulty of the prediction determined by the training periods, testing periods, and prediction horizons. Moreover, among the SVR-based algorithms, the prediction performance of SW-SVR is almost always the best, followed in order by RBF-SVR, mapped SVR, and linear SVR. The difference between mapped SVR and linear SVR is due to the feature-space mapping, whereas the difference between SW-SVR and mapped SVR is due to the ensemble learning based on D-SDC. These comparisons demonstrate that both the feature-space mapping and the D-SDC-based ensemble learning are effective for improving prediction performance. Mapped SVR also tended to have lower prediction performance than SW-SVR when the testing periods were longer than the training periods, so under this condition the ensemble learning based on D-SDC is particularly effective. When the testing periods are longer than the training periods, the training data that is effective for predicting the test data is reduced, and we consider that the small amount of training data extracted by D-SDC for building models corresponded to this effective training data. Indeed, Fig. 4 shows the average extraction rate of D-SDC and demonstrates that the weak learners of SW-SVR are always built from a very small proportion of the whole training data. SW-SVR, which predicts micrometeorological data accurately regardless of the amount of training data, is therefore very practical and useful.

[Fig. 4. Average extraction rate of D-SDC in SW-SVR for 10, 100, and 1000 weak learners, for prediction horizons of (a) 1 hour and (b) 6 hours.]
Table 1 shows the results of 10-fold cross-validation for prediction horizons of 1 h and 6 h. In hold-out validation, SW-SVR was often superior to all methods, including RBF-SVR. In 10-fold cross-validation, however, although SW-SVR had higher prediction performance than all methods except RBF-SVR, RBF-SVR was slightly superior to SW-SVR. These results indicate that the prediction performance of SW-SVR is affected by the temporal order between training data and test data, and that SW-SVR is particularly suited to practical use, in which test data is always predicted from past training data. Meanwhile, even in 10-fold cross-validation, the ordering of the prediction errors of SW-SVR, mapped SVR, and linear SVR was the same as in hold-out validation. Therefore, both the feature-space mapping and the ensemble learning based on D-SDC are effective for improving prediction performance in cross-validation as well.
Table 1
MAPE of 10-fold cross-validation for each algorithm [%].

Prediction horizon | SW-SVR   | k-NN     | DT       | Adaboost | Bagging  | RF       | GB       | Linear SVR | mapped SVR | RBF-SVR  | Persistent
1 h                | 5.18608  | 8.59929  | 5.81042  | 11.10375 | 10.24014 | 5.57213  | 5.27190  | 5.43892    | 5.25274    | 5.16985  | 5.96816
6 h                | 23.49826 | 26.52433 | 25.99290 | 29.93160 | 29.58125 | 25.55044 | 24.14987 | 24.68383   | 24.26108   | 20.94132 | 24.86800

Figs. 5 and 6 show the building time for prediction horizons of 1 h and 6 h, respectively.

[Fig. 5. Building time for prediction after 1 h for each model: (a) different training periods, (b) ensemble learning series versus the number of weak learners, (c) SVR series versus the cost parameter; all panels on a log scale.]
[Fig. 6. Building time for prediction after 6 h for each model: (a) different training periods, (b) ensemble learning series versus the number of weak learners, (c) SVR series versus the cost parameter; all panels on a log scale.]

Figs. 5(a) and 6(a) show the building time of the models with high prediction performance in Figs. 2 and 3 (RF, GB, RBF-SVR, and SW-SVR) when the training periods were varied. Note that the number of weak learners was 1000 in the ensemble learning series, the cost parameter was 1 in the SVR series, and σ of SW-SVR, a parameter of the RBF kernel in the kernel approximation, was 0.00001. These results show that the building time of ensemble learning methods, including SW-SVR, increases more gently than that of SVR. In particular, the building time of SW-SVR becomes the shortest as the training periods grow longer; in other words, the growth rate of the building time of SW-SVR is the gentlest of all the methods as the training data increases. These results indicate that, as discussed above, the computational complexity of SW-SVR is lower than that of conventional methods, including random forest and gradient boosting, and that SW-SVR is effective, in terms of building time, for training on an enormous amount of data.

Next, Figs. 5(b) and 6(b) show the building time of the better-performing ensemble learning models (RF, GB, and SW-SVR) when the number of weak learners was varied. Note that the cost parameter of SW-SVR was 1, σ of SW-SVR was 0.00001, and the training periods were 12 months. SW-SVR needs a longer building time than RF and GB with shallow decision trees when the number of weak learners is small. However, when the decision trees become deeper or the number of weak learners becomes larger, SW-SVR can build the model faster than or as fast as RF and GB. Moreover, SW-SVR, like RF, can easily be run in parallel, so its building time can be expected to become even shorter.

Finally, Figs. 5(c) and 6(c) show the building time of the SVR-based models when the parameters of SVR were varied. Note that the number of weak learners was 100 and the training periods were 12 months.
These results indicate that the building time of SW-SVR is significantly shorter than that of RBF-SVR but longer than that of linear SVR. Meanwhile, Fig. 4 shows that the weak learners of SW-SVR are always built from a very small proportion of the whole training data: when the prediction horizon was 1 h, the average extraction rate was 0.47 percent at best and 1.82 percent at worst, and when the prediction horizon was 6 h, it was 7.57 percent at best and 16.25 percent at worst. The reason the computational complexity of SW-SVR is nevertheless larger than that of linear SVR is the additional cost of building several models. However, since the amount of training data for each weak learner is reduced substantially, the cost of building each individual model is also reduced. Accordingly, when the number of models built per CPU is reduced by parallel processing, the overall computational complexity of SW-SVR becomes lower than or equal to that of linear SVR. Meanwhile, as with linear SVR, the building time of SW-SVR does not depend on the SVR-related parameters and is nearly constant. As discussed above, the building time of SW-SVR depends solely on the number of weak learners and the training periods. Therefore, SW-SVR can avoid unexpectedly long building times during parameter tuning, which varies each parameter over a wide range.

These results demonstrate that SW-SVR predicts complicated micrometeorological data with the best prediction performance and the lowest computational complexity among the compared standard algorithms. In particular, we found that dynamic aggregation of models built from the very small amounts of data extracted by D-SDC is effective for achieving both high prediction performance and low computational complexity. However, some problems remain to be solved in SW-SVR. First, the prediction performance of SW-SVR sometimes deteriorates despite an increase in training data; this occurred in particular when the prediction horizon was 6 h, as shown in Fig. 3. This is because the data extracted by D-SDC includes training data that is unnecessary for highly accurate prediction. If D-SDC extracted the same data as it extracts when the training periods are shorter, the prediction performance of SW-SVR would never deteriorate as training data increases. Therefore, we must review both the feature mapping and the D-SDC algorithm so as to avoid extracting unnecessary training data. Second, SW-SVR is based on a combination of several algorithms (kernel approximation, PLS regression, k-means, D-SDC, and linear SVR), and each algorithm has several parameters. Therefore, SW-SVR has many parameters, and tuning them takes more time. In this experiment, we used a rough grid search so as to decide the parameters within a fixed amount of time, but there is still room for improvement in prediction performance by using other approaches such as a genetic algorithm instead of a grid search (Huang & Wang, 2006).

5. Conclusion and future work

In this paper, we proposed a new methodology for predicting micrometeorological data, SW-SVR, which involves a novel combination of SVR and ensemble learning. To take advantage of both SVR and ensemble learning, SW-SVR builds several SVRs specialized for representative data groups in various natural environments by using D-SDC, which extracts effective training data for predicting specific data.
Moreover, to follow micrometeorological data whose characteristics always change with time, prediction in SW-SVR is based on dynamically weighted ensemble learning that depends on the similarity between the test data and the data specialized by each weak learner. In evaluation experiments using large-scale micrometeorological data, the prediction performance of SW-SVR was greater than or equal to that of other general methods such as SVR, RF, and GB. Moreover, SW-SVR reduces the building time substantially compared with complicated models that have high prediction performance. We anticipate that dynamic aggregation of models built from various kinds of data extracted by D-SDC can contribute to more sophisticated studies of micrometeorological data prediction.

In future work, we should evaluate SW-SVR in more varied situations to show that it works effectively. In particular, we will use more complicated data that consists of many features. Furthermore, when SW-SVR is applied to applications such as environmental control systems, the performance of the overall application should be evaluated. We are currently developing an agricultural support system using SW-SVR, which controls greenhouse environments depending on the activity of the plants. Evaluating such applications will demonstrate the superiority of SW-SVR in practical use.

Acknowledgements

This study was partially supported by JST, PRESTO, and JSPS KAKENHI (26660198), Japan.

References

Antonanzas, J., Urraca, R., Martinez-de-Pison, F. J., & Antonanzas-Torres, F. (2015). Solar irradiation mapping with exogenous data from support vector regression machines estimations. Energy Conversion and Management, 100, 380–390.

Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. http://doi.org/10.1023/A:1010933404324

Cao, H., Naito, T., & Ninomiya, Y. (2008). Approximate RBF kernel SVM and its applications in pedestrian classification. In The 1st International Workshop on Machine Learning for Vision-based Motion Analysis (MLVMA'08) (pp. 1–9). http://hal.archives-ouvertes.fr/inria-00325810/

Chang, C., & Lin, C. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2, 1–39. http://doi.org/10.1145/1961189.1961199

Chevalier, R. F., Hoogenboom, G., McClendon, R. W., & Paz, J. A. (2011). Support vector regression with reduced training sets for air temperature prediction: A comparison with artificial neural networks. Neural Computing & Applications, 20(1), 151–159.

Fan, R.-E., Chang, K.-W., Hsieh, C.-J., Wang, X.-R., & Lin, C.-J. (2008). LIBLINEAR: A library for large linear classification. The Journal of Machine Learning Research, 9, 1871–1874.

Freund, Y., & Schapire, R. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119–139. http://doi.org/10.1006/jcss.1997.1504

Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5), 1189–1232.

Huang, C. L., & Wang, C. J. (2006). A GA-based feature selection and parameters optimization for support vector machines. Expert Systems with Applications, 31(2), 231–240. http://doi.org/10.1016/j.eswa.2005.09.024
Japan Meteorological Agency. (n.d.). Japan Meteorological Agency. http://www.jma.go.jp/jma/indexe.html

Kisi, O., & Cimen, M. (2012). Precipitation forecasting by using wavelet-support vector machine conjunction model. Engineering Applications of Artificial Intelligence, 25(4), 783–792. http://doi.org/10.1016/j.engappai.2011.11.003

Kolokotsa, D., Pouliezos, A., Stavrakakis, G., & Lazos, C. (2009). Predictive control techniques for energy and indoor environmental quality management in buildings. Building and Environment, 44(9), 1850–1863. http://doi.org/10.1016/j.buildenv.2008.12.007

Loosli, G. (2007). Comments on the core vector machines: Fast SVM training on very large data sets. The Journal of Machine Learning Research, 8, 291–301.

MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability: Vol. 1 (pp. 281–297).

Maity, R., Bhagwat, P., & Bhatnagar, A. (2010). Potential of support vector regression for prediction of monthly streamflow using endogenous property. Hydrological Processes, 24(7), 917–923.

Mohammadi, K., Shamshirband, S., Anisi, M. H., Alam, K. A., & Petković, D. (2015). Support vector regression based prediction of global solar radiation on a horizontal surface. Energy Conversion and Management, 91, 433–441.

Othman, M. F., & Shazali, K. (2012). Wireless sensor network applications: A study in environment monitoring system. In Procedia Engineering: Vol. 41 (pp. 1204–1210). http://doi.org/10.1016/j.proeng.2012.07.302

Park, D. H., & Park, J. W. (2011). Wireless sensor network-based greenhouse environment monitoring and automatic control system for dew condensation prevention. Sensors, 11(4), 3640–3651. http://doi.org/10.3390/s110403640

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, É. (2011). Scikit-learn: Machine learning in Python. The Journal of Machine Learning Research, 12, 2825–2830.

Platt, J. C. (1998). Fast training of support vector machines using sequential minimal optimization. In Advances in Kernel Methods (pp. 185–208).

Rahimi, A., & Recht, B. (2007). Random features for large-scale kernel machines. Advances in Neural Information Processing Systems, 20, 1177–1184.

Singh, K. P., Gupta, S., & Rai, P. (2013). Identifying pollution sources and predicting urban air quality using ensemble learning methods. Atmospheric Environment, 80, 426–437. http://doi.org/10.1016/j.atmosenv.2013.08.023

Smith, B. A., Hoogenboom, G., & McClendon, R. W. (2009). Artificial neural networks for automated year-round temperature prediction. Computers and Electronics in Agriculture, 68(1), 52–61. http://doi.org/10.1016/j.compag.2009.04.003

Suzuki, Y., Kaneda, Y., & Mineno, H. (2014). SW-SVR improved by short-distance data collection method. IPSJ SIG Technical Report, 2014-MBL-73(9), 1–8.

Suzuki, Y., Kaneda, Y., & Mineno, H. (2015). Analysis of support vector regression model for micrometeorological data prediction. Computer Science and Information Technology, 3(2), 37–48. http://doi.org/10.13189/csit.2015.030202

Tenenhaus, M., Vinzi, V. E., Chatelin, Y. M., & Lauro, C. (2005). PLS path modeling. Computational Statistics and Data Analysis, 48(1), 159–205. http://doi.org/10.1016/j.csda.2004.03.005
Tsang, I. W., Kwok, J. T., & Cheung, P.-M. (2005). Core vector machines: Fast SVM training on very large data sets. Journal of Machine Learning Research, 6, 363–392.

Urraca, R., Antonanzas, J., Martinez-de-Pison, F. J., & Antonanzas-Torres, F. (2015). Estimation of solar global irradiation in remote areas. Journal of Renewable and Sustainable Energy, 7(2), 023136.

Vapnik, V. N. (1995). The Nature of Statistical Learning Theory. Springer.

Wang, B. X., & Japkowicz, N. (2009). Boosting support vector machines for imbalanced data sets. Knowledge and Information Systems, 25(1), 1–20. http://doi.org/10.1007/s10115-009-0198-y

Xie, Y., Li, X., Ngai, E. W. T., & Ying, W. (2009). Customer churn prediction using improved balanced random forests. Expert Systems with Applications, 36(3), 5445–5449. http://doi.org/10.1016/j.eswa.2008.06.121