Sliding window-based support vector regression for predicting micrometeorological data

Expert Systems With Applications 59 (2016) 217–225

Yukimasa Kaneda a,*, Hiroshi Mineno b,c

a Graduate School of Integrated Science and Technology, Shizuoka University, 3-5-1 Johoku, Naka-ku, Hamamatsu, Shizuoka 432-8011, Japan
b College of Informatics, Academic Institute, Shizuoka University, 3-5-1 Johoku, Naka-ku, Hamamatsu, Shizuoka 432-8011, Japan
c JST, PRESTO, 4-1-8 Honcho, Kawaguchi, Saitama 332-0012, Japan
* Corresponding author. E-mail address: kaneda@minelab.jp (Y. Kaneda).

Article history: Received 4 February 2016; Revised 29 March 2016; Accepted 13 April 2016; Available online 23 April 2016

Keywords: Predicting micrometeorological data; Data extraction; Dynamic aggregation; Support vector regression; Ensemble learning

Abstract

Sensor network technology is becoming more widespread and sophisticated, and devices with many sensors, such as smartphones and sensor nodes, are used extensively. Because these devices can easily collect various kinds of micrometeorological data, such as temperature, humidity, and wind speed, an enormous amount of such data has been accumulated. In recent years, this kind of data, called big data, has been expected to produce novel knowledge and value, and many applications use data mining or machine learning to exploit it. However, micrometeorological data has complicated correlations among different features, and its characteristics change with time. It is therefore difficult to predict micrometeorological data accurately with low computational complexity, even with state-of-the-art machine learning algorithms. In this paper, we propose a new methodology for predicting micrometeorological data, sliding window-based support vector regression (SW-SVR), which involves a novel combination of support vector regression (SVR) and ensemble learning. To represent complicated micrometeorological data easily, SW-SVR builds several SVRs specialized for representative data groups in various natural environments, such as different seasons and climates, and changes the weights used to aggregate the SVRs dynamically depending on the characteristics of the test data. In our experiments, we predicted the temperature after 1 h and 6 h by using large-scale micrometeorological data observed in Tokyo. Regardless of testing periods, training periods, and prediction horizons, the prediction performance of SW-SVR was always greater than or equal to that of other general methods such as SVR, random forest, and gradient boosting. At the same time, SW-SVR reduced the building time remarkably compared with complicated models that have high prediction performance.

© 2016 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

1. Introduction

Sensor network technology is becoming more widespread and sophisticated, and devices with many sensors are used extensively. These devices can very easily obtain various kinds of micrometeorological data such as temperature, humidity, and wind speed.
Micrometeorological data is strongly affected by the surface of the earth and is closely related to our lives and industrial activity. Accordingly, such data has been used by many applications, for example environmental control systems for greenhouses (Othman & Shazali, 2012; Park & Park, 2011). More advanced applications exploit the data to a greater extent by using machine learning and data mining technology. Furthermore, an enormous amount of micrometeorological data has been accumulated by many devices, and analyzing such an enormous amount of data, called big data, is expected to produce novel knowledge and value.

To predict micrometeorological data effectively, a number of researchers have studied machine learning (Smith, Hoogenboom, & McClendon, 2009). These studies describe prediction methods for micrometeorological data and frequently discuss prediction performance and computational complexity. Micrometeorological data, however, has complex correlations among different features such as temperature and humidity, and its characteristics change with time. Therefore, even if big data is given as training data, it is not easy to predict micrometeorological data accurately. Furthermore, in many cases, models must become complicated to achieve high prediction performance, and their computational complexity increases accordingly; some models probably cannot be built from big data in a practical amount of computing time. In other words, there is a trade-off between high prediction performance and low computational complexity. However, both are required in some practical uses. The higher the prediction performance of an application, the better the quality it can provide. For example, in environmental control systems based on prediction (Kolokotsa, Pouliezos, Stavrakakis, & Lazos, 2009), higher prediction performance enables precise control, precise management, and better environments. On the other hand, models that need a long time for training are of little value in practice. Now that the amount of usable data has increased remarkably, this trade-off has become an even more critical issue.

Recently, one type of machine learning algorithm, the support vector machine (SVM), has been used successfully in various fields. Its underlying theory yields an efficient learning method grounded in probably approximately correct (PAC) learning. Moreover, SVMs can separate non-linear data with low computational complexity.
Since most data observed in the real world is likely to have non-linear relationships, SVMs have also been applied to micrometeorological data prediction (Antonanzas, Urraca, Martinez-de-Pison, & Antonanzas-Torres, 2015; Mohammadi, Shamshirband, Anisi, Alam, & Petković, 2015; Urraca, Antonanzas, Martinez-de-Pison, & Antonanzas-Torres, 2015). Moreover, SVMs have led to better prediction performance than other algorithms such as artificial neural networks (ANNs) and the autoregressive integrated moving average (ARIMA) model (Chevalier, Hoogenboom, McClendon, & Paz, 2011; Maity, Bhagwat, & Bhatnagar, 2010). However, when SVMs learn from big data, the computational complexity remains a matter of concern. Another learning approach, ensemble learning, has also been used increasingly for predicting micrometeorological data (Singh, Gupta, & Rai, 2013). The prediction performance of ensemble learning is greater than or equal to that of SVMs. Its basic methodology is a combination of weak learners built from different kinds of training data; the combination yields a higher generalizing capability than a single model can represent. In particular, some researchers have proposed improved methods that could be applied to micrometeorological data prediction (Wang & Japkowicz, 2009; Xie, Li, Ngai, & Ying, 2009). However, it is difficult to apply these methods to regression, and the resulting models may not be able to follow micrometeorological data whose characteristics always change with time.

In this paper, we propose a new methodology for predicting micrometeorological data, sliding window-based support vector regression (SW-SVR). SW-SVR involves a novel combination of support vector regression (SVR) and ensemble learning. To represent complicated micrometeorological data easily, SW-SVR builds several SVRs specialized for representative data groups in various natural environments, such as different seasons and climates. The specialized SVRs are built with our previously proposed method, dynamic short-distance data collection (D-SDC), which extracts effective data for predicting specific data by taking account of movements: changes in data during the prediction horizon. Each weak learner built from the extracted data specializes in specific data and accurately predicts data similar to the specialized data. SW-SVR then aggregates all the predicted values with weights determined by the similarity between the test data and the data specialized by each weak learner. This new ensemble learning methodology, which changes weights dynamically, makes it possible to follow micrometeorological data whose characteristics continually change with time. Our results demonstrate that the prediction performance of SW-SVR was always greater than or equal to that of other general methods such as SVR, random forest, and gradient boosting. At the same time, SW-SVR reduced the building time remarkably compared with complicated models that have high prediction performance.

2. Related work

As mentioned in the introduction, SVMs and ensemble learning have generally been used to predict micrometeorological data effectively. These algorithms achieve higher prediction performance for micrometeorological data than traditional methods because SVMs use not only a margin-maximizing algorithm, whose performance is supported by PAC learning, but also the kernel trick, which enables non-linear separation.
On the other hand, ensemble learning provides a higher generalizing capability than a single model can represent. In this section, a brief summary of these algorithms and some improved variants is given. Moreover, so that SW-SVR can draw advantages from both SVMs and ensemble learning, several problems of these algorithms in practical use are discussed.

2.1. Support vector regression

SVMs, introduced by Vapnik (1995), have been used successfully in various fields. In the simplest case, binary classification, an SVM obtains a separating hyperplane by maximizing the margin, that is, the distance between the hyperplane and the different classes. PAC learning shows that maximizing the margin produces high generalization ability. Moreover, the kernel trick enables SVMs to separate data non-linearly with low computational complexity. Since various kinds of data observed in the real world are likely to have non-linear relationships, SVMs are used in many applications, including micrometeorological data prediction (Kisi & Cimen, 2012; Maity et al., 2010). The SVM for regression, support vector regression (SVR), uses the same methodology. A brief summary of SVR follows.

First, the linear function for regression is

f(x) = w^T x + b.

As with SVMs, SVR minimizes the norm of the weight vector w; the L_2 norm \|w\|_2 is often used, and minimizing \|w\|_2 corresponds to maximizing the margin. Meanwhile, SVR tolerates a prediction error \varepsilon. Therefore, the primal problem of SVR is

minimize    \|w\|_2^2
subject to  y_i - (w^T x_i + b) \le \varepsilon,
            (w^T x_i + b) - y_i \le \varepsilon.

Moreover, to take further errors into account, the same slack variables \xi as in soft-margin SVMs are introduced. The slack variables act as penalties and increase in proportion to the errors between true values and predicted values. The problem with slack variables is

minimize    \|w\|_2^2 + C \sum_i (\xi_i + \xi_i^*)
subject to  y_i - (w^T x_i + b) \le \varepsilon + \xi_i,
            (w^T x_i + b) - y_i \le \varepsilon + \xi_i^*,
            \xi_i, \xi_i^* \ge 0,

where the constant C balances the effect of maximizing the margin against the penalties. To minimize the above objective, the slack variables must also be minimized. Accordingly, the slack variables depend on the errors as

\xi_i   = 0 if y_i - (w^T x_i + b) \le \varepsilon, and \xi_i = y_i - (w^T x_i + b) - \varepsilon otherwise;
\xi_i^* = 0 if (w^T x_i + b) - y_i \le \varepsilon, and \xi_i^* = (w^T x_i + b) - y_i - \varepsilon otherwise.

These formulas mean that no penalty is given when the error is below \varepsilon, whereas any error above \varepsilon counts as a penalty that cannot be tolerated. In other words, SVR tolerates errors smaller than \varepsilon, and only errors over \varepsilon are taken into account as penalties.
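To make the epsilon-insensitive penalty concrete, the following minimal NumPy sketch (ours, not from the paper; data and parameter values are illustrative) evaluates the soft-margin SVR objective for a given linear model.

```python
import numpy as np

def epsilon_insensitive_objective(w, b, X, y, C=1.0, eps=0.1):
    """Soft-margin SVR objective: ||w||^2 + C * sum of slack variables.

    Errors smaller than eps are ignored; only the part of the error
    exceeding eps is penalized (the slack xi_i or xi_i^*).
    """
    residual = y - (X @ w + b)                       # y_i - (w^T x_i + b)
    slack = np.maximum(np.abs(residual) - eps, 0.0)  # xi_i + xi_i^*
    return np.dot(w, w) + C * slack.sum()

# Toy data: one noisy linear feature.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 1))
y = 2.0 * X[:, 0] + rng.normal(scale=0.05, size=50)

print(epsilon_insensitive_objective(np.array([2.0]), 0.0, X, y))
```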
Finally, the dual problem is derived from the above primal problem by Lagrange multipliers and corresponds to a quadratic programming problem, as with SVMs. As a result, a unique global optimal solution is obtained, so SVR is superior to traditional algorithms that might fall into a local optimum, such as ANNs. The dual problem is

maximize    -\frac{1}{2} \sum_{i,j} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) x_i^T x_j - \varepsilon \sum_i (\alpha_i + \alpha_i^*) + \sum_i y_i (\alpha_i - \alpha_i^*)
subject to  \sum_i (\alpha_i - \alpha_i^*) = 0,  \alpha_i, \alpha_i^* \in [0, C].

Moreover, the above dual problem can easily involve a non-linear map \varphi to consider a higher dimension. To introduce the non-linear map \varphi, a kernel function K(x_i, x_j) = \varphi(x_i)^T \varphi(x_j) is defined and used instead of x_i^T x_j. Then \varphi(x_i)^T \varphi(x_j) is evaluated through K(x_i, x_j) without any calculation in the mapped higher-dimensional space; this technique is called the kernel trick. SVR, based on margin maximization and the kernel trick, yields high prediction performance.

Meanwhile, conventional quadratic programming solvers, such as the steepest descent method, have very high computational complexity, approximately O(N^3) where N is the number of training data. Accordingly, a quadratic programming solver for SVMs, sequential minimal optimization (SMO), has become the de facto standard (Platt, 1998). SMO, specialized for SVMs, reduces the computational complexity to approximately O(N^2). Nevertheless, when an enormous amount of data is input as training data, the computational complexity still increases substantially. To solve this problem, the core vector machine (CVM), which regards the quadratic programming problem as a computational geometry problem, was proposed (Tsang, Kwok, & Cheung, 2005). The prediction performance of CVM is comparable to that of SVMs, and the computational complexity decreases substantially. However, according to Loosli (2007), the prediction performance and computational complexity of CVM strongly depend on its parameter values. Therefore, when the parameter tuning essential for practical use is taken into account, CVM does not always achieve both high prediction performance and low computational complexity.

SVR is one of the best machine learning algorithms from the viewpoint of prediction performance. In particular, the kernel trick used in the dual problem is expected to be effective for predicting micrometeorological data, which has complex correlations among different features. However, the computational cost of solving the dual problem is often still too high for practical use, so it is difficult to apply conventional SVR directly to micrometeorological data prediction.
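As a point of reference, the sketch below (ours; the data and hyperparameter values are illustrative) fits a kernelized SVR with scikit-learn, which solves the dual problem described above.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 2))            # two illustrative features
y = np.sin(X[:, 0]) + 0.1 * X[:, 1] + rng.normal(scale=0.1, size=500)

# RBF-SVR: C balances margin against penalties, epsilon is the tolerated error.
model = SVR(kernel="rbf", C=10.0, epsilon=0.05, gamma=0.5)
model.fit(X, y)

print("support vectors:", len(model.support_))
print("prediction:", model.predict([[0.5, -1.0]]))
```

Because the dual solver scales roughly quadratically or worse with the number of training samples, this kind of exact kernel SVR quickly becomes impractical for big data, which is the limitation discussed above.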
2.2. Ensemble learning

Ensemble learning has been studied intensively in recent years and is used increasingly. Its basic methodology is a combination of weak learners built from different kinds of training data. The combination yields a higher generalizing capability than a single model can represent. As with SVMs, ensemble learning can represent non-linear relationships and has been used for predicting micrometeorological data. In particular, two kinds of approaches, bagging and boosting, are often used; they differ greatly in how the weak learners are built and aggregated.

Bagging uses several training sets generated by bootstrap sampling. The basic bagging algorithm for regression is shown in Algorithm 1. In bagging, different training sets are created by sampling the original training data with replacement, weak learners are built from each sampled training set, and the predicted values are aggregated by majority vote or arithmetic average. In particular, random forest, introduced by Breiman (2001), which also applies randomness to feature selection, often demonstrates better prediction performance than conventional models such as SVMs. Random forest is used in various applications and has been extended in several improved versions. For example, to predict more accurately the imbalanced data frequently observed in the real world, the improved balanced random forest (IBRF) has been proposed (Xie et al., 2009). IBRF involves an efficient sampling method for imbalanced data and cost-sensitive learning that penalizes misclassification of the minority class more strongly. The authors showed that IBRF was more effective at predicting imbalanced data than class-weighted SVMs and a conventional improved random forest for imbalanced data.

Algorithm 1 Bagging for regression.
Input:
  Training data: D = {(x_1, y_1), ..., (x_N, y_N)} where x_i ∈ X, y_i ∈ Y
  Number of weak learners: n
For t = 1 to n do
  1. D_t ← generate a sample from D with replacement
  2. H_t(X) ← build a weak learner from D_t
Output: H(X) = (1/n) Σ_{t=1}^{n} H_t(X)

Boosting builds weak learners repeatedly by using weights based on the error rate. The basic boosting algorithm for regression, such as AdaBoost (Freund & Schapire, 1997), is shown in Algorithm 2. Unlike bagging, almost all boosting algorithms use the same training data, but the training data is re-weighted repeatedly; boosting alternates between building weak learners with the current weights and updating the weights. Finally, the predicted values are aggregated by a weighted average. Various boosting algorithms have been studied and proposed; gradient boosting (Friedman, 2001) in particular has shown the best prediction performance in many competitions. Meanwhile, as with IBRF, a boosting algorithm for imbalanced data, boosting-SVM, has also been proposed (Wang & Japkowicz, 2009). The main characteristic of boosting-SVM is its use of an asymmetric misclassification cost. The authors demonstrated that boosting-SVM enables more accurate prediction of both the majority class and the minority class. A brief comparison of the bagging and boosting families is sketched after Algorithm 2.

Algorithm 2 Boosting for regression.
Input:
  Training data: D = {(x_1, y_1), ..., (x_N, y_N)} where x_i ∈ X, y_i ∈ Y
  Number of weak learners: n
  Weights: w_i = 1/N
For t = 1 to n do
  1. H_t(X) ← build a weak learner from D by using the weights w_t
  2. ε_t ← compute the error rate of H_t(X)
  3. α_t ← compute the reliability of the prediction of H_t(X) based on ε_t
  4. w_{t+1} ← update the weights w_t based on α_t
Output: H(X) = Σ_{t=1}^{n} (α_t H_t(X)) / Σ_{t=1}^{n} α_t
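For orientation, the following sketch (ours, not from the paper; the data and hyperparameters are illustrative) builds the bagging- and boosting-style regressors provided by scikit-learn, which also appear later as comparison methods in the evaluation.

```python
import numpy as np
from sklearn.ensemble import (BaggingRegressor, AdaBoostRegressor,
                              RandomForestRegressor, GradientBoostingRegressor)
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(1000, 3))
y = np.sin(X[:, 0]) + 0.3 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=1000)

models = {
    # Bagging: average of trees trained on bootstrap samples (Algorithm 1).
    "bagging": BaggingRegressor(DecisionTreeRegressor(), n_estimators=100),
    "random forest": RandomForestRegressor(n_estimators=100),
    # Boosting: weak learners built sequentially with re-weighting (Algorithm 2).
    "adaboost": AdaBoostRegressor(DecisionTreeRegressor(max_depth=5), n_estimators=100),
    "gradient boosting": GradientBoostingRegressor(n_estimators=100),
}
for name, model in models.items():
    model.fit(X, y)
    print(name, model.predict(X[:1]))
```

As in the evaluation in Section 4, the base learner for bagging and AdaBoost here is a decision tree.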
When micrometeorological data that includes many unusual natural environments is regarded as imbalanced data, methods such as IBRF and boosting-SVM are likely to classify it more accurately. However, these approaches cannot be applied to regression. Moreover, according to our previous research (Suzuki, Kaneda, & Mineno, 2015), the proper training data depends on the test data; in other words, the weights used to aggregate weak learners built from different kinds of training data should depend on the test data.

3. SW-SVR: Sliding window-based support vector regression

We propose a new methodology for predicting micrometeorological data, sliding window-based support vector regression, which combines the methodologies of SVR and ensemble learning. The basic theories are D-SDC, our previously proposed method for extracting effective data for predicting specific data, and a novel weighted ensemble learning, as shown in Fig. 1. First, to represent complicated micrometeorological data easily, SW-SVR builds several SVRs specialized for representative data groups in various natural environments, such as different seasons and climates. The specialized SVRs are built with D-SDC, which extracts effective data for predicting specific data by taking account of movements: changes of data during the prediction horizon (Fig. 1(a)). Each weak learner built from the extracted data specializes in specific data and accurately predicts data similar to the specialized data. Afterward, the weak learners are aggregated with weights determined dynamically at prediction time so as to maintain prediction performance for micrometeorological data whose characteristics always change with time (Fig. 1(b)). The weights are decided by the similarity between the test data and the data specialized by each weak learner. Even if the characteristics of micrometeorological data change with time, SW-SVR always gives priority to the weak learners that are more suitable for predicting the test data.

[Fig. 1. Processing overview of SW-SVR: (a) extraction of training data by D-SDC; (b) weighted ensemble learning in SW-SVR.]

The details of the SW-SVR algorithm are shown in Algorithm 3. The training procedure consists of two kinds of preprocessing, iterated learning, and dynamic aggregation; each part is described below.

Algorithm 3 Sliding window-based support vector regression.
Input:
  Training data set: S = {(x_1, y_1, x'_1), ..., (x_N, y_N, x'_N)} where x_i ∈ X, y_i ∈ Y, x'_i ∈ X'
  Test data: P
  Number of weak learners: n
  Weight parameters: p, q
Preprocessing:
  1. apply normalization to X and X'
  2. fit kernel approximation and PLS regression to X and X'
  3. M_i = ||x_i − x'_i||, i = 1 ... N
  4. G_t ← each center of kmeans(X), t = 1 ... n
For t = 1 to n do
  1. D_ti = ||G_t − x_i||, i = 1 ... N
  2. r_t = Σ_{i=1}^{N} (w_i M_i) / Σ_{i=1}^{N} w_i, where w_i = 1/D_ti^p
  3. S_t = {(x_i, y_i) | D_ti < r_t}, i = 1 ... N
  4. H_t(X) ← train LinearSVR(S_t)
Output: H(P) = Σ_{t=1}^{n} (w_t H_t(P)) / Σ_{t=1}^{n} w_t, where w_t = 1/||G_t − P||^q

The algorithms used in SW-SVR rely on the L_2 norm, that is, the Euclidean distance, so their performance depends on the feature space. For example, if the feature space includes noisy features or non-linear relationships between features, performance will probably be reduced substantially. In particular, micrometeorological data has complex correlations among different features such as temperature and humidity. Accordingly, the feature space must be mapped into another feature space that accounts for noise and non-linear relationships. In our approach, we use kernel approximation (Rahimi & Recht, 2007) and partial least squares (PLS) regression (Tenenhaus, Vinzi, Chatelin, & Lauro, 2005) to map into a new feature space. Kernel approximation generates a new, higher-dimensional feature space that represents non-linear data as linear data with very low computational complexity; in fact, a combination of kernel approximation and linear SVMs led to much faster learning while achieving prediction performance comparable to that of an exact SVM (Cao, Naito, & Ninomiya, 2008). PLS regression, on the other hand, is a supervised dimension-reduction method that reduces dimensions by extracting latent variables that have a strong relationship with the dependent variable; if the feature space includes noisy features, their effect is reduced by PLS regression. The combination of kernel approximation and PLS regression thus gives SW-SVR an effective feature space for computing the L_2 norm on micrometeorological data.
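As a concrete illustration of this preprocessing step, the sketch below (ours; the component choices and hyperparameters, such as the number of random features and PLS components, are assumptions rather than the paper's exact settings) chains normalization, a random-Fourier-feature kernel approximation, and PLS-based dimension reduction with scikit-learn.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.kernel_approximation import RBFSampler
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))   # e.g., pressure, temperature, humidity, wind speed, irradiance
y = np.sin(X[:, 1]) + 0.2 * X[:, 2] + rng.normal(scale=0.1, size=2000)

scaler = StandardScaler().fit(X)
rbf = RBFSampler(gamma=1e-5, n_components=200, random_state=0)   # approximate RBF kernel map
Z = rbf.fit_transform(scaler.transform(X))

pls = PLSRegression(n_components=10).fit(Z, y)   # supervised dimension reduction
X_mapped = pls.transform(Z)                      # feature space used for L2-norm computations
print(X_mapped.shape)                            # (2000, 10)
```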
According to our previous research, accurately predicting particular specific data in micrometeorological data requires extracting training data that is effective for that prediction (Suzuki et al., 2015). In SW-SVR, several such specific data are selected in advance, and weak learners are built from the training data extracted for predicting each of them. Meanwhile, micrometeorological data involves various natural environments such as different seasons and climates. Therefore, the selected specific data must represent the varied natural environments that are likely to appear, so that a few models can represent the whole micrometeorological data. In SW-SVR, the specific data are selected by a clustering algorithm, k-means (MacQueen, 1967). k-means is one of the most famous non-hierarchical clustering algorithms and classifies data into a given number of clusters faster than other clustering algorithms. In SW-SVR, k-means classifies all training data into the same number of clusters as the number of weak learners given by the user, and each cluster center is used as specific data representing a natural environment.

After selecting the specific data, SW-SVR iterates data extraction and model building. First, SW-SVR extracts effective training data for predicting each specific data by D-SDC (Suzuki, Kaneda, & Mineno, 2014). The theory behind D-SDC is similar to that of the k-nearest neighbor (k-NN) algorithm in that D-SDC also extracts training data similar to a specialized object. In D-SDC, however, the amount of extracted data depends on the movement of the specialized object with time. The movement r is the change of a specialized object during the prediction horizon:

r_t = \|G_t - G'_t\|

where G is a specialized object and G' is the specialized object after the prediction horizon. D-SDC extracts the training data whose distance from the specialized object is shorter than the movement r. Accordingly, the training data S extracted by D-SDC is

S_t = \{(x_i, y_i) \mid \|G_t - x_i\| < \|G_t - G'_t\|\}

where x is the feature vector of a training sample and y is its dependent variable. D-SDC is based on the movement r because r is strongly related to the autocorrelation of the data surrounding a specialized object. In micrometeorological data, movements within a specific natural environment are mutually similar, and the autocorrelation becomes lower when these movements are bigger. For example, in Japan, the weather changes drastically every spring, and one natural environment shifts to various other natural environments with time. Meanwhile, when predicting time series data such as micrometeorological data, autocorrelation corresponds to the correlation between the features and the dependent variable, and more training data is required for highly accurate prediction when the autocorrelation is lower. Since D-SDC extracts an amount of data surrounding a specialized object in proportion to the movement r, the extraction takes this autocorrelation into account. However, the movement r is unknown because G' is not observed. As mentioned above, the movements of data surrounding a specialized object are mutually similar, so D-SDC estimates the movement from the movements of training data similar to the specialized object by a weighted average, where the weights are the reciprocals of the distances between the specialized object and each training sample. The movement of each training sample can be calculated by referring to the time at which it was observed. The estimated movement r is

r_t = \|G_t - G'_t\| \approx \frac{\sum_{i=1}^{N} w_i \|x_i - x'_i\|}{\sum_{i=1}^{N} w_i}, \qquad w_i = \frac{1}{\|G_t - x_i\|^p},

where N is the number of training data and p is a weight parameter.
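The following NumPy sketch (ours; the array shapes and the helper name dsdc_extract are illustrative assumptions) shows one way to implement this extraction rule for a single cluster center, using the estimated movement as the extraction threshold.

```python
import numpy as np

def dsdc_extract(X, y, X_future, center, p=2.0):
    """Extract training data for one specialized object (cluster center) by D-SDC.

    X        : (N, d) mapped features at observation time
    X_future : (N, d) mapped features one prediction horizon later
    center   : (d,)   specialized object G_t
    """
    dist = np.linalg.norm(X - center, axis=1)        # D_ti = ||G_t - x_i||
    move = np.linalg.norm(X - X_future, axis=1)      # M_i  = ||x_i - x'_i||
    w = 1.0 / np.maximum(dist, 1e-12) ** p           # weights 1 / D_ti^p
    r = np.sum(w * move) / np.sum(w)                 # estimated movement r_t
    mask = dist < r                                  # keep data closer than r_t
    return X[mask], y[mask], r

# Illustrative usage with random data.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
X_future = X + rng.normal(scale=0.3, size=X.shape)
y = rng.normal(size=1000)
X_t, y_t, r_t = dsdc_extract(X, y, X_future, center=X.mean(axis=0))
print(len(X_t), "of", len(X), "samples extracted; r_t =", round(float(r_t), 3))
```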
Afterward, SW-SVR builds several linear SVRs as weak learners from the extracted data. As described above, a combination of linear SVR and kernel approximation is comparable to SVR with a kernel method. Moreover, a linear SVR can be built much faster by using liblinear (Fan, Chang, Hsieh, Wang, & Lin, 2008), an optimized implementation for linear SVMs, instead of general SVM implementations such as libSVM (Chang & Lin, 2011). Although the kernel usable in liblinear is restricted to the linear kernel, liblinear can build the model much faster by solving the primal problem instead of the dual problem. Furthermore, since all the training data is divided into smaller extracted sets, each model can be built faster, and the extracted sets can easily be learned in parallel.

The predicted values of SW-SVR take into account the change of natural environments with time. In general ensemble learning, prediction for regression is a weighted average whose weights are determined at training time. SW-SVR, by contrast, determines the weights dynamically at prediction time. The weights are determined by the distance between the test data and the data specialized by each weak learner. The final hypothesis of SW-SVR is

H(P) = \frac{\sum_{t=1}^{n} w_t H_t(P)}{\sum_{t=1}^{n} w_t}, \qquad w_t = \frac{1}{\|G_t - P\|^q},

where P is the test data, n is the number of weak learners, H_t(X) is a hypothesis, and q is a weight parameter. Since the ensemble weights are determined dynamically for every prediction, SW-SVR can follow micrometeorological data whose characteristics always change with time.

Finally, we describe the computational complexity of SW-SVR. To represent complicated micrometeorological data easily, SW-SVR uses several conventional methods besides our proposed D-SDC: kernel approximation, PLS regression, k-means, and linear SVR. The computational complexity of these methods generally increases linearly; in other words, it is approximately O(N) when the number of training data N is much larger than the number of dimensions and the parameters of these methods. Moreover, the computational complexity of D-SDC corresponds to O(nN), because D-SDC simply repeats N distance calculations n + 1 times, where n is the number of weak learners in SW-SVR. Therefore, if N is much larger than n, the computational complexity of D-SDC also increases linearly. The total computational complexity of SW-SVR is thus approximately O(N), which is even less than that of SVR.
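Putting the pieces together, the following sketch (ours, under the assumptions already noted; it reuses the hypothetical dsdc_extract helper from the earlier sketch and omits the feature-mapping step for brevity) trains one LinearSVR per cluster center and aggregates their predictions with the dynamic weights w_t = 1/||G_t - P||^q.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVR

def train_sw_svr(X, y, X_future, n_learners=10, p=2.0, q=2.0):
    # Specific data: cluster centers found by k-means (Algorithm 3, preprocessing step 4).
    km = KMeans(n_clusters=n_learners, n_init=10, random_state=0).fit(X)
    centers, learners = km.cluster_centers_, []
    for center in centers:
        X_t, y_t, _ = dsdc_extract(X, y, X_future, center, p=p)   # D-SDC extraction
        if len(X_t) < 2:            # safeguard (not in the paper) for an empty window
            X_t, y_t = X, y
        learners.append(LinearSVR(C=1.0).fit(X_t, y_t))
    return centers, learners, q

def predict_sw_svr(model, P):
    centers, learners, q = model
    preds = np.array([h.predict(P) for h in learners])                 # (n, m)
    dists = np.linalg.norm(centers[:, None, :] - P[None, :, :], axis=2)
    w = 1.0 / np.maximum(dists, 1e-12) ** q                            # dynamic weights per test point
    return (w * preds).sum(axis=0) / w.sum(axis=0)
```

Because each weak learner sees only its extracted subset, the per-model training cost stays small, and the loop over cluster centers is trivially parallelizable, which matches the complexity argument above.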
4. Evaluation

4.1. Experiment

We compared the performance of SW-SVR with other standard regression methods: k-NN, decision tree (DT), AdaBoost, bagging, random forest (RF), gradient boosting (GB), linear SVR, and SVR with a radial basis function (RBF) kernel, which shows high performance in various fields (RBF-SVR). Note that the kernel used in the kernel approximation of SW-SVR is also the RBF kernel, and the base learner in AdaBoost and bagging is the commonly used decision tree. Moreover, to evaluate SW-SVR in more detail, we evaluated linear SVR with mapping: a standard linear SVR to which the same feature mapping as in SW-SVR is applied ("mapped SVR"). Mapped SVR separates the contribution of the feature-space mapping from that of the ensemble learning based on D-SDC. All parameters of the models were adjusted by grid search. The baseline for this evaluation was the naivest persistent model:

\hat{y}_{i+\Delta t} = y_i

where \hat{y} is the predicted value, y is the true value, and \Delta t is the prediction horizon.

We evaluated the performance in two ways: hold-out validation and 10-fold cross-validation. We predicted the temperature after 1 h and 6 h by using large-scale micrometeorological data observed in Tokyo (Japan Meteorological Agency, n.d.). The data consists of atmospheric pressure, temperature, relative humidity, wind speed, and irradiance. In hold-out validation, the training periods are limited to periods earlier than the testing periods so as to reflect practical use, in which test data is always predicted from past training data. The training periods ranged from 3 months to 5 years before September 1, 2014, and the testing periods ranged from 1 month to 1 year after the same day. By varying the training and testing periods, we evaluated performance under various usage scenarios. The period for 10-fold cross-validation was the 6 years from September 1, 2009 to September 1, 2015. Note that the amount of data per month was approximately 4000 because the data was recorded every 10 minutes.

In this evaluation, we used the mean absolute percentage error (MAPE) as the index of prediction error and the building time, measured by CPU clock time, as the index of computational complexity. MAPE is defined as

\mathrm{MAPE} = \frac{100}{N} \sum_{i=1}^{N} \left| \frac{y_i - \hat{y}_i}{y_i} \right|

where N is the number of test data, y is the true value, and \hat{y} is the predicted value.
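As a small worked example (ours; the temperature values are illustrative), the persistence baseline and MAPE can be computed as follows.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

# Temperature series sampled every 10 minutes; a 1-h horizon is 6 steps ahead.
temp = np.array([20.1, 20.3, 20.6, 21.0, 21.2, 21.5, 21.9,
                 22.1, 22.4, 22.8, 23.0, 23.1, 23.5])
horizon = 6
persistent_pred = temp[:-horizon]      # y_hat_{i+dt} = y_i
actual = temp[horizon:]
print(round(mape(actual, persistent_pred), 2), "% MAPE for the persistence baseline")
```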
Moreover, we evaluated the average extraction rate of D-SDC under each experimental condition so as to analyze the performance of SW-SVR and D-SDC further. All implementations for this evaluation are in Python, and the scikit-learn implementations (Pedregosa et al., 2011) were used for all methods except SW-SVR. The evaluation was performed on a single core of a machine with an Intel Core i5-2500K processor and 12 GB of RAM; although several methods, such as random forest and SW-SVR, can be run in parallel, all methods were run on a single core so as to compare the building times fairly.

4.2. Results and discussion

Figs. 2 and 3 show the prediction error for prediction horizons of 1 h and 6 h, respectively. Note that a log scale is used in Figs. 2(b), (c), 3(b), and (c).

[Fig. 2. MAPE for prediction after 1 h for each algorithm, for testing periods of (a) 1 month, (b) 6 months, and (c) 12 months; (b) and (c) are shown on a log scale.]
[Fig. 3. MAPE for prediction after 6 h for each algorithm, for testing periods of (a) 1 month, (b) 6 months, and (c) 12 months; (b) and (c) are shown on a log scale.]

The results indicate that SW-SVR produced the best average performance of all models across all testing periods, training periods, and prediction horizons. The effect is particularly noticeable when the testing periods are longer than the training periods; in that situation, almost all methods except SW-SVR often perform worse than the naivest persistent model used as the baseline. These results show that the conventional superior methods do not always perform well for micrometeorological data prediction; their performance depends on the difficulty of the prediction determined by the training periods, testing periods, and prediction horizons. Moreover, among the SVR-based algorithms, the prediction performance of SW-SVR is almost always the best, followed in order by RBF-SVR, mapped SVR, and linear SVR. The difference between mapped SVR and linear SVR is due to the feature-space mapping, whereas the difference between SW-SVR and mapped SVR is due to the ensemble learning based on D-SDC. These comparisons demonstrate that both the feature-space mapping and the D-SDC-based ensemble learning are effective for improving prediction performance. Mapped SVR also tended to have lower prediction performance than SW-SVR when the testing periods were longer than the training periods, so under this condition the ensemble learning based on D-SDC is particularly effective. When the testing periods are longer than the training periods, the training data that is effective for predicting the test data is reduced, and we consider that the small amount of training data extracted by D-SDC for building models corresponded to this effective training data. Indeed, Fig. 4 shows the average extraction rate of D-SDC and demonstrates that the weak learners of SW-SVR are always built from a very small proportion of the whole training data. SW-SVR, which predicts micrometeorological data accurately regardless of the amount of training data, is therefore very practical and useful.

[Fig. 4. Average extraction rate of D-SDC in SW-SVR for 10, 100, and 1000 weak learners, for prediction horizons of (a) 1 hour and (b) 6 hours.]
Table 1 shows the results of 10-fold cross-validation for prediction horizons of 1 h and 6 h. In hold-out validation, SW-SVR was often superior to all methods, including RBF-SVR. In 10-fold cross-validation, however, although SW-SVR had higher prediction performance than all methods except RBF-SVR, RBF-SVR was slightly superior to SW-SVR. These results indicate that the prediction performance of SW-SVR is affected by the temporal order between training data and test data, and that SW-SVR is particularly suited to practical use, in which test data is always predicted from past training data. Meanwhile, even in 10-fold cross-validation, the ordering of the prediction errors of SW-SVR, mapped SVR, and linear SVR was the same as in hold-out validation. Therefore, both the feature-space mapping and the ensemble learning based on D-SDC are effective for improving prediction performance in cross-validation as well.
Table 1
MAPE of 10-fold cross-validation for each algorithm [%].

Prediction horizon | SW-SVR   | k-NN     | DT       | Adaboost | Bagging  | RF       | GB       | Linear SVR | mapped SVR | RBF-SVR  | Persistent
1 h                | 5.18608  | 8.59929  | 5.81042  | 11.10375 | 10.24014 | 5.57213  | 5.27190  | 5.43892    | 5.25274    | 5.16985  | 5.96816
6 h                | 23.49826 | 26.52433 | 25.99290 | 29.93160 | 29.58125 | 25.55044 | 24.14987 | 24.68383   | 24.26108   | 20.94132 | 24.86800

Figs. 5 and 6 show the building time for prediction horizons of 1 h and 6 h, respectively.

[Fig. 5. Building time for prediction after 1 h for each model: (a) different training periods, (b) ensemble learning series versus the number of weak learners, (c) SVR series versus the cost parameter; all panels on a log scale.]
[Fig. 6. Building time for prediction after 6 h for each model: (a) different training periods, (b) ensemble learning series versus the number of weak learners, (c) SVR series versus the cost parameter; all panels on a log scale.]

Figs. 5(a) and 6(a) show the building time of the models with high prediction performance in Figs. 2 and 3 (RF, GB, RBF-SVR, and SW-SVR) when the training periods were varied. Note that the number of weak learners was 1000 in the ensemble learning series, the cost parameter was 1 in the SVR series, and σ of SW-SVR, a parameter of the RBF kernel in the kernel approximation, was 0.00001. These results show that the building time of ensemble learning methods, including SW-SVR, increases more gently than that of SVR. In particular, the building time of SW-SVR becomes the shortest as the training periods grow longer; in other words, the growth rate of the building time of SW-SVR is the gentlest of all the methods as the training data increases. These results indicate that, as discussed above, the computational complexity of SW-SVR is lower than that of conventional methods, including random forest and gradient boosting, and that SW-SVR is effective, in terms of building time, for training on an enormous amount of data.

Next, Figs. 5(b) and 6(b) show the building time of the better-performing ensemble learning models (RF, GB, and SW-SVR) when the number of weak learners was varied. Note that the cost parameter of SW-SVR was 1, σ of SW-SVR was 0.00001, and the training periods were 12 months. SW-SVR needs a longer building time than RF and GB with shallow decision trees when the number of weak learners is small. However, when the decision trees become deeper or the number of weak learners becomes larger, SW-SVR can build the model faster than or as fast as RF and GB. Moreover, SW-SVR, like RF, can easily be run in parallel, so its building time can be expected to become even shorter.

Finally, Figs. 5(c) and 6(c) show the building time of the SVR-based models when the parameters of SVR were varied. Note that the number of weak learners was 100 and the training periods were 12 months.
These results indicate that the building time of SW-SVR is significantly shorter than that of RBF-SVR but longer than that of linear SVR. Meanwhile, Fig. 4 shows that the weak learners of SW-SVR are always built from a very small proportion of the whole training data: when the prediction horizon was 1 h, the average extraction rate was 0.47 percent at best and 1.82 percent at worst, and when the prediction horizon was 6 h, it was 7.57 percent at best and 16.25 percent at worst. The reason the computational complexity of SW-SVR is nevertheless larger than that of linear SVR is the additional cost of building several models. However, since the amount of training data for each weak learner is reduced substantially, the cost of building each individual model is also reduced. Accordingly, when the number of models built per CPU is reduced by parallel processing, the overall computational complexity of SW-SVR becomes lower than or equal to that of linear SVR. Meanwhile, as with linear SVR, the building time of SW-SVR does not depend on the SVR-related parameters and is nearly constant. As discussed above, the building time of SW-SVR depends solely on the number of weak learners and the training periods. Therefore, SW-SVR can avoid unexpectedly long building times during parameter tuning, which varies each parameter over a wide range.

These results demonstrate that SW-SVR predicts complicated micrometeorological data with the best prediction performance and the lowest computational complexity among the compared standard algorithms. In particular, we found that dynamic aggregation of models built from the very small amounts of data extracted by D-SDC is effective for achieving both high prediction performance and low computational complexity. However, some problems remain to be solved in SW-SVR. First, the prediction performance of SW-SVR sometimes deteriorates despite an increase in training data; this occurred in particular when the prediction horizon was 6 h, as shown in Fig. 3. This is because the data extracted by D-SDC includes training data that is unnecessary for highly accurate prediction. If D-SDC extracted the same data as it extracts when the training periods are shorter, the prediction performance of SW-SVR would never deteriorate as training data increases. Therefore, we must review both the feature mapping and the D-SDC algorithm so as to avoid extracting unnecessary training data. Second, SW-SVR is based on a combination of several algorithms (kernel approximation, PLS regression, k-means, D-SDC, and linear SVR), and each algorithm has several parameters. Therefore, SW-SVR has many parameters, and tuning them takes more time. In this experiment, we used a rough grid search so as to decide the parameters within a fixed amount of time, but there is still room for improvement in prediction performance by using other approaches such as a genetic algorithm instead of a grid search (Huang & Wang, 2006).

5. Conclusion and future work

In this paper, we proposed a new methodology for predicting micrometeorological data, SW-SVR, which involves a novel combination of SVR and ensemble learning. To take advantage of both SVR and ensemble learning, SW-SVR builds several SVRs specialized for representative data groups in various natural environments by using D-SDC, which extracts effective training data for predicting specific data.
Moreover, to follow micrometeorological data whose characteristics always change with time, prediction in SW-SVR is based on dynamically weighted ensemble learning that depends on the similarity between the test data and the data specialized by each weak learner. In evaluation experiments using large-scale micrometeorological data, the prediction performance of SW-SVR was greater than or equal to that of other general methods such as SVR, RF, and GB. Moreover, SW-SVR reduces the building time substantially compared with complicated models that have high prediction performance. We anticipate that dynamic aggregation of models built from various kinds of data extracted by D-SDC can contribute to more sophisticated studies of micrometeorological data prediction.

In future work, we should evaluate SW-SVR in more varied situations to show that it works effectively. In particular, we will use more complicated data that consists of many features. Furthermore, when SW-SVR is applied to applications such as environmental control systems, the performance of the overall application should be evaluated. We are currently developing an agricultural support system using SW-SVR, which controls greenhouse environments depending on the activity of the plants. Evaluating such applications will demonstrate the superiority of SW-SVR in practical use.

Acknowledgements

This study was partially supported by JST, PRESTO, and JSPS KAKENHI (26660198), Japan.

References

Antonanzas, J., Urraca, R., Martinez-de-Pison, F. J., & Antonanzas-Torres, F. (2015). Solar irradiation mapping with exogenous data from support vector regression machines estimations. Energy Conversion and Management, 100, 380–390.

Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. http://doi.org/10.1023/A:1010933404324

Cao, H., Naito, T., & Ninomiya, Y. (2008). Approximate RBF kernel SVM and its applications in pedestrian classification. In The 1st International Workshop on Machine Learning for Vision-based Motion Analysis (MLVMA'08) (pp. 1–9). http://hal.archives-ouvertes.fr/inria-00325810/

Chang, C., & Lin, C. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2, 1–39. http://doi.org/10.1145/1961189.1961199

Chevalier, R. F., Hoogenboom, G., McClendon, R. W., & Paz, J. A. (2011). Support vector regression with reduced training sets for air temperature prediction: A comparison with artificial neural networks. Neural Computing & Applications, 20(1), 151–159.

Fan, R.-E., Chang, K.-W., Hsieh, C.-J., Wang, X.-R., & Lin, C.-J. (2008). LIBLINEAR: A library for large linear classification. The Journal of Machine Learning Research, 9, 1871–1874.

Freund, Y., & Schapire, R. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119–139. http://doi.org/10.1006/jcss.1997.1504

Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5), 1189–1232.

Huang, C. L., & Wang, C. J. (2006). A GA-based feature selection and parameters optimization for support vector machines. Expert Systems with Applications, 31(2), 231–240. http://doi.org/10.1016/j.eswa.2005.09.024
Japan Meteorological Agency. (n.d.). Japan Meteorological Agency. http://www.jma.go.jp/jma/indexe.html

Kisi, O., & Cimen, M. (2012). Precipitation forecasting by using wavelet-support vector machine conjunction model. Engineering Applications of Artificial Intelligence, 25(4), 783–792. http://doi.org/10.1016/j.engappai.2011.11.003

Kolokotsa, D., Pouliezos, A., Stavrakakis, G., & Lazos, C. (2009). Predictive control techniques for energy and indoor environmental quality management in buildings. Building and Environment, 44(9), 1850–1863. http://doi.org/10.1016/j.buildenv.2008.12.007

Loosli, G. (2007). Comments on the core vector machines: Fast SVM training on very large data sets. The Journal of Machine Learning Research, 8, 291–301.

MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability: Vol. 1 (pp. 281–297).

Maity, R., Bhagwat, P., & Bhatnagar, A. (2010). Potential of support vector regression for prediction of monthly streamflow using endogenous property. Hydrological Processes, 24(7), 917–923.

Mohammadi, K., Shamshirband, S., Anisi, M. H., Alam, K. A., & Petković, D. (2015). Support vector regression based prediction of global solar radiation on a horizontal surface. Energy Conversion and Management, 91, 433–441.

Othman, M. F., & Shazali, K. (2012). Wireless sensor network applications: A study in environment monitoring system. In Procedia Engineering: Vol. 41 (pp. 1204–1210). http://doi.org/10.1016/j.proeng.2012.07.302

Park, D. H., & Park, J. W. (2011). Wireless sensor network-based greenhouse environment monitoring and automatic control system for dew condensation prevention. Sensors, 11(4), 3640–3651. http://doi.org/10.3390/s110403640

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, É. (2011). Scikit-learn: Machine learning in Python. The Journal of Machine Learning Research, 12, 2825–2830.

Platt, J. C. (1998). Fast training of support vector machines using sequential minimal optimization. In Advances in Kernel Methods (pp. 185–208).

Rahimi, A., & Recht, B. (2007). Random features for large-scale kernel machines. Advances in Neural Information Processing Systems, 20, 1177–1184.

Singh, K. P., Gupta, S., & Rai, P. (2013). Identifying pollution sources and predicting urban air quality using ensemble learning methods. Atmospheric Environment, 80, 426–437. http://doi.org/10.1016/j.atmosenv.2013.08.023

Smith, B. A., Hoogenboom, G., & McClendon, R. W. (2009). Artificial neural networks for automated year-round temperature prediction. Computers and Electronics in Agriculture, 68(1), 52–61. http://doi.org/10.1016/j.compag.2009.04.003

Suzuki, Y., Kaneda, Y., & Mineno, H. (2014). SW-SVR improved by short-distance data collection method. IPSJ SIG Technical Report, 2014-MBL-73(9), 1–8.

Suzuki, Y., Kaneda, Y., & Mineno, H. (2015). Analysis of support vector regression model for micrometeorological data prediction. Computer Science and Information Technology, 3(2), 37–48. http://doi.org/10.13189/csit.2015.030202

Tenenhaus, M., Vinzi, V. E., Chatelin, Y. M., & Lauro, C. (2005). PLS path modeling. Computational Statistics and Data Analysis, 48(1), 159–205. http://doi.org/10.1016/j.csda.2004.03.005
Tsang, I. W., Kwok, J. T., & Cheung, P.-M. (2005). Core vector machines: Fast SVM training on very large data sets. Journal of Machine Learning Research, 6, 363–392.

Urraca, R., Antonanzas, J., Martinez-de-Pison, F. J., & Antonanzas-Torres, F. (2015). Estimation of solar global irradiation in remote areas. Journal of Renewable and Sustainable Energy, 7(2), 023136.

Vapnik, V. N. (1995). The Nature of Statistical Learning Theory. Springer.

Wang, B. X., & Japkowicz, N. (2009). Boosting support vector machines for imbalanced data sets. Knowledge and Information Systems, 25(1), 1–20. http://doi.org/10.1007/s10115-009-0198-y

Xie, Y., Li, X., Ngai, E. W. T., & Ying, W. (2009). Customer churn prediction using improved balanced random forests. Expert Systems with Applications, 36(3), 5445–5449. http://doi.org/10.1016/j.eswa.2008.06.121