This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the authors institution and sharing with colleagues. Other uses, including reproduction and distribution, or selling or licensing copies, or posting to personal, institutional or third party websites are prohibited. In most cases authors are permitted to post their version of the article (e.g. in Word or Tex form) to their personal website or institutional repository. Authors requiring further information regarding Elsevier’s archiving and manuscript policies are encouraged to visit: http://www.elsevier.com/copyright http://www.elsevier.com/copyright Author's personal copy Ensemble based sensing anomaly detection in wireless sensor networks Daniel-Ioan Curiac ⇑, Constantin Volosencu Automation and Applied Informatics Department, ‘‘Politehnica’’ University of Timisoara, Bd. V. Parvan nr. 2, 300223 Timisoara, Romania a r t i c l e i n f o Keywords: Wireless sensor networks Binary classifier Ensemble based systems Sensors anomalies Data accuracy a b s t r a c t Wireless sensor networks are often used to monitor and measure physical characteristics from remote and sometimes hostile environments. In these circumstances the sensing data accuracy is a crucial attri- bute for the way these applications complete their objectives, requiring efficient solutions to discover sensor anomalies. Such solutions are hard to be found mainly because the intricate defining of the correct sensor behavior in a complex and dynamic environment. This paper tackles the sensing anomaly detec- tion from a new perspective by modeling the correct operation of sensors not by one, but by five different dynamical models, acting synergically to provide a reliable solution. Our methodology relies on an ensemble based system composed of a set of diverse binary classifiers, adequately selected, to implement a complex decisional system on network base station. Moreover, every time a sensing anomaly is discov- ered, our ensemble offers a reliable estimation to replace the erroneous measurement provided by sensor. � 2012 Elsevier Ltd. All rights reserved. 1. Introduction A wireless sensor network (WSN) is a collection of tiny, inex- pensive and low-power devices that can be deployed throughout a geographical space for fine-grained monitoring and event detec- tion. Besides its computing and communication potential, a WSN is, first of all, an advanced distributed measurement system which is often prone to sensing anomalies that can cause erroneous data, compromising the objectives of the entire network. There are three major sources of sensor anomalies within WSN: (i) software or/and hardware failures – breakdowns in any subsys- tem of a sensor node or even battery discharging can produce wrong sensing data; (ii) security attacks – when a malicious entity compels the sensors to report erroneous measurements or to drop measurement data packages (Walters, Liang, Shi, & Chaudhary, 2006); (iii) environment related sources – when sensor nodes can- not measure the physical value correctly due to unfavorable phe- nomena that can arise in the harsh environments in which they are often deployed. Generally speaking, sensing anomaly detection refers to the problem of discovering patterns in measurement data that do not match with expected behavior (Chandola, Banerjee, & Kumar, 2009; Rajasegarar, Leckie, & Palansiwami, 2008; An, Heo, & Chang, 2011). This is not a simple task mainly because an estimated model of ‘‘correct behavior’’ is always hard to find. In the case of a WSN there is an inherent feature on which we can rely – sensing redun- dancy (Curiac, Volosencu, Pescaru, Jurca, & Doboli, 2009), which takes two basic forms: physical redundancy that implies the use of more than one sensor node for measuring the same localized va- lue; and analytical redundancy that implies a mathematical model for evaluating the value provided by one sensor by taking into con- sideration the past and present values of neighboring sensors (spa- tial redundancy), the past values of the sensor itself (temporal redundancy) or both (spatiotemporal redundancy). In the last decade a series of relevant approaches based on assortments employing different types of analytical redundancies and intelligent detection algorithms have been proposed for solv- ing the issue of sensing anomaly discovery. In (Siripanadorn, Hattagam, & Teaumroong, 2010) a mixture between a competitive learning method called the self-organizing map (SOM) and the discrete wavelet transform (DWT) is used to detect anomalies from synthetic and real-world datasets. A two-step temporal modeling procedure, developed to analyze and extract semantic symbols from a sequence of observations, is presented in Li, Thomason, and Parker (2010) where an intelligent system detects time-related changes online by using a likelihood- ratio detection scheme. The algorithm is distributed, and supports a hierarchical learning structure that can scale to large number of sensors. The use of Bayesian networks as means for unsupervised learn- ing and anomaly detection in gas monitoring sensor networks for underground coal mines is described in Wang, Lizier, Obst, Prokopenko, and Wang (2008). The authors showed that the Bayesian network model can learn cyclical baselines for gas con- centrations, thus reducing false alarms usually caused by flatline thresholds. Their solution was proved to be efficient in both distributed and centralized approach. 0957-4174/$ - see front matter � 2012 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2012.02.036 ⇑ Corresponding author. Tel.: +40 256 403 227; fax: +40 256 403 214. E-mail addresses: daniel.curiac@aut.upt.ro (D.-I. Curiac), constantin.volosen- cu@aut.upt.ro (C. Volosencu). Expert Systems with Applications 39 (2012) 9087–9096 Contents lists available at SciVerse ScienceDirect Expert Systems with Applications j o u r n a l h o m e p a g e : w w w . e l s e v i e r . c o m / l o c a t e / e s w a Author's personal copy An efficient method applying principal component analysis (PCA) simultaneously on multiple metrics received from various sensors is depicted in Chatzigiannakis and Papavassiliou (2007). One of the key features of this approach is that it provides an inte- grated methodology of taking into consideration and combining effectively correlated sensor data, in a distributed fashion. Further- more, it allows the integration of results from neighboring network areas to detect correlated anomalies that involve multiple groups of nodes. Our paper tackles the sensing anomaly discovery from a new perspective: the modeling of the correct behavior for sensors is done not by one, but by five different models, acting synergically to provide a reliable solution. For this, we developed an ensemble based system (EBS) containing five different binary classifiers, each categorizing every network node as being accurate or erroneous, the final decision being taken by the entire ensemble using a voting procedure. It is broadly accepted that the overall efficiency of such committees of classifiers can occur only if there is diversity among its components (Polikar, 2006). In our proposal, the heterogeneity of classifiers is achieved by using various and carefully selected classifier architectures and different sets of input data. In order to completely solve the problem, our ensemble not only discovers sensing errors, but offers reliable estimations to replace the mea- surements affected by these anomalies. The remainder of the paper is organized as follows. In Section 2, we introduce the philosophy of our ensemble based sensing anomaly detection. Section 3 describes the architecture of each individual classifier, pointing out the way in which the correct sensing behavior is modeled and predicted, while Section 4 depicts the decisional block of the ensemble based on weighted voting algorithm. In Section 5, we present the methodology used for train- ing and testing of the ensemble. Section 6 covers a test case illus- trating the entire methodology and, finally, conclusions are offered in Section 7. 2. Ensemble based sensing anomaly detection Discovering sensing anomalies in the context of WSN is a chal- lenging issue due to the complexity of the environment in which sensor nodes are deployed. Often, this subject is tackled using ded- icated decisional systems. For acquiring node behavior related decisions, it makes sense to ask more than one decision making en- tity, because this practice assures indubitably a better, more in- formed, and trustable final decision. We label these decisional instances as classifiers or experts and their collections as ensemble based systems (Dietterich, 2000; Ho, Hull, & Srihari, 1994; Polikar, 2006; Wang, Hao, Ma, & Jiang, 2011). In order to periodically detect and investigate each and every sensor anomaly, an ensemble based system has been designed. As presented in Fig. 1, this ensemble encloses several binary classifiers that independently categorize the state of each sensor as ‘‘reliable’’ or ‘‘unreliable’’. All the classifier outputs will be aggre- gated using a weighted majority algorithm to obtain the conclud- ing ensemble decision. This final ensemble decision will be further used by the base station to take all the required actions for the unreliable nodes (e.g. removal of the untrustworthy nodes from the WSN for a specified period of time). Our ensemble based methodology reveals the following characteristics: – The EBS input data are represented by past and present mea- surements gathered by the node under investigation (node A) and respectively, past and present measurements gathered by each of the node A neighboring nodes. – There are five binary classifiers involving different prediction techniques and different types of inputs to assure the required diversity of classifiers. Each classifier offers its own hypothesis hi (‘‘reliable’’ or ‘‘unreliable’’) about the sensing correctness of the node under investigation. – The overall ensemble decision block is based on a weighted majority algorithm. – If the investigated node is found as presenting sensor anoma- lies, the network base station acts in consequence and can exclude that specific sensor from the list of network functioning sensors for a limited period of time. As an example, this can be achieved based on the following rule: if the EBS indicated at least three times that the node A suffers from a sensor anomaly, the base station decides to inactivate the sensor. The base sta- tion could later reactivate the sensor after repeating the EBS investigation by testing if new readings became appropriate. – Using the power of ensemble, a reliable estimation of the mea- surement affected by anomaly is automatically offered any time when needed. In the following paragraphs the design of each EBS component, together with details regarding training and testing of the ensem- ble are offered. 3. Designing the classifiers The design of individual classifiers to fulfill the EBS require- ments is not a simple task. This process has to be governed by one magic word: diversity. As a result, any stratagem for generat- ing the ensemble members must be focused on the ensemble’s het- erogeneity improvement. The diversity of classifiers may originate from three basic sources: a. classifier structure: the use of diverse classifier algorithms (Hsu & Srivastava, 2009; Tsoumakas, Katakis, & Vlahavas, 2004) can assure the required heterogeneity (diversity through structure); b. classifier internal parameters: by using different training datasets (García-Pedrajas & Ortiz-Boyer, 2009; Kim & Kang, 2010; Li & Sun, 2012; Shirai, Kudo, & Nakamura, 2009) or diverse initializations of the training algorithms, the classifi- ers having exactly the same structure may cover different regions of the classified workspace, assuring the heterogene- ity (diversity through parameters); and c. classifier inputs: the diversity may be assured by applying different inputs to identical classifiers (diversity through inputs); usually this channel to obtain diversity is irrelevant, but when there is an overabundance of data sources it can be a viable alternative.Fig. 1. Ensemble based sensing anomaly detection. 9088 D.-I. Curiac, C. Volosencu / Expert Systems with Applications 39 (2012) 9087–9096 Author's personal copy There are circumstances in which these three possibilities are combined resulting more complex ensembles, but applying met- rics to evaluate diversity in these cases may become very difficult. Due to the specificity of WSN applications we chose to assure the diversity of classifiers using a mixture between diversity through structure and diversity through inputs. From this perspec- tive we developed five classifiers that combine the two kinds of in- put data (measurement time series provided by the sensor under investigation and the measurements gathered from neighboring sensors), with linear or complex nonlinear models: – average based classifier; – autoregressive linear predictor based classifier; – neural network based classifier; – neural network autoregressive predictor based classifier; – Adaptive Neuro-Fuzzy Inference System (ANFIS) based classifier. All these classifiers are built based on a preset configuration that includes: � a prediction block – this block encapsulates a linear or nonlinear model of the ‘‘correct behavior’’ of the sensor and relies on the analytical redundancy feature of the WSNs. Manipulating mea- surement data often affected by errors gathered from a dynamic and complex environment, the predictor block parameters have to be very carefully chosen. By choosing different types of pre- diction algorithms, the required diversity can be assured. � an error computing block that compares the sensor current measurement with its predicted value; Practically, this block compares the actual behavior of the sensor under investigation with a modeled ‘‘correct behavior’’. � an internal decision block that takes the individual resolution of the classifier – correct or incorrect behavior, using a carefully chosen threshold as the limit between good and bad behavior. 3.1. Average based classifier The average based classifier (AVC) is computationally the sim- plest binary classifier used in our ensemble. Based on measure- ment gathered by the sensor under investigation at a specific moment in time hA(t) and on the measurements provided by k adjacent sensors at the same moment in time aggregated in the in- put vector hAVC(t), it provides a binary output: ‘‘0’’ meaning normal functioning of sensor A and ‘‘1’’ meaning abnormal functioning. hAVCðtÞ¼ ½h1ðtÞ h2ðtÞ . . . hkðtÞ�: ð1Þ This classifier exploits the analytical spatial redundancy feature of WSN that allows the estimation of the measurement gathered from sensor A using the measurement values provided by neigh- boring sensors. In this case, the estimation will be obtained using a simple averaging scheme. As can be seen in Fig. 2, the classifier’s internal structure consists of one average computation block, one error computation block and one internal decisional block. Classifier’s average computation block receives all present mea- surement values of each of the k neighbors and computes the aver- aged value as presented in (2): hA;AVCðtÞ¼ 1 k Xk i¼1 hiðtÞ: ð2Þ The average value hA,AVC(t) is subtracted from node’s current mea- surement value, resulting the error eAVC(t): eAVCðtÞ¼ hAðtÞ� hA;AVCðtÞ: ð3Þ The obtained error value is then evaluated by classifier’s deci- sional block for producing hAVC(t) hypothesis. This evaluation is performed by comparing the error value with a given threshold eAVC. If the absolute value of the error exceeds eAVC, the classifier outputs a hypothesis equal to ‘‘1’’, meaning that the measurement provided by the node A was classified as being erroneous: hAVCðtÞ¼ 0 if jeAVCðtÞj < eAVC 1 if jeAVCðtÞj P eAVC � : ð4Þ In the training process, there is only one AVC parameter that has to be established: the threshold eAVC. This is done based on tri- als, experience and intuition of the human expert which plays a central role. 3.2. Autoregressive linear predictor based classifier The autoregressive linear predictor based classifier (ALC) exploits the analytical temporal redundancy feature of WSN that allows the estimation of the measurement gathered from a specific sensor A using past measurements of the same sensor to supply a binary output: ‘‘0’’ meaning normal functioning of the sensor and ‘‘1’’ meaning abnormal functioning. In this case, the estimation will be obtained using an autore- gressive predictor based on a numerically robust variant of recur- sive least square (RLS) method Ljung & Soderstrom, 1987; Yang & Bohme, 1992. As can be seen in Fig. 3, the classifier’s internal struc- ture consists of one autoregressive prediction block, one error cal- culation block and one internal decisional block. Classifier’s autoregressive prediction block receives past mea- surement values provided by the sensor under investigation: hALCðtÞ¼ ½hAðt � 1Þ � � �hAðt � nÞ� ð5Þ and predicts its current value as shown in (6), where n is the auto- regression order, the parameters a1(t), a2(t), . . . an(t) are estimated using a robust recursive least square algorithm, and n(t) is consid- ered to be a white-noise perturbation hA;ALCðtÞ¼ a1ðtÞ � hAðt � 1Þþ �� �þ anðtÞ � hAðt � nÞþ nðtÞ: ð6Þ The predicted value hA,ALC(t) is subtracted from node’s current mea- surement value, obtaining the error eALC(t) as shown in (7): eALCðtÞ¼ hAðtÞ� hA;ALCðtÞ: ð7Þ The obtained error value is then evaluated by classifier’s internal decisional block for producing hALC(t) hypothesis in a similar man- ner as the one described by (4). If the error value exceeds the value Fig. 2. Average based classifier. Fig. 3. Autoregressive linear predictor based classifier. D.-I. Curiac, C. Volosencu / Expert Systems with Applications 39 (2012) 9087–9096 9089 Author's personal copy of a given threshold eALC, the classifier outputs a hypothesis equal to ‘‘1’’, meaning that the measurement provided by the node A was classified as being erroneous. The classifier has some internal parameters that have to be thoroughly established: the threshold eALC, the autoregression order n, and the initialization parameters of the RLS algorithm. The threshold eALC is set based on trials, experience and intuition of the human expert, the autoregression order is a positive integer with values among 3 and 6 to make a reasonable balance between precision and computational time, and initialization parameters of RLS are set according to prescriptions when no a priori information are available (Ljung, 2007). An important remark upon the use of ALC: the classifier can be used as a part of EBS only after some time steps (e.g. 2n time sam- ples) because the prediction offered by RLS needs some time until the parameter convergence (Ljung, 1999). 3.3. Neural network based classifier The neural network based classifier (NNC) processes the current value provided by the sensor under investigation and the current/ past values gathered from adjacent sensors to provide a binary out- put: ‘‘0’’ meaning normal functioning of the sensor and ‘‘1’’ mean- ing abnormal functioning. This classifier exploits the analytical spatial redundancy feature of WSN that allows the estimation of the measurement gathered from a specific sensor A using only measurements provided by neighboring nodes. In this case, the estimation will be obtained using a neural network predictor. By using a strong nonlinear model (neural network) to approximate the ‘‘normal’’ operation state of a sensor we can model more sophisticated situations than the linear ones (ALC or AVC). As can be seen in Fig. 4, the classifier’s internal structure con- sists of one feed-forward neural network, one error calculation block and one internal decision block. The neural network, based on its input vector that contains current/past measurements pro- vided by its k neighbors: hNNCðtÞ¼ ½h1ðtÞ � � �h1ðt � nÞ � � �hkðtÞ � � �hkðt � nÞ�; ð8Þ predicts the actual value gathered from sensor A – hA,NNC(t). This prediction is subtracted from the current measurement of the sen- sor under investigation hA(t) to obtain the error: eNNCðtÞ¼ hAðtÞ� hA;NNCðtÞ; ð9Þ which will be used by the internal decision block to obtain the clas- sifier’s hypothesis hNNC(t). Classifier’s decisional block functions the same as the one included in the average based classifier but uses another threshold: eNNC. The classifier has some internal parameters that have to be carefully established: the threshold eNNC, the neural network struc- ture (number of layers, number of neurons on each layer, etc.) and the neural network’s weights. The threshold eNNC and the structure of the neural network are set based on trials, experience and intu- ition of the human expert. The neural network training can be done in an automated fashion using a backpropagation algorithm (e.g. Levenberg–Marquardt). To make a reasonable compromise between complexity and computing time, we chose to use only measurements provided at last three moments in time: t, t � 1 and t � 2. This classifier can be used as a component in our EBS only after n time samples, this period being necessary to populate all compo- nents of the input vector hNNC(t) with real values. 3.4. Neural network autoregressive predictor based classifier The neural network autoregressive predictor based classifier (NNAC) processes the values provided only by the sensor under investigation to provide a binary output: ‘‘0’’ meaning normal functioning of the sensor and ‘‘1’’ meaning abnormal functioning. In fact this is an alternative to the autoregressive predictor based classifier that uses a strong nonlinear model (neural network) instead of a linear model. This classifier exploits the analytical tem- poral redundancy feature of WSN that allows the estimation of the measurement gathered from a specific sensor A using past mea- surements of the same sensor. In this case, the autoregressive pre- diction process is done by a neural network predictor. As can be seen in Fig. 5, the classifier’s internal structure is al- most identical with the one presented for NNC, with the difference that the input vector includes only past measurements obtained from sensor A: hNNACðtÞ¼ ½hAðt � 1ÞhAðt � 2Þ � � �hAðt � nÞ�: ð10Þ This will impose some changes in the neural network structure (e.g. number of neurons on input layer). In the case of NNAC, the internal parameters that have to be cautiously established are: the threshold eNNAC, the neural network structure and the neural network’s weights. The threshold eNNAC and the neural network structure (number of layers, number of neurons on each hidden layer) are set based on trials, knowledge and intuition of the human expert. The neural network weights can be obtained using an automated procedure based on a back- propagation algorithm. To make a rational compromise between complexity and computing time, we decided to use only measure- ments gathered at last four moments in time: t, t � 1, t � 2 and t � 3. This classifier can be used as a component in our EBS only after n time samples, this period being necessary to populate all compo- nents of the input vector hNNAC(t) with real values. 3.5. ANFIS based classifier The ANFIS based classifier (ANFISC) processes the past values provided by the sensor under investigation and current/past values provided by adjacent sensors to provide a binary output: ‘‘0’’ meaning normal functioning of the sensor and ‘‘1’’ meaning abnor- mal functioning. This classifier exploits the analytical spatiotempo- ral redundancy feature of WSN that allows the estimation of the measurement gathered from a specific sensor A using past mea- surements of the same sensor and measurements provided by neighbor nodes. By using a strong nonlinear model (ANFIS) to Fig. 4. Neural network based classifier. Fig. 5. Neural network autoregressive predictor based classifier. 9090 D.-I. Curiac, C. Volosencu / Expert Systems with Applications 39 (2012) 9087–9096 Author's personal copy approximate the ‘‘normal’’ operation state of a sensor we can mod- el very sophisticated situations. As can be seen in Fig. 6, the classifier’s internal structure con- sists of one ANFIS prediction block, one error calculation block and one internal decisional block. Adaptive Neuro-Fuzzy Inference System (ANFIS) was developed by Jang (1991, 1993) and is a hybrid neuro-fuzzy system that uses the Takagi–Sugeno-type fuzzy inference. Basically, ANFIS is a fuzzy inference system, framed in a feed-forward neural network. Hence, the advantages of a fuzzy system can be combined with a learning algorithm. ANFIS architecture consists of five layers of nodes: the first and the fourth layers consist of adaptive nodes implementing fuzzification and defuzzification (represented by squares in the ANFIS structure from Fig. 6), while the second, third and fifth layers consist of fixed nodes that are implementing the fuzzy rule (P), the normalization (N) and summation ( P ). The adaptive nodes are associated with their respective parameters, get duly updated with each subsequent iteration while the fixed nodes are devoid of any parameters. By including two intelligent approaches (neural networks and fuzzy reasoning), the ANFIS block can achieve good reasoning in quality and quantity. In other words we have fuzzy reasoning and network calculation. The ANFIS predictor, based on its input vector hANFISC(t) that contains past measurements obtained from sensor A and current/ past measurements provided by its k neighbors: hANFISCðtÞ¼ hAðt�1Þ���hAðt�nÞjh1ðtÞ���h1ðt�nÞj. . .jhkðtÞ���hkðt�nÞb c ð11Þ predicts the actual value gathered from sensor A – hA,ANFISC(t). This prediction is subtracted from the current measurement of the sen- sor under investigation hA(t) to obtain the error: eANFISCðtÞ¼ hAðtÞ� hA;ANFISCðtÞ ð12Þ which will be exploited by the internal decision block to achieve the classifier’s hypothesis hANFISC(t). Classifier’s decisional block oper- ates the same as the ones included in above mentioned classifiers (AVC, ALC, NNC and NNAC), but utilizes a different threshold: eANFISC. The classifier has some internal parameters that had to be care- fully determined: the threshold eANFISC and the ANFIS structure and parameters. The threshold eANFISC is set based on trials, knowledge and intuition of the human expert, while the ANFIS network parameters are obtained using a combination of the least-squares method and the backpropagation gradient descent method (Math- Works, 2002). The training process of this classifier is very computationally intensive. To make a reasonable compromise between model com- plexity and computing time, we chose to use only two measure- ments from each sensor: hANFISCðtÞ¼½hAðt�1ÞhAðt�2Þh1ðtÞh1ðt�1Þ. . .hkðtÞhkðt�1Þ�: ð13Þ The ANFIS based classifier can be used as a part of our EBS only after n time samples, this period being necessary to populate all compo- nents of the input vector hNNC(t) with real measurements. 4. Decision block The ensemble decision block is in charge of two tasks: (a) to provide a confident decision of ‘‘reliable’’ or ‘‘unreliable’’ regarding each and every sensor measurement; and, (b) to offer a trustwor- thy estimation of the measurement value when a value provided by the sensor is considered to be affected by an anomaly. Taking the ensemble final decision can be efficiently imple- mented using the weighted majority voting algorithm (Littlestone & Warmuth, 1994). The five classifiers provide a set of hypothesis – hAVC(t), hALC(t), hNNC(t), hNNAC(t), hANFISC(t) e {0, 1} – that are later aggregated using a weighted majority voting scheme formularized by the following equations: WMVðtÞ¼ X i wi � hiðtÞ ð14Þ with hi(t) e {hAVC(t), hALC(t), hNNC(t), hNNAC(t), hANFISC(t)} and P iwi = 1. The overall ensemble decision is computed using the following relation: hensemble ¼ roundðWMVðtÞÞ¼ 0 if WMVðtÞ < 0:5 1 if WMVðtÞ P 0:5 � ; ð15Þ where the function round(X) rounds X to the nearest integer. The weights wi related to each hypothesis are chosen based on simulations and experiments. In order to provide equal rights for components based on temporal redundancy and spatial redun- dancy we included the following restriction: wALC þwNNAC þ0:5�wANFISC ¼wAVC þwNNC þ0:5�wANFISC ¼0:5: ð16Þ In case a measurement provided by the sensor under investiga- tion is considered abnormal, the ensemble decision block offers a reliable estimation that can replace it. This value is obtained by averaging the estimated values of the measurement provided by individual classifiers: hA;trustðtÞ¼ 1 5 � ½hA;AVCðtÞþ hA;ALCðtÞþ hA;NNCðtÞþ hA;NNACðtÞ þ hA;ANFISCðtÞ�: ð17Þ In order to prevent incorrect classifying of the ensemble compo- nents based on autoregressive forecasts (ALC, NNAC and ANFISC), their input values hA(t � i), when affected by anomalies, have to be replaced by these trustworthy estimations hA,trust(t � i). The rea- son is simple: wrong data in inputs conduct to misclassification. 5. Training and testing the ensemble After setting the structure for each individual classifier and for the decision block, a complex training and testing process must be carried out to assure optimal sets of parameters (Fig. 7). This is done using a variety of automated/nonautomated procedures individualized for each ensemble component and relies on training datasets obtained from two sources: computer simulation and experimental deployments of the same type of WSNs. The training process is undoubtedly one of the most challenging tasks when designing an ensemble system. There are three speci- ficities that must be carefully taken into account in the training process of the proposed ensemble: (a) The impact of the sensor deployment: Three of the classifiers included in our ensemble use measurements provided by neighboring sensors. This characteristic implies that the sen- sor deployment (shape, distances between adjacent nodes, etc.) is a key aspect, affecting decisively the training datasets and, by this, the entire training process. Fig. 6. ANFIS based classifier. D.-I. Curiac, C. Volosencu / Expert Systems with Applications 39 (2012) 9087–9096 9091 Author's personal copy (b) The way training datasets can be obtained: The environment in which the WSN will be deployed is extremely complex and, as a result, hard to simulate and predict. The training datasets will have to approximate this intricate environment which cannot be done using only computer simulation or only experimental deployments. A mixture between the two possibilities is more feasible. (c) Assuring the classifiers diversity is a must: The training process must guarantee the diversity of classifiers as mentioned in Section 3. The diversity is estimated using proper metrics that are computed during validation of the training process. If the diversity requirements are not met the training must be redone. The training and testing methodology applied in the case of our ensemble is governed by the following steps: i. Choosing the sensor deployment shapes that will be considered: Due to the specificity of WSN deployment procedures (aerial scattering, manual deployment, etc.) and due to possible existence of non-reporting sensors, the geographical place- ment of the sensors in the field can conduct to a plethora of configurations. From this huge number, we must select a reasonable set that can approximately cover all the situa- tions. For each selected sensor deployment configuration, a set of parameters for ensemble components will be obtained through training. ii. Forming the training and testing datasets for each sensor deployment shape: In order to obtain consistent training and testing datasets, we must concentrate on two direc- tions: development of computer simulation programs that can model fairly accurate the environment; and, deployment of experimental WSNs to collect relevant data from real environments. iii. Training the ensemble components: all the parameters of each and every ensemble component will be obtained using auto- mated (Levenberg–Marquardt algorithm in case of neural networks, a combination of the least-squares method with the backpropagation gradient descent method in case of ANFIS) and nonautomated techniques. Obtaining the thresh- olds for each classifier or the weights for the decision block is based mainly on experience and intuition. iv. Testing the trained ensemble using testing datasets and ensem- ble evaluation based on proper diversity metrics: Each trained ensemble is validated using independent testing datasets (data not used in ensemble training) obtained through com- puter simulations or using experimental sensor deploy- ments in a real environment. In addition, the diversity of the classifiers inside each of the ensembles is computed using proper metrics. v. If the testing results do not match with the desired results or if the diversity of the classifiers is not appropriate, the training process will be redone. Internal parameters of the individual classifiers can be obtained using a variety of techniques as follows: the thresholds eAVC, eALC, eNNC, eNNAC and eANFISC are tuned based on trials, experience and intuition, the weights of the two neural networks implied in NNC and NNAC classifiers are computed using Levenberg–Marquardt algorithm and the ANFIS parameters can be calculated using a combination of the least-squares method with the backpropaga- tion gradient descent method. A good choice for implementing the training process is proved to be Matlab/Symulink package that offers powerful functions for simulation, neural networks and AN- FIS training, etc. After training, each ensemble is subject to a testing and valida- tion procedure, where new data are classified and diversity metrics (Q-statistic) are computed. If the results of this validation are not acceptable the training process must be performed once again. 5.1. The link between the training process and the sensor deployment in the field The training process depends decisively on the real sensor deployment. Because the ensemble contains classifiers that rely on the measurements provided by adjacent sensors, the in-field position of the sensors has a major impact in the way the ensemble has to be trained. In order to standardize the training procedure we chose to train the ensemble for a set of nine different deployment shapes, pre- sented in Fig. 8. This set covers a larger variety of deployments be- cause it envelops the shapes obtained through rotation or/and symmetrization. Basically, we have to obtain the parameters for each ensemble component corresponding to this set of deployments and to store them in a small Access-type database or in a text file that will be parsed any time when needed, on the base station. When the deployment of WSN in the field is finished, for each sensor node, a procedure to find the most appropriate shape will be started. This can be done using shape matching procedures sim- ilar to the ones presented in Cao, Lisani, Morel, Musé, and Sur (2008), Belongie, Malik, and Puzicha (2002), Huttenlocher, Klanderman, and Rucklidge (1993). When the closest shape of the deployment is obtained, the parameters for the related ensem- ble are loaded from the database. This way, the ensemble can work even when some sensors are not reporting or even when sensors are deployed nonuniformly (randomly) in the field. 5.2. Obtaining the training and testing datasets Sensor anomalies within WSNs are hard to be discovered mainly because the complexity of environment in which they are deployed. Our ensemble must envelope this complex environment in mathematical models contained in each classifier. These models have to reflect different facets of an intricate reality and, for this, they have to be trained using proper datasets obtained from two independent sources (Fig. 7): Fig. 7. Training and testing processes of the ensemble. 9092 D.-I. Curiac, C. Volosencu / Expert Systems with Applications 39 (2012) 9087–9096 Author's personal copy – Computer simulation: In principle, the environment in which the WSN will be deployed can be simulated to obtain relevant data for training and testing the classifiers, but this action is highly problematical due to the complexity of involved phenomena. For this reason, the computer simulations are error prone and cannot be the only source for training and testing datasets. – Experimental deployments: A relevant source of training and testing datasets is represented by experimental deployments of WSNs in similar environments. Although, the real environ- ment is hard to be replicated, data obtained from experimental deployments are relevant, but sometimes statistically inefficient. By combining information provided by these two sources, con- sistent training and testing datasets can be formed. 5.3. Metrics to estimate the diversity of classifiers In order to quantitatively evaluate the diversity of classifiers, several metrics have been defined (Kuncheva & Whitaker, 2003; Polikar, 2006). Among them, the pairwise metrics defined between two classifiers are the simplest. In our case, the ensemble incorpo- rates T = 5 classifiers, so there is a need to compute T(T � 1)/2 = 10 pairwise heterogeneity metrics, and then to obtain the overall diversity of the ensemble by averaging these pairwise metrics. Given the hypotheses hi and hj provided by classifiers i and j, we can define four probabilities (a, b, c and d) as presented in Table 1: a – the probability of the occurrence of a case in which both hi and hj hypotheses are correct, b – the probability of the occurrence of a case in which hi hypothesis is correct and hj hypothesis is incorrect, and so on. Evidently, a + b + c + d = 1. Based on these probabilities we can compute the Q-statistic, an indicator that is intuitive and simple to implement (Kuncheva & Whitaker, 2003): Q i;j ¼ ad � bc ad þ bc ; ð18Þ Q has positive values if the same instances are correctly classified by both classifiers and negative values, otherwise. Maximum diversity is obtained for Q = 0. The overall Q-statistic indicator of the ensemble will be com- puted using the following formula: Q ensemble ¼ 1 npairs X jQ i;jj ¼ 2 TðT � 1Þ X jQ i;jj; ð19Þ where T represents the number of classifiers included in the ensem- ble, and |Qi,j| represents the absolute value of Qi,j. Qensemble e [0, 1] and has to take values as close as possible to zero to meet the requirement of classifiers diversity. Otherwise, the trained ensem- ble cannot pass the testing process and must be retrained. 6. Implementation and case study For validating the above concept and related methodologies we performed a series of studies using a WSN designed for measuring the temperature in an indoor environment. Our experimental sen- sor network is composed of nine Crossbow-Iris nodes equipped with MTS310 sensors boards which report the measured values through a gateway (MIB520CB) to a laptop-class device where our software modules can efficiently operate. All the programs were developed in Matlab/Simulink mainly because of its strong capabilities within the fields of recursive parameter estimation, neural networks and ANFIS related tech- niques. Basically we developed two different programs, one related to training and testing process where the parameters of the EBS are tuned for fulfilling the expectations and one for on-line sensing anomalies discovery that implements the trained ensemble of classifiers. The datasets for training and testing were obtained from two different sources, experimental WSN deployment and computer simulation, to cover a wider variety of environmental heat-transfer related phenomena. Our indoor WSN deployment has the configuration depicted in Fig. 9, where the temperature can be intentionally perturbed using Fig. 8. Training deployment shapes. Table 1 Relationship between a pair of classifiers. hj is correct hj is incorrect hi is correct a b hi is incorrect c d Fig. 9. Experimental indoor WSN deployment. D.-I. Curiac, C. Volosencu / Expert Systems with Applications 39 (2012) 9087–9096 9093 Author's personal copy either mobile air conditioning units (affect groups of sensors with heating/cooling waves), either a heat lamp (affects a single sensor by increasing the temperature in its immediate vicinity). Computer simulation is another source of training/testing data- sets. We started from the heat equation which describes the distri- bution of heat (or variation in temperature) in a given region over time. For a function u(x, y, t) of two spatial variables (x, y) and the time variable t, the heat equation with peak p(x, y) as the source is as follows: Table 2 Configuration of the training and testing datasets. Package number Time (min) Node NW (�C) Node N (�C) Node NE (�C) Node E (�C) Node SE (�C) Node S (�C) Node SW (�C) Node W (�C) Node A (�C) . . . 3 86 23.32 23.28 23.12 23.31 23.31 23.07 23.26 23.21 23.18 4 0 18.32 18.28 18.27 18.17 18.30 18.27 18.29 18.27 18.19 4 1 18.74 18.42 18.41 18.35 18.29 18.32 18.39 18.52 18.30 4 2 19.32 19.03 18.70 18.61 18.32 18.48 19.01 19.23 18.69 . . . Table 3 Structure of the training and testing datasets. Package number Destination Number of time samples Number of abnormal functioning samples Source 1 Training 127 15 Real WSN 2 Training 226 32 Real WSN 3 Training 87 12 Real WSN 4 Training 320 63 Computer simulation 5 Training 342 45 Computer simulation 6 Training 287 42 Computer simulation 7 Testing 79 14 Real WSN 8 Testing 162 23 Real WSN 9 Testing 154 15 Computer simulation 10 Testing 303 41 Computer simulation Table 4 Pairwise Q-statistics between the five classifiers. Q-statistics AVC ALC NNC NNAC ANFISC AVC 1.00 0.82 0.86 0.84 0.86 ALC 0.82 1.00 0.79 0.91 0.89 NNC 0.86 0.79 1.00 0.82 0.90 NNAC 0.84 0.91 0.82 1.00 0.86 ANFISC 0.86 0.89 0.90 0.86 1.00 0 50 100 150 16 16.5 17 17.5 18 18.5 19 19.5 20 20.5 21 Time Te m pe ra tu re NW neighbor E neighbor S neighbor sensor A with anomalies Fig. 10. Temperature variations for node A and its NW, E and S neighbors. Table 5 Correct, anomalous and estimated values for the sensor A. Time (min) 10 20 40 60 70 71 72 90 100 105 120 130 140 145 150 Measurements for sensor A without anomalies (�C) 19.43 18.98 17.22 19.57 19.58 19.35 19.06 17.15 18.33 18.34 17.33 18.39 19.75 19.89 19.11 Measurements for sensor A with anomalies (�C) 18 20 16 21 18 20 17 18.5 17 19.5 16 19.5 18 21 17 Estimated values (�C) 19.51 18.80 17.29 19.47 19.40 19.17 18.87 17.34 18.23 18.39 17.22 18.39 19.71 19.74 18.96 9094 D.-I. Curiac, C. Volosencu / Expert Systems with Applications 39 (2012) 9087–9096 Author's personal copy @u @t ¼ a � @ 2u @x2 þ @ 2 u @y2 ! � pðx; yÞ: ð20Þ We simulated this equation in Matlab (PDEToolbox) with different parameters to model diverse dynamics of the heat in the 2D envi- ronment where the sensor nodes are placed. In order to introduce erroneous values, we changed deliberately some of the temperature values near the node under investigation, at random moments in time. For our case study, we considered all the weights of the EBS decision block to be 0.2 and the thresholds of the individual classi- fiers to be 0.35 �C. Due to the way abnormal activity is discovered in our method- ology, the datasets are not composed of independent values, but of packages of values obtained for different consecutive moments in time to catch the dynamics of the environment. For this reason the datasets for training and testing are structured as in Table 2 (neighboring nodes of the sensor A are individualized using their relative position described by cardinal and intercardinal direc- tions: e.g. NorthWest – NW, East – E, etc.), where a package is rep- resented by a number of measurements reported at successive moments in time. The training datasets included six packages, half of them ob- tained through computer simulation and half obtained using the experimental WSN deployment. The testing datasets consisted of only four packages, two obtained from the real deployment and two through computer simulation (Table 3). In the training process we imposed that the Q-statistics be- tween every two classifiers to be less then 0.91 and the overall Q-statistics obtained using Eq. (19) to be under 0.87. These goals have been achieved: the trained and validated ensemble reported an overall Q-statistics of 0.855, while the pairwise Q-statistics for the five classifiers are presented in Table 4. The obtained results prove the efficiency of our ensemble based methodology, in the sense that the ensemble categorized accu- rately 99.71 of the situations and the best individual classifier in terms of results (ANFISC) had 97.99%, while the worst individual classifier (AVC) had 91.26% accuracy. To show how our EBS works, in Fig. 10 we presented the testing dataset package number 9, where the values related to sensor A have been intentionally modified to simulate anomalies at 15 in- stances in time ( t = 10, 20, 40, 60, 70, 71, 72, 90, 100, 105, 120, 130, 140, 145, 150), according to Table 5. The estimated temperatures for the sensor A done by three of our five classifiers (hA,AVC(t), hA,NNC(t), hA,ANFISC(t)) in comparison to the original sensor A time series (without anomalies), are presented in Fig. 11, while the overall decisions taken by the ensemble are depicted in Fig. 12, with the remark that for the first six instances in time the decisions were taken only by the average 0 50 100 150 16 16.5 17 17.5 18 18.5 19 19.5 20 20.5 21 Time Te m pe ra tu re AVC prediction NNC prediction ANFISC prediction sensor A without anomalies Fig. 11. Estimations of the sensor A temperature done by the AVC, NNC and ANFISC classifiers. 0 50 100 150 −0.2 0 0.2 0.4 0.6 0.8 1 Time E B S D ec is io n Fig. 12. The overall decision provided by the EBS. D.-I. Curiac, C. Volosencu / Expert Systems with Applications 39 (2012) 9087–9096 9095 Author's personal copy based classifier due to the reasons mentioned in Section 3 (the necessity to populate input vectors for ALC, NNC, NNAC, ANFISC classifiers with real measurements and the initial convergence of ALC). The estimated values proposed by our ensemble to replace the abnormal measurements are presented in the last line of Table 5. By comparing the sensor A unperturbed measurements with the estimated values, it results a mean absolute error of only 0.118 �C and a maximum absolute error of 0.19 �C obtained at t = 72 and t = 90, confirming once again the preciseness of our methodology. 7. Conclusions It is in the human nature to ask for two, three or maybe more different authorized opinions before taking a significant decision. This human characteristic was transferred to the domain of artifi- cial intelligence through the concept of ensemble based system, producing relevant results in a large variety of domains. Being exposed to numerous risks, WSN often use complex decisional sys- tems for controlling their lifecycle, for processing measurement data or even for dealing with malicious security attacks. Our paper presents a novel perspective upon the sensing anomaly detection in wireless sensor networks by involving an ensemble of five carefully selected binary classifiers. Each of these five classifiers comprises a dynamical model of ‘‘correct behavior’’ of the sensor, estimated based on past measurements provided by the sensor itself or by neighboring sensors. The algorithms involved vary from a simple average computing to complex neural or ANFIS networks assuring the desired diversity of the classifiers to implement an efficient decision making system. Furthermore, using the power of the same ensemble we can precisely estimate the correct value of any measurement affected by sensing anomalies. Acknowledgment This work was developed in the frame of PNII-IDEI-PCE-ID923- 2009 CNCSIS – UEFISCSU. References An, S. H., Heo, G., & Chang, S. H. (2011). Detection of process anomalies using an improved statistical learning framework. Expert Systems with Applications, 38(3), 1356–1363. Belongie, S., Malik, J., & Puzicha, J. (2002). Shape matching and object recognition using shape contexts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(4), 509–522. Cao, F., Lisani, J.-L., Morel, J.-M., Musé, P., & Sur, F. (2008). A theory of shape identification. Lecture Notes in Mathematics, 1948, 264. Chandola, V., Banerjee, A., & Kumar, V. (2009). Anomaly detection: A survey. ACM Computing Surveys, 41(3), 1–58. Chatzigiannakis, V., & Papavassiliou, S. (2007). Diagnosing anomalies and identifying faulty nodes in sensor networks. IEEE Sensors Journal, 7(5), 637–645. Curiac, D., Volosencu, C., Pescaru, D., Jurca, L., & Doboli, A. (2009). Redundancy and its applications in wireless sensor networks: A survey. WSEAS Transaction on Computer, 8(4), 705–714. Dietterich, T. G. (2000). An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Machine Learning, 40(2), 139–157. García-Pedrajas, N., & Ortiz-Boyer, D. (2009). Boosting k-nearest neighbor classifier by means of input space projection. Expert Systems with Applications, 36(7), 10570–10582. Ho, T. K., Hull, J. J., & Srihari, S. N. (1994). Decision combination in multiple classifier systems. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(1), 66–75. Hsu, K.-W., & Srivastava, J. (2009). Diversity in combinations of heterogeneous classifiers. In Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining (PAKDD’09), Bangkok, Thailand, LNCS 5476 (pp. 923–932). Huttenlocher, D., Klanderman, G., & Rucklidge, W. (1993). Comparing images using the hausdorff distance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(9), 850–863. Jang, J.-S. R. (1991). Fuzzy modeling using generalized neural networks and Kalman filter algorithm. In Proceedings of the ninth national conf on artificial intelligence (AAAI-91) (pp. 762–767). Jang, J.-S. R. (1993). ANFIS: Adaptive-network-based fuzzy inference systems. IEEE Transactions on Systems, Man, and Cybernetics, 23(3), 665–685. Kim, M., & Kang, D. (2010). Ensemble with neural networks for bankruptcy prediction. Expert Systems with Applications, 37(4), 3373–3379. Kuncheva, L., & Whitaker, C. (2003). Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine Learning, 51(2), 181–207. Li, H., & Sun, J. (2012). Case-based reasoning ensemble and business application: A computational approach from multiple case representations driven by randomness. Expert Systems with Applications, 39(3), 3298–3310. Li, Y. Y., Thomason, M., Parker, L. E. (2010). Detecting time-related changes in wireless sensor networks using symbol compression and probabilistic suffix trees. In Proceedings of the IEEE/RSJ international conference on intelligent robots and systems (IROS2010), Taipei (pp. 2946–2951). Littlestone, N., & Warmuth, M. (1994). The weighted majority algorithm. Information and Computation, 108(2), 212–261. Ljung, L. (1999). System identification – Theory for the user (2nd ed.). Upper Saddle River, NJ: Prentice Hall. Ljung, L. (2007). System identification toolbox for use with matlab (7th ed.). Natick, MA: The MathWorks Inc. Ljung, L., & Soderstrom, T. (1987). Theory and practice of recursive identification. Cambridge, MA: MIT Press. MathWorks (2002). Matlab fuzzy logic toolbox. User’s guide version 2. Natick, MA: The Math Works Inc. Polikar, R. (2006). Ensemble based systems in decision making. IEEE Circuits and Systems Magazine, 6(3), 21–45. Rajasegarar, S., Leckie, C., & Palansiwami, M. (2008). Anomaly detection in wireless sensor networks. IEEE Wireless Communications, 15(4), 34–40. Shirai, S., Kudo, M., & Nakamura, A. (2009). Comparison of bagging and boosting algorithms on sample and feature weighting. In Proceedings of the 8th international workshop multiple classifier systems (MCS2009), Reykjavik, Iceland, LNCS 5519 (pp. 22–31). Siripanadorn, S., Hattagam, W., & Teaumroong, N. (2010). Anomaly detection in wireless sensor networks using self-organizing map and wavelets. International Journal of Communications, 4(3), 74–83. Tsoumakas, G., Katakis, I., & Vlahavas, I. (2004). Effective voting of heterogeneous classifiers. In Proceedings of the 15th European conference on machine learning (ECML2004), Pisa, Italy (pp. 465–476). Wang, G., Hao, J. X., Ma, J., & Jiang, H. B. (2011). A comparative assessment of ensemble learning for credit scoring. Expert Systems with Applications, 38(1), 223–230. Walters, J. P., Liang, Z., Shi, W., & Chaudhary, V. (2006). Wireless sensor network security: A survey. In Xiao Yang (Ed.), Security in distributed, grid, and pervasive computing, computing (pp. 367–410). CRC Press. Wang, R. X., Lizier, J. T., Obst, O., Prokopenko, M., & Wang, P. (2008). Spatiotemporal anomaly detection in gas monitoring sensor networks. In Proceedings of the European conference on sensor networks (EWSN2008), Bologna, Italy, LNCS 4913 (pp. 90–105). Yang, B., & Bohme, J. F. (1992). Rotation-base RLS algorithms: Unified derivations, numerical properties and parallel implementations. IEEE Transactions on Signal Processing, 40(5), 1151–1166. 9096 D.-I. Curiac, C. Volosencu / Expert Systems with Applications 39 (2012) 9087–9096