key: cord-0058981-ro7svk7c authors: Oh, Cheolhwan; Jeong, Jongpil title: Non-intrusive Load Monitoring Based on Regularized ResNet with Multivariate Control Chart date: 2020-08-19 journal: Computational Science and Its Applications - ICCSA 2020 DOI: 10.1007/978-3-030-58802-1_47 sha: 7cc631c4021fe17e9fce1e7b37cd8c7d2d968489 doc_id: 58981 cord_uid: ro7svk7c

With the development of industry and the spread of the smart home, the need for power monitoring technologies for effective energy management systems is increasing. Among these, non-intrusive load monitoring (NILM) is an efficient way to solve the electricity consumption monitoring problem. NILM is a technique to measure the power consumption of individual devices by analyzing the power data collected through smart meters and commercial devices. In this paper, we propose a deep neural network (DNN)-based NILM technique that enables energy disaggregation and power consumption monitoring simultaneously. Energy disaggregation is performed by training a deep residual network to perform multilabel regression. Real-time monitoring is performed with a multivariate control chart built on latent variables extracted through the weights of the trained model. The energy disaggregation and monitoring performance of the proposed method is verified using the public NILM Electricity Consumption and Occupancy (ECO) data set.

Recently, electronics have become indispensable for convenient living in most homes and buildings, and the automation of factories that accompanies industrial development has increased the use of power-consuming equipment in factories. Between 1984 and 2004, rapid industrial growth increased total energy use by 49% and CO2 emissions by 43%. The Electric Power Research Institute (EPRI) in the United States analyzed how different methods of providing feedback to consumers affect energy use [1].
According to their results, providing energy consumption patterns to end-users produces an energy-saving effect; providing detailed, real-time power consumption information for each device saved approximately 12% of energy on average. Intrusive load monitoring (ILM) monitors energy usage through smart meters connected to each device that requires monitoring. Because ILM measures the energy consumption of each device directly, it offers high accuracy [2]; however, because a smart meter must be installed on every device, it is expensive and difficult to manage and, thus, hard to generalize [3]. Non-intrusive load monitoring (NILM) uses a single smart meter that captures the aggregated electrical energy signal. The goal of this technique is to identify the devices that contribute to the aggregated signal. NILM is preferred over ILM in both academia and industry [4] because it reduces financial costs and the burden on users or homeowners to participate in the energy monitoring process. NILM separates the total electrical load of a building, measured at a single point, into individual device signals using a combination of electrical data collection systems and signal processing algorithms [5]. It extracts features from the collected data through signal processing and identifies loads from the extracted features [6]. Because feature extraction strongly influences the subsequent load identification, selecting and applying an appropriate feature extraction method for the collected power data is essential. Many studies have proposed feature extraction methods that improve load identification, contributing to the development of NILM technology [7, 8]. Deep learning, which can extract features automatically by learning from data, continues to develop rapidly and is being applied widely across industries.
In contrast to traditional methods that extract features directly, deep learning models extract features automatically from raw data [9]. Deep learning approaches have also been proposed for NILM, and their effectiveness has been demonstrated [10]. In this paper, we propose an NILM framework based on deep learning algorithms. The proposed method performs energy disaggregation through multilabel regression by training a deep learning model. Moreover, the trained model is used to extract latent variables of the electrical signal data as features, and a multivariate control chart drawn from these features provides end-users with a monitoring dashboard that reports energy usage in real time. Our primary contribution is extending the control chart, a monitoring methodology used predominantly in process control, to the NILM field. Furthermore, because the latent variables are taken from the deep learning model already trained for energy disaggregation, the additional effort required for monitoring is reduced. The rest of this paper is organized as follows. Section 2 discusses related work. Section 3 explains the control chart in statistical process control. Section 4 describes the proposed NILM framework. Section 5 presents the experiments conducted to evaluate the performance of the proposed approach and their results. Section 6 presents our conclusions. NILM was introduced by Hart in the 1980s [5]. Since then, several NILM algorithms have been proposed to analyze data collected at low and high sampling rates. To estimate the power demand of each device directly from the aggregate power signal, approaches based on the hidden Markov model (HMM) [11] and the factorial HMM (FHMM) [12] have been studied. Graph signal processing (GSP) is another powerful option for solving signal processing problems [13, 14]. In previous research, both steady-state and transient-state analyses were used to separate the energy consumption of each device.
In steady-state analysis, active power, reactive power, current waveform, and harmonic components were used as features for load disaggregation [15]. In transient-state analysis, transient energy and transient shapes were used to disaggregate the loads [16, 17]. Features extracted from the steady and transient states were combined to increase accuracy [18, 19]. These researchers relied on handcrafted feature extractors such as the Fourier transform, the wavelet transform, spectral-domain representations, and electrical parameters. However, such extractors are time-consuming and error-prone, which makes it difficult to find the optimal features for disaggregation [20]. As a solution, researchers have begun using deep neural networks (DNNs) that extract features automatically by learning the hidden patterns of raw signals. In [10], three types of DNN structures (long short-term memory (LSTM) units, a denoising autoencoder, and a regression model) were used to predict the start time, end time, and average power demand of each device. Lukas et al. [21] proposed a combination of HMM and DNN for energy disaggregation, with results that surpass FHMM. In [22], the researchers proposed an architecture with a deep recurrent LSTM network for supervised load disaggregation that outperformed alternatives for loads with periodic energy consumption. He et al. [23] used multiple parallel convolution layers with various filter sizes to improve the disaggregation performance of LSTM-based models. Barsim et al. [24] proposed a deep residual network-based disaggregation model and found that deeply stacked layers can be effective for energy disaggregation. Other researchers found that convolutional neural network (CNN) architectures based on seq2seq or seq2point learning may be more effective for energy disaggregation [25, 26]. A supervised learning method that applies a 1D CNN-RNN model, combining a CNN and a recurrent neural network (RNN), to NILM has also been proposed [27].
Statistical process control (SPC) is a methodology that finds solutions or improvements by identifying and interpreting problems from data rather than by guesswork. It is mainly used for quality control [28]. A univariate control chart can be applied and interpreted easily because it uses operational data directly instead of a mathematical model. However, univariate SPC methods can produce erroneous results when applied to multivariate data with highly correlated variables, and drawing a separate control chart for each variable is inefficient [29]. To manage two or more related variables efficiently or to monitor a process as a whole, a multivariate control chart that observes changes in two or more variables simultaneously is required. Several multivariate control charts have been proposed, such as Hotelling's T² chart, the multivariate EWMA chart, and the multivariate CUSUM chart [29]. Hotelling's T² chart is the most widely used [30]. The T² statistic is calculated as follows:

$$T^2 = (\mathbf{x} - \bar{\mathbf{x}})^\top \mathbf{S}^{-1} (\mathbf{x} - \bar{\mathbf{x}}) \qquad (1)$$

where $\bar{\mathbf{x}}$ and $\mathbf{S}$ are the sample mean vector and sample covariance matrix, respectively, determined from past data $\mathbf{X} \in \mathbb{R}^{n \times m}$ collected in the in-control state, in which n and m are the numbers of samples and variables. The statistic measures the similarity between the data collected in the in-control state and a newly measured observation $\mathbf{x}$: Hotelling's T² is the squared Mahalanobis distance between the historical in-control data and the new measurement. Under the assumption of normality, the upper control limit of the T² chart follows from the F distribution [28], and Eq. (2) gives this critical value:

$$\mathrm{UCL}_{T^2} = \frac{m(n+1)(n-1)}{n(n-m)} F_{(m,\, n-m,\, \alpha)} \qquad (2)$$

In Eq. (2), the significance level α is the type I error rate, that is, the maximum allowable rate of false alarms in which an in-control observation is judged to be out of control. $F_{(m,\, n-m,\, \alpha)}$ is the upper α quantile of the F distribution with m and (n − m) degrees of freedom.
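A minimal NumPy/SciPy sketch of Eqs. (1) and (2), assuming the in-control reference data are stored row-wise in an n × m array and that new observations are scored one at a time (the usual Phase II setting). The function names `hotelling_t2` and `t2_ucl` are illustrative, not taken from the paper.

```python
import numpy as np
from scipy import stats


def hotelling_t2(X_ref, X_new):
    """Eq. (1): T^2 of each row of X_new relative to in-control reference data X_ref (n x m)."""
    x_bar = X_ref.mean(axis=0)               # sample mean vector
    S = np.cov(X_ref, rowvar=False)          # sample covariance matrix (m x m)
    S_inv = np.linalg.inv(S)
    D = X_new - x_bar
    # Row-wise quadratic form (x - x_bar)^T S^{-1} (x - x_bar)
    return np.einsum("ij,jk,ik->i", D, S_inv, D)


def t2_ucl(n, m, alpha=0.05):
    """Eq. (2): F-based upper control limit for a new individual observation."""
    f_crit = stats.f.ppf(1.0 - alpha, m, n - m)  # upper-alpha quantile of F(m, n - m)
    return m * (n + 1) * (n - 1) / (n * (n - m)) * f_crit
```

An observation is flagged when `hotelling_t2(X_ref, X_new) > t2_ucl(n, m, alpha)`. Inverting S explicitly is fine for a handful of variables; the near-singularity problem discussed next is exactly where this direct inverse becomes unreliable.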
However, multivariate process control based on Hotelling's T² is not effective for data with many correlated variables. With many correlated variables, the covariance matrix S becomes nearly singular and is difficult to invert, which leads to problematic results [31]. Furthermore, including many highly correlated variables may cause multicollinearity, which deteriorates the ability to detect process shifts [32]. Consequently, various latent-variable-based control charts that extract features from raw data have been proposed. Hotelling's T² chart based on principal component analysis (PCA) is a representative latent-variable-based control chart. PCA finds mutually orthogonal axes that preserve as much of the variance of the data as possible and transforms data from a high-dimensional space into a low-dimensional space without linear correlation [33]. The PCA-based chart is similar to the T² chart described in Eqs. (1) and (2), except that a Q chart for residual analysis is used alongside it. First, PCA decomposes the data matrix X into individual components, which are further divided into the principal component subspace (PCS) and the residual subspace (RS) according to the number of principal components selected, as follows:

$$\mathbf{X} = \mathbf{T}\mathbf{P}^\top = \hat{\mathbf{T}}\hat{\mathbf{P}}^\top + \tilde{\mathbf{T}}\tilde{\mathbf{P}}^\top = \hat{\mathbf{X}} + \tilde{\mathbf{X}}$$

where $\mathbf{T} = [\hat{\mathbf{T}}\ \tilde{\mathbf{T}}]$ and $\mathbf{P} = [\hat{\mathbf{P}}\ \tilde{\mathbf{P}}]$ are the score and loading matrices, respectively. $\hat{\mathbf{T}} \in \mathbb{R}^{n \times p}$ and $\tilde{\mathbf{T}} \in \mathbb{R}^{n \times (m-p)}$ are the score matrices belonging to the PCS and RS, respectively, and $\hat{\mathbf{P}} \in \mathbb{R}^{m \times p}$ and $\tilde{\mathbf{P}} \in \mathbb{R}^{m \times (m-p)}$ are the loading matrices belonging to the PCS and RS, respectively. The number of principal components p is selected either at the elbow point of a scree plot or as the number of components that explains the proportion of variance desired by the user.
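The PCS/RS split and the explained-variance rule for choosing p can be sketched as follows. This assumes the variables are autoscaled (zero mean, unit variance) before PCA, which the text does not state; `fit_pca` is a hypothetical helper name.

```python
import numpy as np


def fit_pca(X, var_threshold=0.8):
    """Split PCA loadings into the PCS (P_hat) and RS (P_tilde) by cumulative explained variance."""
    mu, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = (X - mu) / sd                                   # autoscale each variable (assumption)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))  # ascending order
    order = np.argsort(eigvals)[::-1]                   # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = np.cumsum(eigvals) / eigvals.sum()
    # Smallest p whose cumulative explained variance reaches the threshold
    p = min(int(np.searchsorted(explained, var_threshold)) + 1, len(eigvals))
    P_hat, P_tilde = eigvecs[:, :p], eigvecs[:, p:]     # loadings of the PCS and the RS
    return mu, sd, P_hat, P_tilde, eigvals[:p]
```

New data must be scaled with the training `mu` and `sd` before projection, so that the scores $\hat{\mathbf{T}} = \mathbf{Z}\hat{\mathbf{P}}$ are computed in the same coordinate system as the reference data.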
After the number of principal components p is determined, the PCA-based T² statistic is calculated as follows:

$$\hat{\mathbf{t}} = \hat{\mathbf{P}}^\top \mathbf{x}, \qquad T^2_{PCA} = \hat{\mathbf{t}}^\top \hat{\boldsymbol{\Lambda}}^{-1} \hat{\mathbf{t}} \qquad (5)$$

where $\hat{\mathbf{t}}$ is the score vector of x in the PCS and $\hat{\boldsymbol{\Lambda}}$ is the diagonal matrix of the p largest eigenvalues of the covariance matrix of X. The upper control limit of the PCA-based T² is calculated as in Eq. (6), which is almost the same as Eq. (2) with m replaced by p:

$$\mathrm{UCL}_{T^2_{PCA}} = \frac{p(n+1)(n-1)}{n(n-p)} F_{(p,\, n-p,\, \alpha)} \qquad (6)$$

However, because the $T^2_{PCA}$ statistic in Eq. (5) uses only the information in the PCS, variations occurring in the RS may not be detected [34]. Therefore, the Q chart is additionally used to detect shifts that cannot be explained by the information in the PCS alone. The Q chart is constructed from the residuals in the RS: the PCA-based Q statistic monitors the squared error between the true vector x and the vector $\hat{\mathbf{x}}$ estimated by PCA, and is calculated as follows:

$$Q_{PCA} = \lVert \mathbf{x} - \hat{\mathbf{x}} \rVert^2 = \lVert (\mathbf{I} - \hat{\mathbf{P}}\hat{\mathbf{P}}^\top)\,\mathbf{x} \rVert^2$$

Assuming that the prediction errors follow a normal distribution, the upper control limit of the Q chart, i.e., of the squared prediction error, can be approximated with a weighted chi-squared distribution [35]:

$$\mathrm{UCL}_{Q} = \frac{v}{2m}\, \chi^2_{(2m^2/v,\, \alpha)}$$

where m and v are the sample mean and variance of $Q_{PCA}$, respectively, and α is the type I error rate. This approximation works well even when the prediction errors do not follow a normal distribution [36]. However, the PCA-based multivariate SPC (MSPC) technique assumes that the data are linear. Therefore, applying PCA as-is without removing nonlinearity does not accurately reflect the information in the data and limits the accuracy of anomaly detection. Researchers have consequently proposed various methodologies that account for nonlinearity, the most common being the kernel method. Kernel PCA (KPCA) [37] uses a kernel function to map the data into a high-dimensional feature space in which they become approximately linear, and then applies PCA to the mapped data to extract latent variables that account for nonlinearity.
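A sketch of the PCA-based T² and Q statistics and their control limits, assuming autoscaled data `Z` (one observation per row) together with the PCS loadings and retained eigenvalues from a PCA fit. The Q limit uses the form g·χ²_h with g = v/(2m) and h = 2m²/v computed from training Q values; in the code the mean and variance are named `mean_q` and `var_q` to avoid a clash with the number of variables m. All function names are hypothetical.

```python
import numpy as np
from scipy import stats


def pca_t2_q(Z, P_hat, lam):
    """T^2_PCA (Eq. 5) and Q_PCA for autoscaled rows Z, given PCS loadings
    P_hat (m x p) and the p retained eigenvalues lam."""
    t_hat = Z @ P_hat                        # scores in the PCS, one row per observation
    t2 = np.sum(t_hat ** 2 / lam, axis=1)    # t^T Lambda^{-1} t with a diagonal Lambda
    residual = Z - t_hat @ P_hat.T           # x - x_hat = (I - P P^T) x, the RS component
    q = np.sum(residual ** 2, axis=1)        # squared prediction error
    return t2, q


def t2_pca_ucl(n, p, alpha=0.05):
    """Eq. (6): same form as Eq. (2) with p retained components."""
    return p * (n + 1) * (n - 1) / (n * (n - p)) * stats.f.ppf(1.0 - alpha, p, n - p)


def q_ucl(q_train, alpha=0.05):
    """Weighted chi-squared limit g * chi2_h with g = var/(2 mean), h = 2 mean^2 / var."""
    mean_q, var_q = q_train.mean(), q_train.var(ddof=1)
    g, h = var_q / (2.0 * mean_q), 2.0 * mean_q ** 2 / var_q
    return g * stats.chi2.ppf(1.0 - alpha, h)
```

The choice of g and h matches the first two moments of g·χ²_h to the sample mean and variance of Q, which is why only those two numbers from the training data are needed; `stats.chi2.ppf` accepts the non-integer h directly.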
KPCA is favored because it is simple to use and adequately accounts for the nonlinearity of data. In process monitoring, several studies extracted latent variables using KPCA and applied them to multivariate control charts, demonstrating their effectiveness [38, 39, 40]. PCA- and KPCA-based multivariate control charts monitor the system that generates the data X by identifying latent variables that represent the relationships within X and obtaining a control limit. However, in many cases it is preferable to monitor the output Y produced by the system rather than X, and partial least squares (PLS) is the most commonly used alternative to PCA for this purpose. PLS shares similarities with PCA: whereas PCA extracts latent variables that maximize the variance of linear combinations of X, PLS extracts latent variables that maximize the covariance between linear combinations of X and Y [41]. Just as PCA uses a matrix X of process-variable measurements to monitor process variables, PLS-based control charts can monitor quality variables using a matrix Y of quality-variable measurements. However, PLS, like PCA, cannot account for the nonlinearity of data. Therefore, KPLS-based methodologies that consider nonlinearity have been proposed, and PLS- and KPLS-based multivariate control chart techniques have been proposed to monitor the status of the products produced during a process [42]. Recently, deep learning methods have been applied to extract latent variables for multivariate control charts. The authors of [43] extracted latent variables with an unsupervised deep learning algorithm and applied them to a multivariate control chart.
They used a variational autoencoder, which can extract features of the independent variable X, and confirmed that their method outperforms existing PCA- and KPCA-based multivariate control charts. This section explains the proposed method, an NILM methodology based on DNN algorithms. Its primary contribution is an energy disaggregation and real-time monitoring dashboard obtained by training a single neural network model. The proposed method rests on three key components. First, sequence-to-point learning defines the form of the input/output data of the neural network model that performs energy disaggregation. Second, the regularized residual network is the DNN architecture that performs NILM; regularization is used so that the network yields latent variables suitable for multivariate control charts. The last component, the latent-variable-based multivariate control chart, provides real-time monitoring dashboards to end-users. The latent variables needed for the multivariate control chart are extracted from the residual network trained for energy disaggregation and consumption prediction; no additional process is needed to extract them. We formulate the NILM energy disaggregation problem as a regression problem in which a neural network predicts the energy consumption of individual electronic devices $\mathbf{Y} \in \mathbb{R}^{n \times k}$ from data $\mathbf{X} \in \mathbb{R}^{n \times m}$ collected from the main power line. In this model, n is the number of data records collected according to the sampling rate of the smart meter, and m is the number of features of the electrical data collected and stored by the smart meter installed on the main power line. These features typically include voltage, current, phase, power factor, and the I-V trajectory [6].
Lastly, k is the number of electronic devices whose energy consumption is predicted by the regression model. To define the regression problem for energy disaggregation as a sequence-to-point neural network model, the original electrical data matrix X is sliced into sequences of length l, which are stacked into a three-dimensional array W as follows:

$$\mathbf{W}_i = [\mathbf{x}_i, \mathbf{x}_{i+1}, \ldots, \mathbf{x}_{i+l-1}]^\top \in \mathbb{R}^{l \times m}, \quad i = 1, \ldots, n-l+1, \qquad \mathbf{W} \in \mathbb{R}^{(n-l+1) \times l \times m} \qquad (11)$$

A previous method improved energy disaggregation performance by predicting the energy consumption Y at the midpoint of each sequence [25], but because it predicts the midpoint, a time delay of l/2 samples occurs. The lower the sampling rate and the longer the sequence length l, the less suitable the model becomes for real-time monitoring. Therefore, we designed the input/output structure of the model to predict the energy consumption at the endpoint of each sequence rather than at its midpoint. The primary feature of the residual network is the residual shortcut connection between consecutive convolutional layers. The difference between a residual network and a typical convolutional network, such as a fully convolutional network (FCN), is that a linear shortcut connection is added to connect the input and output of each residual block [44]. This connection mitigates the degradation problem, a chronic problem in DNNs, and makes it easier to train deeper networks for feature extraction. Therefore, we used a deep residual network to extract features sufficiently from the electrical data collected by the smart meter. The model in this study consists of six residual blocks, each composed of three layers; the end of the model consists of a global average pooling (GAP) layer and a regularized layer with a linear activation function that performs multilabel regression.
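The slicing in Eq. (11), together with the endpoint target, can be sketched with NumPy's `sliding_window_view`, which returns a read-only view instead of copying the data (useful at 1 Hz over several weeks). `make_windows` is a hypothetical helper; it assumes X and Y are time-aligned arrays sampled at the same rate.

```python
import numpy as np


def make_windows(X, Y, l):
    """Slice X (n, m) into overlapping length-l windows and pair each with Y at its endpoint."""
    # sliding_window_view appends the window axis last: (n - l + 1, m, l) -> (n - l + 1, l, m)
    W = np.lib.stride_tricks.sliding_window_view(X, l, axis=0).transpose(0, 2, 1)
    # Window i covers x_i .. x_{i+l-1}; its target is y_{i+l-1} (endpoint, so no l/2 delay)
    y_end = Y[l - 1:]
    return W, y_end
```

With l = 60, window i covers $\mathbf{x}_i$ to $\mathbf{x}_{i+59}$ and is paired with $\mathbf{y}_{i+59}$, so a prediction becomes available as soon as the last sample of the window arrives. A midpoint target would instead use `Y[l // 2 : len(Y) - l + 1 + l // 2]`, which lags the newest sample by about l/2 steps.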
In typical neural network architectures, regularization is usually not applied to the final layer that performs classification or regression using the latent variables extracted by the preceding layer. However, because we need to monitor the energy consumption Y of individual electronic devices through latent variables extracted from the GAP layer, the GAP layer must contain enough information about Y. Accordingly, we apply strong regularization to the linear output layer that performs the regression so that the latent variables extracted from the GAP layer do not depend on the weights of that layer. In this study, regularization was performed with an L2-norm penalty and a max-norm constraint [45]. The overall structure of the model is illustrated in Fig. 1. Researchers have demonstrated that a residual network with this architecture and these parameters is effective for extracting features from time-series data [46] and have provided strong guidelines for designing residual networks for time-series feature extraction. Algorithm 1 gives the pseudocode for monitoring energy consumption with a multivariate control chart. We used the Electricity Consumption and Occupancy (ECO) data set for our experiment; it was collected from six Swiss households over eight months at a sampling rate of 1 Hz [47]. For the main meter, the ECO data set has 16 metered variables, including total power consumption, three-phase power, neutral current, three-phase current, three-phase voltage, and the phase angles of V12, V13, I1−V1, I2−V2, and I3−V3. The sub-metered data cover several types of appliances. Because the data collected from the main meter in the ECO data set are multivariate with 16 variables, they are suitable for this study, which proposes a methodology for monitoring energy usage based on a multivariate control chart.
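To make the architecture concrete, here is a forward-pass sketch in plain NumPy of one residual block, the GAP layer, and the regularized linear head. Several details are assumptions not stated in the text: the block follows the time-series ResNet of [46] (the last ReLU applied after the shortcut addition, and a 1×1 convolution with batch normalization on the shortcut), batch normalization is shown in training mode without learned scale and shift, and the max-norm constraint is applied to each output unit's incoming weight vector. All function names are hypothetical, and a real implementation would use a deep learning framework with automatic differentiation.

```python
import numpy as np


def conv1d_same(x, w, b):
    """1D convolution (cross-correlation, as in DL frameworks), stride 1, 'same' zero padding.
    x: (batch, length, c_in); w: (kernel, c_in, c_out); b: (c_out,)."""
    k = w.shape[0]
    left = (k - 1) // 2
    xp = np.pad(x, ((0, 0), (left, k - 1 - left), (0, 0)))
    windows = np.lib.stride_tricks.sliding_window_view(xp, k, axis=1)  # (batch, length, c_in, k)
    return np.einsum("blck,kco->blo", windows, w) + b


def batch_norm(h, eps=1e-5):
    """Training-mode batch normalization over batch and time (no learned scale/shift)."""
    return (h - h.mean(axis=(0, 1))) / np.sqrt(h.var(axis=(0, 1)) + eps)


def init_block(rng, c_in, c_out, kernels=(8, 5, 3)):
    """Random parameters for one residual block (hypothetical initialization)."""
    convs, c = [], c_in
    for k in kernels:
        convs.append((rng.normal(0.0, 0.1, size=(k, c, c_out)), np.zeros(c_out)))
        c = c_out
    shortcut = (rng.normal(0.0, 0.1, size=(1, c_in, c_out)), np.zeros(c_out))
    return {"convs": convs, "shortcut": shortcut}


def residual_block(x, params):
    """Three conv -> BN (-> ReLU) layers, a 1x1-conv shortcut, then ReLU after the addition."""
    h = x
    for i, (w, b) in enumerate(params["convs"]):
        h = batch_norm(conv1d_same(h, w, b))
        if i < len(params["convs"]) - 1:
            h = np.maximum(h, 0.0)
    shortcut = batch_norm(conv1d_same(x, *params["shortcut"]))
    return np.maximum(h + shortcut, 0.0)


def global_average_pool(h):
    """GAP over time: (batch, length, channels) -> (batch, channels); the latent vector."""
    return h.mean(axis=1)


def max_norm(W, c=1.0):
    """Max-norm constraint: rescale each column of W (one output unit) to norm <= c."""
    norms = np.linalg.norm(W, axis=0, keepdims=True)
    return W * np.minimum(1.0, c / np.maximum(norms, 1e-12))


def head_loss(z, y, W, b, lam=0.1):
    """MSE of the linear regression head plus an L2 penalty lam * sum(W**2) on its weights."""
    pred = z @ W + b
    return np.mean((pred - y) ** 2) + lam * np.sum(W ** 2)
```

During training, `max_norm` would be applied after each optimizer step and the penalized loss minimized, with λ = 0.1 and c = 1 as in the experimental setup. The vector returned by `global_average_pool` is what would be passed to the T² and Q charts.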
Algorithm 1
Input: training set $\mathbf{X}_{tr} \in \mathbb{R}^{n \times m}$ and $\mathbf{Y}_{tr} \in \mathbb{R}^{n \times k}$
Input: test set $\mathbf{X}_{te} \in \mathbb{R}^{r \times m}$ and $\mathbf{Y}_{te} \in \mathbb{R}^{r \times k}$
Input: sequence length l
Input: explained variance v
Input: regularization parameters λ and c
Input: type I error rate α
1  $\mathbf{W}_{tr}$ ← reshape $\mathbf{X}_{tr}$ by l using Eq. (11)
   /* Train predictive models */
2  let ResidualBlock_i, where i ∈ {1, 2, ..., d}
3  θ ← initialize the parameters of the predictive models
4  while θ has not converged do

In the ECO data set, the number and types of monitored electronic devices differ between households, as do the ratios of the energy consumed by each device to the total energy consumption of each household. Of the six houses, house 2 collected data from the largest number of household appliances, and the share of electricity consumption covered by the smart plugs was also highest. Therefore, we conducted our experiment using data collected from house 2, which has 12 individually metered electronic devices. Of the total data from house 2, we use the data collected in January 2013. The stove, the tenth electronic device, was not recorded continuously; therefore, the experiment is performed only on the remaining 11 devices. January 1–21, 2013, is used as training data and January 22–28, 2013, as test data; that is, the model was trained on three weeks of data and tested on data collected during the following week. Each residual block consists of three convolutions whose output is added to the block's input and then fed to the next layer. The numbers of filters for the convolutions are fixed at 64, 128, and 256, and each convolution is followed by batch normalization and a ReLU activation function. In each residual block, the filter lengths of the first, second, and third convolutions are set to 8, 5, and 3, respectively. The stride is fixed at 1.
The padding is set so that the input and output of each layer have the same length. For residual network optimization, we used the Adam optimizer with a learning rate of 0.001, and the loss was the mean squared error (MSE). Training was performed for 1000 epochs with a batch size of 4096, and 10% of the training data set was used for validation. λ and c are set to 0.1 and 1, respectively. The number of principal components for the PCA used to implement the T² and Q charts on the latent variables extracted from the last GAP layer of the residual network was selected by explained variance, with the threshold set to 80%. The type I error rate α of the control limits was set to 5%. The sequence length l for sequence-to-point learning was set to 60; that is, the data collected from $\mathbf{x}_{t-59}$ to $\mathbf{x}_t$ are used to predict the energy consumption $\mathbf{y}_t$. We use four metrics to evaluate the multilabel regression performed by the residual network from various perspectives: root mean square error (RMSE), signal aggregate error (SAE), normalized disaggregation error (NDE), and accuracy. First, we evaluate the residual network using RMSE, which is commonly used to evaluate the predictive performance of regression models:

$$\mathrm{RMSE}_i = \sqrt{\frac{1}{N} \sum_{t=1}^{N} \left(\hat{y}^i_t - y^i_t\right)^2}$$

where $\hat{y}^i_t$ and $y^i_t$ are the prediction and ground truth of the i-th electronic device at the t-th time step, respectively, and N is the number of time steps evaluated. The second metric, SAE, evaluates the model in terms of the total error in energy over a period of time. In contrast to SAE, which focuses on energy consumption over a period, the third metric, NDE, measures whether the model predicts energy consumption well at every time point. The fourth metric, accuracy, evaluates whether the total energy is correctly assigned to the devices, regardless of whether each electronic device is a low-power or high-power device [48].
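The four metrics can be sketched as follows. Only the RMSE formula is given in the text; the SAE, NDE, and accuracy formulas below are the forms commonly used in the NILM literature (accuracy following the estimation-accuracy style of [48]) and are assumptions about the exact definitions used here. Arrays have shape (N, k): N time steps by k devices.

```python
import numpy as np


def rmse(y, y_hat):
    """Per-device root mean square error."""
    return np.sqrt(np.mean((y_hat - y) ** 2, axis=0))


def sae(y, y_hat):
    """Signal aggregate error: relative error of each device's total energy over the period."""
    return np.abs(y_hat.sum(axis=0) - y.sum(axis=0)) / y.sum(axis=0)


def nde(y, y_hat):
    """Normalized disaggregation error: squared error at every step relative to signal energy."""
    return np.sum((y - y_hat) ** 2, axis=0) / np.sum(y ** 2, axis=0)


def accuracy(y, y_hat):
    """Energy-assignment accuracy pooled over all devices and time steps (assumed form)."""
    return 1.0 - np.abs(y_hat - y).sum() / (2.0 * y.sum())
```

SAE can be near zero even when individual time steps are badly predicted, because over- and under-estimates cancel in the sum; NDE does not allow such cancellation, which is why the two are reported together.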
We compare our experimental results with previously published results to evaluate the performance of our proposed model. We conducted a comparative evaluation against the methods proposed in [14] and their reported experimental results. Those authors proposed a method that improves energy disaggregation performance by extracting patterns and features from the multivariate data collected from the main meter using a graphical modeling approach, the spatiotemporal pattern network (STPN). They also report the experimental results of FHMM, combinatorial optimization (CO), and the probabilistic finite-state automaton (PFSA), which are general methodologies for energy disaggregation in NILM, along with those of their proposed method. Their experiments used training and test data sets collected during the same period as in our study. The RMSE, SAE, NDE, and accuracy of the five energy disaggregation models are listed in Table 1. In this experiment, the residual network trained with sequence-to-point learning outperforms the other methods (Fig. 2). To evaluate the monitoring performance of the multivariate control chart, we check whether energy consumption rapidly increased when the statistics exceeded the control limit; by identifying how often the chart raised false alarms, we evaluate the control chart drawn with the latent variables extracted by our proposed method. When a statistic exceeds the control limit, the point is classified as an outlier if a t-test shows that its energy consumption is higher than the energy consumption over the previous l time steps. The monitoring performance is evaluated by the false positive rate (FPR), i.e., 1 − specificity or the false alarm rate [49].
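The FPR evaluation can be sketched as follows. The text does not specify the comparison window or how false positives and true negatives are counted, so `is_true_alarm` is a hypothetical reading: an alarm at time t counts as a genuine increase if consumption over the next `horizon` samples (an assumed parameter) is significantly higher, by a one-sided Welch t-test, than over the previous l samples. `false_positive_rate` is the standard FP / (FP + TN) definition. The `alternative` argument of `scipy.stats.ttest_ind` requires SciPy 1.6 or later.

```python
import numpy as np
from scipy import stats


def is_true_alarm(power, t, l=60, horizon=60, alpha=0.05):
    """Hypothetical check for an alarm raised at index t: is consumption in
    power[t : t + horizon] significantly higher than in the previous l samples?"""
    before = power[max(0, t - l):t]
    after = power[t:t + horizon]
    # One-sided Welch t-test (unequal variances): H1 is mean(after) > mean(before)
    result = stats.ttest_ind(after, before, equal_var=False, alternative="greater")
    return bool(result.pvalue < alpha)


def false_positive_rate(alarms, events):
    """FPR = FP / (FP + TN) = 1 - specificity, from boolean alarm and true-event labels."""
    alarms = np.asarray(alarms, dtype=bool)
    events = np.asarray(events, dtype=bool)
    fp = np.sum(alarms & ~events)
    tn = np.sum(~alarms & ~events)
    return fp / (fp + tn)
```

Under this reading, alarms for which `is_true_alarm` returns False would be counted as false positives; how the non-alarm time steps are labeled is left open, as in the text.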
FPR decreased from 13.8% for the conventional PCA-based T² and Q charts to 9.5% for the ResNet without regularization and 4.9% for the ResNet with regularization. In this paper, we proposed a framework that simultaneously performs energy disaggregation and multivariate control chart-based monitoring by extracting features with a sequence-to-point learning-based residual network from the multivariate data collected from the main meter. Our primary contribution is achieving both tasks by training only one model. Moreover, we proposed a framework for applying multivariate control chart techniques, commonly used in process management, to NILM. Experimental results demonstrate that the neural network model trained by our proposed method can extract latent variables sufficient for both energy disaggregation and monitoring the energy consumption of electronic devices. In this study, performance was evaluated only with FPR because energy increase events could not be separately extracted from the energy data. Furthermore, because not all regularization methods were tested, it was not possible to identify which regularization methods and parameters are optimal. Therefore, future research should apply additional regularization methods to data in which abnormal events can be accurately specified.
[1] Residential electricity use feedback: a research synthesis and economic framework
[2] Demand side management: benefits and challenges
[3] Nonintrusive appliance load monitoring based on integer programming
[4] Non-intrusive load monitoring through home energy management systems: a comprehensive review
[5] Nonintrusive appliance load monitoring
[6] NILM techniques for intelligent home energy management and ambient assisted living: a review
[7] Comprehensive feature selection for appliance classification in NILM
[8] Non-intrusive load monitoring algorithm based on features of V-I trajectory
[9] Representation learning: a review and new perspectives
[10] Proceedings of the 2nd ACM International Conference on Embedded Systems for Energy-Efficient Built Environments
[11] Exploiting HMM sparsity to perform online real-time nonintrusive load monitoring
[12] Efficient inference in dual-emission FHMM for energy disaggregation
[13] Non-intrusive load disaggregation using graph signal processing
[14] Multivariate exploration of non-intrusive load monitoring via spatiotemporal pattern network
[15] Non-intrusive electric appliances load monitoring system using harmonic pattern recognition: trial application to commercial building
[16] A new transient feature extraction method of power signatures for nonintrusive load monitoring systems
[17] Power-spectrum-based wavelet transform for nonintrusive demand monitoring and load identification
[18] Load signature study, part I: basic concept, structure, and methodology
[19] Load signature study, part II: disaggregation framework, simulation, and applications
[20] Deep neural network based energy disaggregation
[21] A novel DNN-HMM-based approach for extracting single loads from aggregate power signals
[22] A new approach for supervised power disaggregation by using a deep recurrent LSTM network
[23] An empirical study on energy disaggregation via deep learning
[24] On the feasibility of generic deep disaggregation for single-load extraction
[25] Sequence-to-point learning with neural networks for non-intrusive load monitoring
[26] Convolutional sequence to sequence non-intrusive load monitoring
[27] New design of a supervised energy disaggregation model based on the deep neural network for a smart grid
[28] Introduction to Statistical Quality Control
[29] A review of multivariate control charts
[30] Multivariate Quality Control. Techniques of Statistical Analysis
[31] Process Dynamics and Control
[32] Disturbance detection and isolation by dynamic principal component analysis
[33] Principal component analysis
[34] Statistical process monitoring with principal components
[35] Some theorems on quadratic forms applied in the study of analysis of variance problems, I. Effect of inequality of variance in the one-way classification
[36] Critical evaluation of approaches for on-line batch process monitoring
[37] Nonlinear component analysis as a kernel eigenvalue problem
[38] Nonlinear process monitoring using kernel principal component analysis
[39] Improved kernel PCA-based monitoring approach for nonlinear processes
[40] Kernel PCA-based GLRT for nonlinear fault detection of chemical processes
[41] Partial least-squares regression: a tutorial
[42] A novel framework for fault diagnosis using kernel partial least squares based on an optimal preference matrix
[43] Process monitoring using variational autoencoder for high-dimensional nonlinear processes
[44] Deep residual learning for image recognition
[45] Dropout: a simple way to prevent neural networks from overfitting
[46] Time series classification from scratch with deep neural networks: a strong baseline
[47] The ECO data set and the performance of non-intrusive load monitoring algorithms
[48] REDD: a public data set for energy disaggregation research
[49] Self-adaptive statistical process control for anomaly detection in time series

Acknowledgments. This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support