title: Interpretable modeling for short- and medium-term electricity load forecasting
authors: Hirose, Kei
date: 2020-06-01

We consider the problem of short- and medium-term electricity load forecasting using past loads and daily weather forecast information. Conventionally, many researchers have directly applied regression analysis. However, interpreting the effect of weather on the loads is difficult with the existing methods. In this study, we build a statistical model that resolves this interpretation issue. A varying coefficient model with basis expansion is used to capture the nonlinear structure of the weather effect. This approach results in an interpretable model when the regression coefficients are nonnegative. To estimate the nonnegative regression coefficients, we employ nonnegative least squares. Real data analysis shows the practicality of our proposed statistical modeling. The interpretation would be helpful for making strategies for energy-saving intervention and demand response.

Short- and medium-term load forecasting with high accuracy is essential for decision making during trade in electricity markets. In the past several decades, many forecasting methods have been proposed in the literature. The forecasting techniques are mainly classified into two categories: machine learning approaches and statistical approaches. Machine learning techniques, for example, support vector regression (Elattar et al., 2010; Chen et al., 2017; Jiang et al., 2018), neural networks (He, 2017; Kong et al., 2018; Guo et al., 2018b; Bedi and Toshniwal, 2019), and hybrids of multiple forecasting techniques (Cho et al., 2013; Miswan et al., 2016; Liu et al., 2017; de Oliveira and Oliveira, 2018), have attracted attention in recent years. These techniques capture complex nonlinear structures; therefore, high forecast accuracy is expected. Meanwhile, statistical approaches have also been extensively studied. Traditional statistical approaches include linear regression (Amral et al., 2008; Dudek, 2016; Saber and Alam, 2017), smoothing splines (Engle et al., 1986; Harvey and Koopman, 1993), autoregressive integrated moving average models (ARIMA; Lee and Ko, 2011), and the Kalman filter (Amjady, 2001). Recently, several authors have applied functional data analysis, in which the daily curves of electricity loads are expressed as functions (Cabrera and Schulz, 2017; Vilar et al., 2018). Most of the statistical approaches are based on probabilistic forecasts, which lead to the construction of forecast intervals. The distribution of the forecast values is helpful for risk management (Cabrera and Schulz, 2017). Reviews of probabilistic forecasting were carried out by Hong and Fan (2016) and van der Meer et al. (2018).

To forecast the loads, we usually use past loads, weather, and other factors as explanatory variables (e.g., Lusis et al., 2017). In practice, the time intervals differ among the explanatory variables. For example, assume we forecast loads at 30-minute intervals for a day that is a few days ahead of today; this is a common scenario for market transactions in electricity exchanges (e.g., the day-ahead market in the European Power Exchange, EPEX). The electricity loads are collected at 30-minute intervals using a smart meter, whereas weather forecast information, such as the maximum temperature and average humidity, is usually observed at intervals of one day.
Thus, this study uses past loads at 30-minute intervals and daily weather forecast information as explanatory variables. Other variables, such as the interior environment of buildings (e.g., Yildiz et al., 2017), may improve the accuracy but are not used in this study in order to illustrate the basic idea. Our proposed method, which will be described later, is able to directly incorporate such additional variables.

From a supplier's point of view, it is crucial to produce an interpretable model to investigate the impact of weather on the loads. For example, estimating the amount of electricity fluctuation caused by weather would be helpful for making strategies for energy-saving intervention (e.g., Guo et al., 2018a; Wang et al., 2018) and demand response (e.g., Ruiz-Abellón et al., 2020). To produce an interpretable model, it seems one can directly add the weather forecast information to the explanatory variables in the regression model (Hong et al., 2010) and investigate the estimates of the regression coefficients. More generally, techniques for interpreting any black-box model, including deep neural networks, have recently been proposed, for example, Local Interpretable Model-agnostic Explanations (LIME) (Ribeiro et al., 2016) and SHapley Additive exPlanations (SHAP) (Lundberg and Lee, 2017). However, these methods are used for variable selection, i.e., a set of variables that plays an essential role in the forecast is selected. Variable selection cannot estimate the amount of electricity fluctuation caused by weather.

In contrast to variable selection, decomposition of the electricity load at time $t$, say $y_t$, into two parts is useful for interpretation:
$$ y_t \approx \mu_t + b_t, $$
where $\mu_t$ and $b_t$ are the effects of past loads and weather forecast information, respectively. Typically, we use loads at the same time interval of the previous days as explanatory variables (e.g., Lusis et al., 2017), and the regression analysis is conducted separately for each time interval. We then construct estimators of $\mu_t$ and $b_t$, say $\hat{\mu}_t$ and $\hat{b}_t$, respectively. The interpretation is carried out by plotting the curve of $\hat{b}_t$. However, without elaborate construction and estimation of $b_t$, we face two issues.

The first issue is that, in our experience, the curve of $\hat{b}_t$ often becomes non-smooth across time intervals (i.e., every 30 minutes). A non-smooth daily curve of $\hat{b}_t$ is unrealistic because it implies that the impact of the daily weather forecast on the loads changes non-smoothly every 30 minutes. This problem is caused by the fact that the regression analysis is conducted separately at each time interval. To address this issue, we should estimate the parameters under the assumption that $b_t$ is smooth.

The second issue is related to the parameter estimation procedure. In many cases, the regression coefficients are estimated by the least squares method. In our experience, however, the estimates of the regression coefficients related to $b_t$ can be negative. Negative regression coefficients lead to a negative value of $\hat{b}_t$. Since $\hat{b}_t$ is the amount of electricity fluctuation caused by weather, the interpretation becomes unclear. To alleviate this problem, we need to restrict the regression coefficients associated with $b_t$ to nonnegative values.

In this study, we develop a statistical model that elaborately captures the nonlinear structure of daily weather information to address the two challenges mentioned above.
We employ the varying coefficient model (Hastie and Tibshirani, 1993; Fan and Zhang, 1999) with basis expansion, in which the regression coefficients associated with the weather are allowed to differ across time intervals. The regression coefficients are expressed by a nonlinear smooth function with basis expansion, which allows us to generate a smooth function $\hat{b}_t$. Furthermore, the weather effect $\hat{b}_t$ itself is also expressed as a nonlinear smooth function. To generate nonnegative regression coefficients, we employ the nonnegative least squares (NNLS, e.g., Lawson and Hanson, 1995) estimation. NNLS estimates the parameters under the constraint that all regression coefficients are nonnegative. With the NNLS estimation, the value of $\hat{b}_t$ is always nonnegative; thus, the interpretation becomes clear. The proposed statistical modeling is applied to actual electricity load data from a research facility. The results show that our proposed model is able to produce an interpretable weather effect. Moreover, the forecast accuracy of our proposed model is comparable to (slightly better than) that of some existing methods.

The remainder of this paper is organized as follows: Section 2 describes our proposed model based on the varying coefficient model. In Section 3, we present the parameter estimation via nonnegative least squares. Section 4 presents the analysis of actual electricity load data. Concluding remarks are given in Section 5. Some technical proofs are deferred to the Appendices.

Short- and medium-term forecasting is often used for trading electricity in the market. Among various electricity markets, the day-ahead (or spot) and the intraday markets are popular in electricity exchanges, including the European Power Exchange (EPEX) (https://www.epexspot.com/en/market-data/dayaheadauction) and the Japan Electric Power Exchange (JEPX) (http://www.jepx.org/english/index.html). In the day-ahead market, contracts for the delivery of electricity on the following day are made. In the intraday market, the power is delivered several tens of minutes (e.g., 1 hour in JEPX) after the order is closed. In both markets, transactions are typically made in 30-minute intervals; thus, the suppliers must forecast the loads in 30-minute intervals. In this study, we consider the problem of forecasting loads that can be applied to both the day-ahead and intraday markets.

Let $y_{ij}$ be the electricity load at the $j$th time interval on the $i$th date ($i = 1, \ldots, n$, $j = 1, \ldots, J$). Typically, $J = 48$, because we usually forecast the loads in 30-minute intervals. We consider the following model:
$$ y_{ij} = \mu_{ij} + b_{ij} + \varepsilon_{ij}, \qquad (2) $$
where $\mu_{ij}$ is the effect of the past electricity loads, $b_{ij}$ is the effect of weather, such as temperature and humidity, and $\varepsilon_{ij}$ are error terms with $E[\varepsilon_{ij}] = 0$. Typically, the error variances in the daytime are larger than those at midnight because of the uncertainty of human behavior in the daytime. Therefore, it would be reasonable to assume that $V[\varepsilon_{ij}] = \sigma_j^2$, i.e., the error variances depend on the time interval. One may also assume correlation of the errors across time intervals, i.e., $\mathrm{Cor}(\varepsilon_{ij}, \varepsilon_{ij'}) \neq 0$ for some $j \neq j'$; however, the number of parameters then becomes large. For this reason, we consider only the case where the errors are uncorrelated. Note that the final implementation of our proposed procedure described later does not depend on the assumed correlation structure of the errors.

One can express $b_{ij}$ and $\mu_{ij}$ as linear or nonlinear functions of predictors and conduct the linear regression analysis.
With this procedure, however, we face the two issues mentioned in the introduction; thus, we carefully construct appropriate functions for $b_{ij}$ and $\mu_{ij}$.

Weather forecast information, such as the maximum temperature or average humidity, is typically observed at intervals of one day rather than 30 minutes. For this reason, we assume that the weather forecast information does not depend on $j$. Let a vector of weather information be $s_i$. We express $b_{ij}$ as a function of $s_i$. Here, we assume the following two structures:

• It is well known that the relationship between weather variables and consumption is nonlinear. For example, the relationship between maximum temperature and consumption is approximated by a quadratic function (e.g., Hong et al., 2010) because air conditioners are used on both hot and cold days. For this reason, it is assumed that $b_{ij}$ is expressed as some nonlinear function of $s_i$.

• Although $s_i$ does not depend on $j$, the effect of weather, $b_{ij}$, may depend on $j$. For example, consumption in the daytime is affected by the maximum temperature more than that at midnight. In this case, the regression coefficients associated with $s_i$ change according to the time interval $j$. However, if we assume different parameters at each time interval, the number of parameters can be large, resulting in poor forecast accuracy. To decrease the number of parameters, we use the varying coefficient model, in which the coefficients are expressed as a smooth function of the time interval.

Under the above considerations, we propose expressing $b_{ij}$ as follows:
$$ b_{ij} = \sum_{m=1}^{M} \beta_m(j)\, g_m(s_i), \qquad (3) $$
where $g_m(s_i)$ ($m = 1, \ldots, M$) are basis functions given beforehand, $\beta_m(j)$ are coefficient functions, and $M$ is the number of basis functions. We also use a basis expansion for $\beta_m(j)$:
$$ \beta_m(j) = \sum_{q=1}^{Q} \gamma_{qm}\, h_q(j), \qquad (4) $$
where $h_q(j)$ ($q = 1, \ldots, Q$) are basis functions given beforehand and $\gamma_{qm}$ are the elements of the coefficient matrix $\Gamma = (\gamma_{qm})$. Substituting (4) into (3) results in the following:
$$ b_{ij} = \sum_{q=1}^{Q} \sum_{m=1}^{M} \gamma_{qm}\, h_q(j)\, g_m(s_i). \qquad (5) $$
Because $h_q(j)$ and $g_m(s_i)$ are known functions, the parameters concerning $b_{ij}$ are the $\gamma_{qm}$. Since the effect of weather is assumed to be smooth in both $j$ and $s_i$, we use basis functions $h_q(j)$ and $g_m(s_i)$ that produce a smooth function, such as B-splines and radial basis functions (RBF).

Since $\mu_{ij}$ is the effect of past consumption, one can assume that $\mu_{ij}$ is expressed as a linear combination of the past loads $y_{(i-t-L_\alpha)j}$ and $y_{i(j-u-L_\beta)}$, i.e.,
$$ \mu_{ij} = \sum_{t=1}^{T} \alpha_{jt}\, y_{(i-t-L_\alpha)j} + \sum_{u=1}^{U} \beta_{ju}\, y_{i(j-u-L_\beta)}, \qquad (6) $$
where $T$ and $U$ are positive integers that denote how far we trace back through the data, and $\alpha_{jt}$ ($t = 1, \ldots, T$) and $\beta_{ju}$ ($u = 1, \ldots, U$) are positive values given beforehand. Here, $L_\alpha$ and $L_\beta$ are nonnegative integers that describe the lags; these values change according to the closing time of transactions*. The coefficients $\alpha_{jt}$ correspond to the effects of past consumption at the same time interval on previous days, while $\beta_{ju}$ are the coefficients for different time intervals on the same day. For the day-ahead market, we assume that $\beta_{ju} \equiv 0$.

In practice, however, past consumption also depends on past weather, such as the daily temperature. For example, suppose that it was exceptionally hot yesterday and it is cooler today. In this case, it is not desirable to directly use the past consumption as a predictor; it is better to remove the effect of past temperature from the past consumption. In other words, we can use
$$ y_{(i-t-L_\alpha)j} - b_{(i-t-L_\alpha)j} \quad \text{and} \quad y_{i(j-u-L_\beta)} - b_{i(j-u-L_\beta)} $$
instead of $y_{(i-t-L_\alpha)j}$ and $y_{i(j-u-L_\beta)}$, respectively.
As a result, $\mu_{ij}$ is expressed as follows:
$$ \mu_{ij} = \sum_{t=1}^{T} \alpha_{jt}\, \bigl(y_{(i-t-L_\alpha)j} - b_{(i-t-L_\alpha)j}\bigr) + \sum_{u=1}^{U} \beta_{ju}\, \bigl(y_{i(j-u-L_\beta)} - b_{i(j-u-L_\beta)}\bigr). \qquad (7) $$
Substituting the expression of $b_{ij}$ in (5) into (7) gives $\mu_{ij}$ as a function of the past loads, the weather information, and the coefficients $\gamma_{qm}$; we refer to this expression as (8).

The appropriate values of $\alpha_{jt}$ and $\beta_{ju}$ can be chosen by several approaches. A simple method is $\alpha_{jt} = 1/T$ and $\beta_{ju} = 1/U$, which implies that $\mu_{ij}$ is the sample mean of the past consumption. Another method is based on the AR(1) structure, i.e., $\alpha_{jt} = \rho_\alpha^t$ and $\beta_{ju} = \rho_\beta^u$, where $\rho_\alpha$ and $\rho_\beta$ satisfy $\sum_{t=1}^{T} \rho_\alpha^t = 1$ and $\sum_{u=1}^{U} \rho_\beta^u = 1$, respectively. Note that $\sum_{t=1}^{T} \rho^t = \rho(1 - \rho^T)/(1 - \rho)$, so $\sum_{t=1}^{T} \rho^t = 1$ is equivalent to $\rho^{T+1} - 2\rho + 1 = 0$, whose numerical solution is easily obtained.

*For example, transactions of the day-ahead market in the JEPX close at 5:00 pm every day. For the forecast of the 5:30–6:00 pm interval tomorrow, we cannot use the information of today's consumption at the 5:30–6:00 pm interval due to the trading hours of the market, which implies $L_\alpha = 1$.

By combining the expressions of $b_{ij}$ in (5) and $\mu_{ij}$ in (8), the model (2) is written in a form, (9), that is linear in the unknown coefficients $\gamma_{qm}$. The model (9) is equivalent to the linear regression model
$$ \tilde{y} = X\gamma + \varepsilon, \qquad (10) $$
where $\gamma = \mathrm{vec}(\Gamma)$ and $\varepsilon = \mathrm{vec}(E)$ with $E = (\varepsilon_{ij})$. Here, $X$ and $\tilde{y}$ are regarded as the design matrix and the response vector, respectively. The definitions of $\tilde{y}$ and $X$ are given in Appendix A.

Remark 2.1. Although this study considers a statistical model that can be applied to the electricity markets (i.e., 30-minute intervals), our proposed model is directly applicable to any time resolution of the data, for example, loads in 1-minute intervals and temperature in 1-hour intervals.

To estimate the regression coefficient vector $\gamma$, one can use the least squares estimation
$$ \hat{\gamma} = \mathop{\arg\min}_{\gamma}\ \|\tilde{y} - X\gamma\|_2^2. $$
In our experience, however, the elements of the least squares estimate $\hat{\gamma}$ often become negative. In such cases, the estimate of $b_{ij}$ can be negative because the basis functions $h_q(j)$ and $g_m(s_i)$ generally take positive values. When $\hat{b}_{ij} < 0$, the values of $\hat{\mu}_{ij}$ become extremely large to adjust for the negative $\hat{b}_{ij}$, which makes interpreting the estimated model difficult. Indeed, $y_{ij} \approx \mu_{ij} + b_{ij}$ means that the current consumption is decomposed into the effect of past consumption and that of weather. The interpretation is realized only when $b_{ij}$ is nonnegative; in that case, the value of $b_{ij}$ indicates how the weather affects the loads.

A nonnegative value of the weather effect $b_{ij}$ is realized when all elements of $\gamma$ are nonnegative. The nonnegative least squares (NNLS) estimation is useful for estimating nonnegative regression coefficients:
$$ \hat{\gamma} = \mathop{\arg\min}_{\gamma \geq 0}\ \|\tilde{y} - X\gamma\|_2^2. \qquad (11) $$
The optimization problem (11) is a special case of quadratic programming with nonnegativity constraints (e.g., Franc et al., 2005); as a result, the NNLS problem is a convex optimization problem. Several efficient algorithms for obtaining the solution of (11) have been proposed in the literature (e.g., Lawson and Hanson, 1995; Bro and De Jong, 1997; Timotheou, 2016).

We add the ridge penalty (Hoerl and Kennard, 1970) to the loss function of the NNLS estimation:
$$ \hat{\gamma} = \mathop{\arg\min}_{\gamma \geq 0}\ \|\tilde{y} - X\gamma\|_2^2 + \lambda \|\gamma\|_2^2, \qquad (12) $$
where $\lambda > 0$ is a regularization parameter. In our experience, the ridge penalization yields a stable estimator and improves the forecast accuracy (an illustrative sketch of this estimation step is given below).

For the day-ahead forecast, we forecast the loads on the next day, $\hat{y}_{(i+1)j}$, for a given NNLS estimate $\hat{\gamma}$ and weather information $s_{i+1}$. The forecast value is expressed as $\hat{y}_{(i+1)j} = \hat{\mu}_{(i+1)j} + \hat{b}_{(i+1)j}$, where $\hat{b}_{(i+1)j}$ is computed from $\hat{\gamma}$ and $s_{i+1}$. For the intraday forecast, we may also use information about the loads observed earlier on that day. Construction of a forecast interval based on (11) or (12) is not easy due to the constraints on the parameters.
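As a rough illustration of how the estimator in (12) can be computed, the following R sketch solves the ridge-penalized NNLS problem by the standard augmented-design trick and computes the AR(1)-type weights described above. It is a minimal sketch rather than the author's implementation: the function names are ours, and the CRAN package nnls is assumed to be available.

```r
library(nnls)

nnls_ridge <- function(X, y, lambda) {
  # ridge-penalized NNLS: augment the design with sqrt(lambda) * I and pad y with zeros
  p <- ncol(X)
  X_aug <- rbind(X, sqrt(lambda) * diag(p))
  y_aug <- c(y, rep(0, p))
  nnls(X_aug, y_aug)$x                      # nonnegative coefficient estimate
}

ar1_weights <- function(T_lags) {
  # solve rho^(T+1) - 2*rho + 1 = 0 on (0, 1); then alpha_t = rho^t sums to one
  f <- function(rho) rho^(T_lags + 1) - 2 * rho + 1
  rho <- uniroot(f, c(1e-6, 1 - 1e-6))$root
  rho^(seq_len(T_lags))
}

## illustrative usage with random data (not the paper's data)
## set.seed(1); X <- matrix(rexp(200), 50, 4)
## y <- drop(X %*% c(1, 0, 2, 0)) + rnorm(50)
## gamma_hat <- nnls_ridge(X, y, lambda = 1e-4)
```

Minimizing $\|\tilde{y} - X\gamma\|_2^2 + \lambda\|\gamma\|_2^2$ over $\gamma \geq 0$ is equivalent to an ordinary NNLS problem in which the design matrix is augmented by $\sqrt{\lambda} I$ and the response is padded with zeros, which is why a plain NNLS solver can be reused in the sketch above.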
To derive a forecast interval, we employ a two-stage procedure: first, we estimate the parameters via NNLS to extract the variables that correspond to nonzero coefficients; then, we apply the least squares estimation based on the variables selected in the first step. With this procedure, the forecast interval should be derived after conditioning on the selection event; the details are given in Appendix B.

We compare the performance of our proposed methods under various settings:

• Two types of $\alpha_{jt}$: $\alpha_{jt} = 1/T$ (hereafter referred to as "α1") and the AR(1) structure (hereafter referred to as "α2"). Details of the AR(1) structure are presented at the end of Section 2.2.

• Two estimation procedures: the ridge estimation $\min_{\gamma} \|\tilde{y} - X\gamma\|_2^2 + \lambda\|\gamma\|_2^2$ and the NNLS with the ridge penalty in (12). We label these estimation procedures "E1" and "E2", respectively.

• Two types of mean structures: the structure in (7), which removes the weather effect from the past loads, and $\mu_{ij} = \sum_{t=1}^{T} \alpha_{jt}\, y_{(i-t)j}$ in (6). Hereafter, these mean structures are referred to as "µ1" and "µ2", respectively.

The number of settings based on the combinations of the above is 8. We label these settings as follows: $S_{\alpha1,E1,\mu1}$, $S_{\alpha2,E1,\mu1}$, $S_{\alpha1,E2,\mu1}$, $S_{\alpha2,E2,\mu1}$, $S_{\alpha1,E1,\mu2}$, $S_{\alpha2,E1,\mu2}$, $S_{\alpha1,E2,\mu2}$, and $S_{\alpha2,E2,\mu2}$.

For the basis function $g_m(s_i)$, we use the radial basis function (RBF) with a hyperparameter $\nu_g > 0$ (Ando et al., 2008):
$$ g_m(s_i) = \exp\!\left( -\frac{\|s_i - \mu_m\|^2}{2 \nu_g h_m^2} \right), \qquad (13) $$
where $\mu_m$ and $h_m^2$ are the mean vector and variance, respectively, of the $m$th cluster. To determine $\mu_m$ and $h_m^2$, we employ the two-stage procedure of Moody and Darken (1989). First, we apply $k$-means clustering to $s_1, \ldots, s_n$ and obtain clusters $\{C_1, \ldots, C_M\}$. Then, $\mu_m$ and $h_m^2$ are estimated as follows:
$$ \hat{\mu}_m = \frac{1}{\# C_m} \sum_{s_i \in C_m} s_i, \qquad \hat{h}_m^2 = \frac{1}{\# C_m} \sum_{s_i \in C_m} \|s_i - \hat{\mu}_m\|^2, $$
where $\# C_m$ is the number of observations in cluster $C_m$ (a short R sketch of this construction is given below). The basis function $h_q(j)$ is constructed in the same way as above; however, with the ordinary RBF, the coefficient function $\beta_m(j)$ is not continuous around the boundary (e.g., $j = 1, 48$). To obtain a continuous function around the boundary, we use a slightly modified RBF for $h_q(j)$.

For each setting, we prepare 180 patterns of tuning parameters: $Q = 5, 10$; $M = 5, 10$; $T = 4$; $\nu_g = 1, 4, 9$; $\nu_h = 1, 4, 9$; and $\lambda = 0, 10^{-8}, 10^{-6}, 10^{-4}, 10^{-2}$. These tuning parameters are chosen so that the root mean squared error (RMSE) is minimized. Table 1 shows the set of tuning parameters selected on the basis of the RMSE. For all settings, the selected ridge parameter $\lambda$ is positive, which suggests that the ridge penalty may be helpful for improving the forecast accuracy. The optimal values of the tuning parameters depend on the statistical model. For example, by comparing $S_{\alpha2,E1,\mu1}$ and $S_{\alpha2,E2,\mu1}$, we find that the NNLS estimation results in a more complex model than the LSE ($M = 5$ and $M = 10$ for $S_{\alpha2,E1,\mu1}$ and $S_{\alpha2,E2,\mu1}$, respectively); due to the constraints on the parameters, the NNLS estimation may require more parameters than the LSE to obtain good forecast accuracy.

We evaluate the performance of our proposed method through the monthly mean absolute percentage error (MAPE)
$$ \mathrm{MAPE} = \frac{100}{N} \sum_{i,j} \frac{|y_{ij} - \hat{y}_{ij}|}{y_{ij}}, \qquad (14) $$
where the sum is taken over the $N$ forecast intervals in the month. The results are shown in Table 2. For all methods, the forecast error is relatively high. This occurs because the research facility is small and the electricity usage is unstable. As shown later, standard machine learning techniques also result in a high forecast error. We observe that setting µ1 generally performs better than setting µ2 during seasonal transition periods. Indeed, the past loads in February are generally larger than those in March, as shown in Figure 2; in Japan, it is much colder in February than in March.
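Before turning to a concrete example, we give the short R sketch referenced above for the two-stage RBF construction ($k$-means centres followed by per-cluster variances) and for the MAPE criterion (14). The Gaussian form with bandwidth $\nu_g$ follows (13) under our assumptions; all names are illustrative rather than taken from the author's code.

```r
make_rbf_basis <- function(s, M, nu_g) {
  # s: n x d matrix of daily weather variables; M: number of basis functions
  km <- kmeans(s, centers = M, nstart = 25)
  centers <- km$centers
  h2 <- sapply(seq_len(M), function(m) {
    d <- sweep(s[km$cluster == m, , drop = FALSE], 2, centers[m, ])
    mean(rowSums(d^2))                    # within-cluster mean squared distance
  })
  # returns a function that evaluates the M basis functions at new inputs x (a matrix)
  function(x) sapply(seq_len(M), function(m)
    exp(-rowSums(sweep(x, 2, centers[m, ])^2) / (2 * nu_g * h2[m])))
}

mape <- function(y, yhat) 100 * mean(abs(y - yhat) / y)   # cf. (14)
```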
As an example of this seasonal effect, to forecast the loads on March 5th, we use the past loads of the past four days with the same weekday: January 29th and February 5th, 19th, and 26th (February 12th was a national holiday in Japan and is regarded as a Sunday). The loads on these four days are much larger than that on March 5th. The maximum temperatures on these five days (the past four days and March 5th) are shown in Table 3 to investigate the relationship between load and temperature. Table 3 shows that the temperature on March 5th is much higher than that on the past four days. Therefore, without removing the effect of temperature from the past loads, the forecast for March 5th may not be accurate.

The estimated model can be interpreted by decomposing the forecast value into the effects of temperature and past loads: $\hat{y}_{ij} = \hat{b}_{ij} + \hat{\mu}_{ij}$. The values $\hat{b}_{ij}$ and $\hat{\mu}_{ij}$ are depicted for each setting. For model µ2, the mean structure $\mu_{ij} = \sum_{t=1}^{T} \alpha_{jt}\, y_{(i-t)j}$ may not appropriately capture the effect of temperature; the poor performance on March 5th with model µ2 is attributed to this fact. On the other hand, for model µ1, $\hat{b}_{ij}$ may appropriately capture the effect of temperature.

Although the forecast accuracy of the LSE is similar to that of the NNLS estimation, the results of the decomposition $\hat{y}_{ij} = \hat{b}_{ij} + \hat{\mu}_{ij}$ are completely different. The LSE often results in negative values of $\hat{b}_{ij}$, and then $\hat{\mu}_{ij}$ becomes much larger than the actual load. Thus, interpreting the effect of weather is difficult with the LSE. This issue occurs because there is no restriction on the sign of $\hat{b}_{ij}$. On the other hand, the effect of weather is appropriately captured by the NNLS estimation. For example, on March 5th, the effect of the weather is small due to the moderate temperature on that day, whereas the LSE results in large negative values of $\hat{b}_{ij}$. The constraint of nonnegativity on $\gamma$ greatly improves the interpretability of the estimated model.

To further investigate the effect of temperature, we depict $\hat{b}_{ij}$ in Figure 4 when the maximum temperature is 5°C (a cold day), 20°C (a cool day), and 35°C (a hot day). Because we estimate the parameters for each day of the week separately, the weather effects $\hat{b}_{ij}$ differ by the day of the week; we depict $\hat{b}_{ij}$ on Tuesday and Sunday. The parameter $\gamma$ is estimated using the loads from January 4th, 2016, to December 31st, 2018. The estimation procedure uses setting $S_{\alpha2,E2,\mu1}$; for comparison, we also show the result of "separate regression", in which the regression analysis is conducted for each time interval separately. The weather effect may be large on cold and hot days due to air conditioner use. Moreover, the weather effect on Tuesday is generally more significant than that on Sunday because Tuesday is a working day while Sunday is a weekend day at this research facility. When we conduct the regression analysis for each time interval separately, the daily curve of $\hat{b}_{ij}$ becomes non-smooth. As a result, our proposed method is more appropriate for interpreting the weather effect than the separate regression. Note that even if we increase the value of the ridge parameter, the weather effect of the separate regression remains non-smooth in our experience. Therefore, a smooth function for the regression coefficients is essential for producing a smooth curve of $\hat{b}_{ij}$.

We compare the performance of our proposed method with the following popular machine learning techniques: random forest, support vector regression (SVR), and the lasso. The predictors are the electricity loads of the past $T = 4$ days and the maximum temperature. We use the R packages randomForest, kernlab (function ksvm), and glmnet to implement these machine learning techniques (an illustrative sketch is given below). In the SVR, we use the Gaussian kernel.
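The following is an illustrative R sketch of how such baselines might be fit. The column names and the values of $C$ and $\varepsilon$ are placeholders rather than the settings used in the paper; the actual tuning grids are described next.

```r
library(randomForest); library(kernlab); library(glmnet)

fit_baselines <- function(train, test) {
  vars <- c("lag1", "lag2", "lag3", "lag4", "tmax")   # past T = 4 loads + max. temperature
  x_tr <- as.matrix(train[, vars]); x_te <- as.matrix(test[, vars])
  rf    <- randomForest(x_tr, train$load)
  svr   <- ksvm(x_tr, train$load, kernel = "rbfdot", C = 1, epsilon = 0.1)
  lasso <- cv.glmnet(x_tr, train$load, alpha = 1)     # alpha = 1 gives the lasso
  list(rf    = predict(rf, x_te),
       svr   = as.numeric(predict(svr, x_te)),
       lasso = as.numeric(predict(lasso, newx = x_te, s = "lambda.min")))
}
```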
The SVR includes several tuning parameters: $\sigma$ is used in the Gaussian kernel, $C$ corresponds to the regularization parameter in the SVR problem, and $\varepsilon$ is used in the regression; the data within the $\varepsilon$-tube around the predicted value are not penalized. We prepare the candidates $\sigma = 0.001, 0.01, 0.1$, and so on, together with candidate values for $C$ and $\varepsilon$.

Table 4 shows the MAPE given by (14) for our proposed methods and the machine learning techniques. Among all methods, $S_{\alpha2,E1,\mu1}$ yields the best performance in terms of MAPE. $S_{\alpha2,E2,\mu1}$ performs slightly worse than, but is comparable with, $S_{\alpha2,E1,\mu1}$; if an interpretation of the estimated model is needed, $S_{\alpha2,E2,\mu1}$ is preferable to $S_{\alpha2,E1,\mu1}$. Both $S_{\alpha2,E1,\mu1}$ and $S_{\alpha2,E2,\mu1}$ perform slightly better than the machine learning techniques. The forecast accuracy of the machine learning techniques might be improved by including more information about weather and past loads (e.g., Khoshrou and Pauwels, 2019). However, even if the forecast accuracy were improved, the standard machine learning techniques could not provide a smooth curve of $\hat{b}_{ij}$.

We have constructed a statistical model for forecasting future electricity loads. The key idea of our proposed model is the decomposition of the load expressed as $y_{ij} \approx b_{ij} + \mu_{ij}$, where $\mu_{ij}$ and $b_{ij}$ are the effects of past loads and weather, respectively. The estimated weather effect, such as that depicted in Figure 4, would be helpful for making strategies for energy-saving intervention and demand response. The proposed method is carried out under the assumption that the errors are uncorrelated. In practice, however, the errors at nearby time intervals may be correlated. As a future research topic, it would be interesting to assume a correlation among time intervals and estimate a regression model that includes the correlation parameter.

Appendix A

To show that our proposed model (9) is a regression model, we define the necessary vectors and matrices and express the model (9) in the matrix form of (10). Note that we use the following formula for matrices $A$, $B$, and $C$: $\mathrm{vec}(ABC^T) = (C \otimes A)\,\mathrm{vec}(B)$. Furthermore, we denote $q_i = (y_{i1\alpha} + y_{i1\beta}, \ldots, y_{iJ\alpha} + y_{iJ\beta})^T$, where $y_{ij\alpha}$ and $y_{ij\beta}$ denote the weighted sums of the past loads corresponding to the first and second terms of (6), respectively.

Appendix B Post-selection inference for the NNLS estimation

Appendix B.1 Selection event of NNLS

In this section, $\tilde{y}$ and $\gamma$ are written as $y$ and $\beta$, respectively, which leads to the standard notation of the linear regression model $y = X\beta + \varepsilon$. Post-selection inference for the NNLS estimation has already been proposed by Lee and Taylor (2014), but these authors do not impose the constraint on the parameters; we have added the constraint to the selection event. Let $\hat{S}$ be the set of indices corresponding to the variables selected by the NNLS estimation, i.e., $\hat{S} = \{j \mid \hat{\beta}_j \neq 0\}$, and let $-\hat{S}$ be the set of indices of the variables not selected in $\hat{S}$. The KKT conditions of the NNLS estimation (Franc et al., 2005; Chen et al., 2011) are given by (15)-(18), where $\mu$ is the Lagrange multiplier. Substituting $\hat{\beta} = (\hat{\beta}_{\hat{S}}^T, \hat{\beta}_{-\hat{S}}^T)^T$ into (15)-(18) results in
$$ \hat{\beta}_{\hat{S}} > 0, \qquad \hat{\beta}_{-\hat{S}} = 0, \qquad \hat{\beta}_{\hat{S}}^T X_{\hat{S}}^T (y - X_{\hat{S}} \hat{\beta}_{\hat{S}}) = 0. $$
By combining (19), (21), and (22), we obtain $X_{\hat{S}}^T (y - X_{\hat{S}} \hat{\beta}_{\hat{S}}) = 0$. Then, we obtain
$$ \hat{\beta}_{\hat{S}} = (X_{\hat{S}}^T X_{\hat{S}})^{-1} X_{\hat{S}}^T y. \qquad (24) $$
Eq. (24) implies that the NNLS estimate for the active set coincides with the LSE using the active set. Substituting (24) into (20) and (21) yields the inequalities (25) and (26). The selection event constructed from (25) and (26) is expressed as a set of linear constraints on $y$ of the form $\{A(\hat{S})\, y \leq b\}$, where $A(\hat{S})$ and $b$ are determined by $X$ and $\hat{S}$. This selection event for the NNLS estimation has already been studied by Lee and Taylor (2014), but Lee and Taylor (2014) did not include the first inequality (25).

Appendix B.2 Distribution of the forecast value after model selection

Suppose that $y \sim N(\mu, \sigma^2 I)$, and consider the problem of deriving the distribution of $\eta^T y$ conditional on the selection event $\{Ay \leq b\}$. Here, $\eta$ is an $n$-dimensional vector given beforehand; for example, if $\eta = X_{\hat{S}}^{\dagger T} e_j$, then $\eta^T y = \hat{\beta}_j$. Let $c = \Sigma\eta(\eta^T \Sigma \eta)^{-1}$ and $z = (I_n - c\eta^T)y$. For a fixed $z$, the event $\{Ay \leq b\}$ restricts $\eta^T y$ to the interval $[V^-(z), V^+(z)]$, where
$$ V^-(z) = \max_{j:\,(Ac)_j < 0} \frac{b_j - (Az)_j}{(Ac)_j}, \qquad V^+(z) = \min_{j:\,(Ac)_j > 0} \frac{b_j - (Az)_j}{(Ac)_j}. $$
However, we only observe the distribution of $\eta^T y$ for a given $z$. We consider the marginalization with respect to $z$.
To explain this result, we consider the cumulative distribution function of a truncated normal distribution: let $F^{[a,\,b]}_{m,\,s^2}$ denote the CDF of an $N(m, s^2)$ distribution truncated to the interval $[a, b]$. Then, we have
$$ F^{[V^-(z),\, V^+(z)]}_{\eta^T\mu,\ \sigma^2\|\eta\|_2^2}\bigl(\eta^T y\bigr) \,\Big|\, \{Ay \leq b\} \sim U(0, 1), $$
which implies that an exact confidence interval for $\eta^T \mu$ after model selection can be obtained by inverting this pivotal quantity. To construct the confidence interval for a given new input $x$, we let $\eta = X_{\hat{S}}^{\dagger T} x$ and find $L$ and $U$ that satisfy the following equations:
$$ F^{[V^-(z),\, V^+(z)]}_{L,\ \sigma^2\|\eta\|_2^2}(\eta^T y) = 1 - \frac{\alpha}{2}, \qquad F^{[V^-(z),\, V^+(z)]}_{U,\ \sigma^2\|\eta\|_2^2}(\eta^T y) = \frac{\alpha}{2}. $$
Letting $y^* = (y^T, \varepsilon)^T$ with $\varepsilon \sim N(0, \sigma^2)$ and $\eta = (x^T, 1)^T$, we construct the prediction interval in the same manner. The algorithm of the post-selection inference for the NNLS estimation procedure is shown in Algorithm 1.

Algorithm 1 Post-selection inference for the NNLS estimation
1: Conduct NNLS and obtain the set of indices of the nonzero coefficients, $\hat{S}$.
2: Let $\eta = X_{\hat{S}}^{\dagger T} e_j$.
3: Calculate $c = \Sigma\eta(\eta^T \Sigma \eta)^{-1}$ and $z = (I_n - c\eta^T)y$.
4: Calculate $V^-(z) = \max_{j:\,(Ac)_j < 0} \frac{b_j - (Az)_j}{(Ac)_j}$ and $V^+(z) = \min_{j:\,(Ac)_j > 0} \frac{b_j - (Az)_j}{(Ac)_j}$.
5: Find $L$ and $U$ that satisfy $F^{[V^-(z),\,V^+(z)]}_{L,\ \sigma^2\|\eta\|_2^2}(\eta^T y) = 1 - \alpha/2$ and $F^{[V^-(z),\,V^+(z)]}_{U,\ \sigma^2\|\eta\|_2^2}(\eta^T y) = \alpha/2$.
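The following is a minimal R sketch of the confidence-interval step of Algorithm 1, assuming the polyhedral representation $\{Ay \leq b\}$ of the selection event and a known error variance $\sigma^2$; the estimation of $\sigma^2$ and the prediction-interval variant are omitted, and all function and variable names are illustrative rather than taken from the paper.

```r
ptruncnorm <- function(x, mean, sd, lo, hi) {
  # CDF of N(mean, sd^2) truncated to the interval [lo, hi]
  (pnorm(x, mean, sd) - pnorm(lo, mean, sd)) /
    (pnorm(hi, mean, sd) - pnorm(lo, mean, sd))
}

postselection_ci <- function(y, eta, A, b, sigma2, alpha = 0.05) {
  etay  <- sum(eta * y)
  s2eta <- sigma2 * sum(eta^2)            # eta' Sigma eta with Sigma = sigma2 * I
  c_vec <- sigma2 * eta / s2eta           # c = Sigma eta (eta' Sigma eta)^{-1}
  z     <- y - c_vec * etay               # z = (I - c eta') y
  Az    <- drop(A %*% z)
  Ac    <- drop(A %*% c_vec)
  vlo   <- suppressWarnings(max(((b - Az) / Ac)[Ac < 0]))   # V^-(z)
  vhi   <- suppressWarnings(min(((b - Az) / Ac)[Ac > 0]))   # V^+(z)
  sd0   <- sqrt(s2eta)
  # invert the truncated-normal CDF in its mean parameter by a grid search
  grid  <- seq(etay - 10 * sd0, etay + 10 * sd0, length.out = 4000)
  Fg    <- ptruncnorm(etay, grid, sd0, vlo, vhi)
  ok    <- is.finite(Fg)
  c(lower = max(grid[ok & Fg >= 1 - alpha / 2]),
    upper = min(grid[ok & Fg <= alpha / 2]))
}
```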
Acknowledgments

The author would like to thank Professor Hiroki Masuda and Dr. Maiya Hori for helpful comments and discussions. This work was partially supported by the Japan Society for the Promotion of Science KAKENHI 19K11862 and the Center of Innovation Program (COI) from JST, Japan.

References

Short-term hourly load forecasting using time-series modeling with peak load estimation capability
Short term load forecasting using Multiple Linear Regression
Nonlinear regression modeling via regularized radial basis function networks
Deep learning framework to forecast electricity demand
A fast non-negativity-constrained least squares algorithm
Forecasting Generalized Quantiles of Electricity Demand: A Functional Data Approach
Nonnegativity constraints in numerical analysis
Short-term electrical load forecasting using the Support Vector Regression (SVR) model to calculate the demand response baseline for office buildings
Modeling and Forecasting Daily Electricity Load Curves: A Hybrid Approach
Forecasting mid-long term electric energy consumption through bagging ARIMA and exponential smoothing methods
Pattern-based local linear regression models for short-term load forecasting
Electric Load Forecasting Based on Locally Weighted Support Vector Regression
Semiparametric estimates of the relation between weather and electricity sales
Statistical estimation in varying coefficient models
Sequential coordinate-wise algorithm for the nonnegative least squares problem
Residential electricity consumption behavior: Influencing factors, related theories and intervention strategies
A deep learning model for short-term power load and probability density forecasting
Forecasting Hourly Electricity Demand Using Time-Varying Splines
Varying-Coefficient Models
Load Forecasting via Deep Neural Networks
Ridge Regression: Biased Estimation for Nonorthogonal Problems
Probabilistic electric load forecasting: A tutorial review
Modeling and forecasting hourly electric load by multiple linear regression with interactions
A Short-Term and High-Resolution Distribution System Load Forecasting Approach Using Support Vector Regression With Hybrid Parameters Optimization
Short-term scenario-based probabilistic load forecasting: A data-driven approach
Short-Term Residential Load Forecasting Based on LSTM Recurrent Neural Network
Solving Least Squares Problems
Short-term load forecasting using lifting scheme and ARIMA models
Exact Post Model Selection Inference for Marginal Screening
Exact post-selection inference, with application to the lasso
Probabilistic Load Forecasting via Quantile Regression Averaging on Sister Forecasts
A unified approach to interpreting model predictions
Short-term residential load forecasting: Impact of calendar effects and forecast granularity
ARIMA with regression model in modelling electricity load demand
Fast learning in networks of locally-tuned processing units
Explaining the predictions of any classifier
Integration of demand response and short-term forecasting for the management of prosumers demand and generation
Short term load forecasting using multiple linear regression for big data
Fast non-negative least-squares learning in the random neural network
Review on probabilistic forecasting of photovoltaic power production and electricity consumption
Prediction intervals for electricity demand and price using functional data
Impact analysis of customized feedback interventions on residential electricity load consumption behavior for demand response
A review and analysis of regression and machine learning models on commercial building electricity load forecasting