key: cord-0634194-gmg9kpcc
authors: Lian, Yingzhao; Shi, Jicheng; Koch, Manuel; Jones, Colin Neil
title: Adaptive Robust Data-driven Building Control via Bi-level Reformulation: an Experimental Result
date: 2021-06-10
journal: nan
DOI: nan
sha: 3c0f417e05a428695e7d3238c8bfa93e20910cd2
doc_id: 634194
cord_uid: gmg9kpcc

Data-driven control approaches for the minimization of energy consumption of buildings have the potential to significantly reduce deployment costs and increase uptake of advanced control in this sector. A number of recent approaches based on the application of Willems' fundamental lemma for data-driven controller design from input/output measurements are very promising for deterministic LTI systems. This paper addresses the key noise-free assumption, and extends these data-driven control schemes to adaptive building control with measured process noise and unknown measurement noise via a robust bilevel formulation, whose upper level ensures robustness and whose lower level guarantees prediction quality. Corresponding numerical improvements and an active excitation mechanism are proposed to enable a computationally efficient reliable operation. The efficacy of the proposed scheme is validated by a numerical example and a real-world experiment on a single-zone conference building on the EPFL campus. The real-world experiment includes a 20-day non-stop test, where, without extra modeling effort, our proposed controller improves 18% energy efficiency against an industry-standard controller, while also robustly ensures occupant comfort.

Predictive control techniques have recently been proven to be effective for controlling building HVAC systems with reduced energy consumption while simultaneously improving comfort. However, the cost of model building and commissioning has proven to be very high and as a result, data-driven approaches for building energy saving have been receiving broad attention.

The motivation of this work is to develop a practical robust adaptive data-driven building controller, whose timevarying linear dynamics are disturbed by measurable process noise including outdoor temperature and solar radiation, and whose output measurements are contaminated by unknown measurement noise. Beyond building control, systems with measurable disturbances are ubiquitous, especially in energyrelated applications: solar radiation in photovoltaic power systems, electricity demand in power grids, and power generation in airborne wind energy systems, to name a few. The main contributions of this paper are summarized as follows:

• Propose a tractable adaptive robust bi-level data-driven building controller, which decouples data-driven trajectory prediction from control input decisions. • Propose an active excitation mechanism to ensure data quality while maintaining robust building operation. • Experimental validation of the proposed scheme on a real-world building.

Data can be used to model building dynamics [12] or to directly generate/improve control policies. Due to seasonal variations [4] and component wear, building dynamics are usually slowly time-varying, and adaptive model predictive control (MPC) has been introduced to combine online parameter estimation and control in [22] . For example, the experiment in [46] adaptively estimated the model of an evaporator in the HVAC system, which was then used to control the valve setpoints. [34] ran an extended Kalman filter to achieve online parameter adaptation before the estimated model was used in an MPC controller. However, these parameter-estimationbased adaptive methods usually require a-priori knowledge about the structure of the building dynamics and/or the HVAC systems.

Beyond running through a modeling/estimation procedure, data can be used to refine a control policy. The main approaches in this direction include reinforcement learning (RL) [44] and iterative learning control (ILC) [14] . In particular, ILC has been used for buildings with fixed heating/occupant schedules [38] , [52] and RL for learning a building control policy that is not necessarily iterative. To run these learning schemes, a high-fidelity building simulator is usually required, and therefore publications on successful experiments with HVAC systems are rare, with two exceptions being [15] , [39] .

Beyond RL and ILC, data can also be used to directly characterize a system's response from a behavioral theoretic viewpoint [49] . Willems' fundamental lemma is such a tool that provides a characterization of linear time invariant (LTI) arXiv:2106.05740v2 [eess.SY] 27 Nov 2021 systems from measured input and output trajectories. Such a characterization offers a convenient interface to data-driven controller design [16] , [18] , [36] , [37] . Motivated by the simplicity and effectiveness of the fundamental lemma, its extension to a more general setting is attracting broad attention, including nonlinear extensions [7] , [11] , [32] , [42] and the relaxation of the persistent excitation condition [55] .

However, even for LTI systems, the absence of output measurement noise in the standard Willems' fundamental lemma limits its practicality and, hence, a wide range of researchers are studying this challenge [8] , [9] , [17] , [47] , [51] . To the best of the authors' knowledge, most previous work in this area make a number of unrealistic assumptions. For example, the signal to noise ratio in the numerical result of [9] is 10 6 , which is effectively noise free. The regularizers in [17] and [20, p. 19 ] reach 10 5 , thus overpowering the penalty on the tracking error (Details in Section II). [8] , [18] , [47] , [48] consider a model with process noise and measurement-noise-free fully measurable states, but this assumption partially negates the necessity of data-driven methods. While [51] considers a system with measurement noise, the Hankel matrices are assumed to be noise-free. It is noteworthy to point out that no real-world experimental result is available except for [10] , [23] . However, [23] uses an MPC regulator to generate the data used for the data-driven regulator, which limits its application, as a model is still required. Instead of solving the general case, this paper focuses on the applications with unknown measurement noise and measurable process noise, and building control is one of these applications (see Section. II-B for more details).

In the following, Wassertein distance, Willems' fundamental lemma and its corresponding predictive control problem are reviewed in Section II, accompanied by a more in-depth motivation of this paper. A novel robust version of this datadriven controller based on bi-level optimization is introduced in Section III, whose adaptive extension is introduced in Section IV. The mathematical implications of the corresponding lower level problem is summarized in Lemma 2 and the numerical details about the adaptive update and computational improvement are discussed in Section IV-B. Finally, an active excitation mechanism is introduced in Section IV-A, where Algorithm 1 summarizes the complete adaptive robust datadriven controller proposed in this paper. The effectiveness of the proposed algorithm is first validated through simulation with a simple example in Section V, followed by the experimental results with an entire building on the EFPL campus in Section VI. A conclusion of this paper is given in Section VII.

Notation: N (µ, Σ) denotes a Gaussian distribution with mean µ and covariance Σ. ⊗ is the symbol for the Kronecker product. × is the symbol for the set product operation, ⊕ is the symbol of Minkowski sum and is the symbol of Pontryakin difference. colspan(A) denotes the column space (e.g. range) of the matrix A. I n is the n × n identity matrix and O is a matrix of all zeros. · denotes the Euclidean two-norm.

denotes a concatenated sequence of x i ranging from x 1 to x L , and we drop the index to improve clarity if the intention is clear from the context.

Definition 1: A Hankel matrix of depth L associated with a vector-valued signal sequence s :

, is defined as

whose order is denoted by D(B(A, B, C, D)) := n x and n u , n y denotes its input and output dimensions respectively. An L-step trajectory generated by this system is The set of all possible L-step trajectories generated by

Given a sequence of input-output measurements

, we call the input sequence persistently exciting of order L if H L (u d ) is full row rank. By building the following n c -column stacked Hankel matrix

we state Willems' Fundamental Lemma as Lemma 1: [50, Theorem 1] Consider a controllable linear system and assume {u d } T i=1 is persistently exciting of order L+O(B(A, B, C, D)). The condition colspan(H L (u d , y d )) = B L (A, B, C, D) holds.

For the sake of consistency, a datapoint coming from the training dataset is marked by boldface subscript d . L is reserved for the length of the system responses and n c denotes the number of columns in a Hankel matrix.

A data-driven control scheme has been proposed in [16] , [36] , where Lemma 1 generates a trajectory prediction within the following predictive control problem,

where J(·, ·) is a strongly convex objective function, U and Y denote compact convex constraint sets on the input and the output. u init , y init are a t init -step sequence of measured inputs and outputs preceding the current point in time. The matrix H L (y d ) is split into two sub-Hankel matrices as

The matrix H L,init (y d ) is of depth t init and the depth of H L,pred (y d ) is the prediction horizon n h such that t init +n h = L. Hence, y pred is the predictive output sequence viewed from the current point in time. The matrices H L,init (u d ), H L,pred (u d ) are defined accordingly. The choice of t init is made to ensure a unique estimation of the initial state; please refer to [37] for more details. The last two terms (a) in the objective (3a) are reported as an engineering heuristic to alleviate the impact of measurement noise and model mismatch [16] , where η g and η σ are userdefined parameters. However, based on our experiments and the parameters reported in [16] , [20] , [23] , [30] , if the output measurement data y d is noisy or the model is not perfectly LTI, η g and η σ often must be tuned to unreasonably large values in order to maintain an acceptable control performance. Notice that [54] proposed a maximum likelihood based method to adapt η g and η σ via a non-convex optimization problem, but the proposed heuristic algorithm lacks a convergence guarantee. From an intuitive viewpoint, a key issue with the predictive scheme (3) is that it couples the trajectory prediction problem and the optimal control problem into a mono-objective optimization problem. Hence, a dominant prediction error penalty is a must to ensure a reasonable trajectory prediction, which makes the predictive control loss J(·, ·) numerically less relevant. To better see this issue, one can check the feasibility of the predictive scheme (3), where we can assume 0 is a feasible point in both U and Y. The predictive control problem is then always feasible by taking g = 0, regardless of y init and u init . This issue is the ultimate cause motivating this work, where we decouple the prediction and the control decision into a bi-level structure.

The 2-Wasserstein distance between two distributions P x and P y is defined by

where Γ(P x , P y ) is the family of joint distributions whose marginals are P x and P y . The 2-Wasserstein distance models the optimal transport between P x and P y in terms of the Euclidean distance. If P x ∼ N (µ x , Σ x ) and P y ∼ N (µ y , Σ y ), then the squared 2-Wasserstein distance has a closed-form [21] W (P x , P y ) 2 = µ x − µ y 2 + tr Σ x + Σ y − 2(Σ x Σ y )

In this paper, we focus on the following uncertain timevarying linear system

where w i ∈ R nw is a bounded measurable process noise and v i ∼ N (0, Σ v ) with v i ∈ R ny an i.i.d unknown measurement noise. In particular, y is the system output, which is unknown, and y is the measurement read out from the sensors. In particular, in the building control problem, w mostly reflects the external temperature, solar radiation, occupancy, etc. In a similar manner to [30] , [31] , by viewing w as uncontrolled inputs, Lemma 1 can be generalized and the corresponding L-step trajectory is augmented to The trajectory prediction in (3b) is modified accordingly as

whose components follow a similar definition to that in (3b).

In this part, we will establish a rigorous mathematical framework for our proposed method by considering the LTI version of the targeted time-varying dynamics (5):

A Wasserstein distance upper bound between a trajectory given by the fundamental lemma and noisy measurement sequences is summarized in Lemma 2 for LTI systems. The minimization of this upper bound is used to generate a trajectory prediction, which enables a robust data-driven controller for the LTI system in Section III-B. A tractable version of the proposed scheme is discussed in Section III-B, whose single level reformulation is summarized in Lemma 3. The further generalization to time-varying dynamics (5) will be introduced in the next Section IV.

Regarding (7), the system output y init is contaminated by measurement noise, giving a noisy measurement vector y init . The posterior estimate of output y init thus follows the distribution:

where Σ init = I tinit ⊗ Σ v . Similarly, the measurement Hankel matrix H L,init (y d ) is subject to measurement noise, and we denote the uncertain posterior estimate of the system output Hankel matrix by H L,init (y d ). Then, for arbitrary g ∈ R nc , the following lemma quantifies the distribution distance between an uncertain trajectory generated by the fundamental lemma, H L,init (y d )g, and the uncertain system output sequence y init Lemma 2: ∀ g ∈ R nc , the squared Wasserstein distance W (H L,init (y d ) g, y init ) 2 is upper bounded by

Proof: Similar to the posterior distribution of y init , the posterior distribution of the i-th column of the output Hankel matrix H L,init (y) follows a Gaussian distribution N (y d,i:i+tinit−1 , Σ init ) and the adjacent columns are correlated. By the basic properties of Gaussian distributions, we have

In the rest of the proof, we will upper bound the term (a) above,

where terms (b) and (c) follow the same fact that

The inequality (e) uses the fact that t init ≥ 1. We conclude the proof with

Based on the upper bound in Lemma 2, we can generate a trajectory prediction via the following optimization problem:

The left-hand-side of (9b) gives a trajectory generated by the fundamental lemma, while the right-hand-side of (9b), [y init , u init , w init , u pred , w pred ] , is the trajectory composed by measurements read out from the sensors, the planned future input and a predicted disturbance trajectory. Thus, optimization problem (9) minimizes the discrepancy upper bound between the posterior estimate of an I/O sequence and an I/O trajectory estimated by the fundamental lemma. Remark 1: Note that the Hankel matrix H L,pred (y d ) is also subject to measurement noise (v in (5)), which means that the posterior estimate of a fundamental lemma based predictive trajectory H L,pred (y d )g is also uncertain and Gaussian with expectation H L,pred (y d )g. The prediction given by (9a) can therefore be considered as a certainty equivalent prediction. Meanwhile, even though it is possible to consider the uncertainty of y pred , for the sake of a clean layout, we only use the certainty equivalence prediction (9) in this paper.

Remark 2: Different from our analysis, [17] also applies the Wasserstein distance to show that the regularization (term (a) in (3)) minimizes the convex cost J(·, ·) with a distributionally robust guarantee. Their result relies on [24] with a less realistic assumption that the rows of the Hankel matrix are i.i.d. However, the rows of inputs and the rows of outputs may fail to be independently distributed as they should be linearly correlated by the system dynamics.

We hope to maintain the system performance while ensuring robust constraint satisfaction regardless of future realizations of the process noise, and so we assume that the future process noise w pred can be predicted with uncertainty quantification. Buildings are systems satisfying this assumption. For example, the weather forecast can provide a future temperature prediction with an uncertainty tube centered around a nominal prediction, such that the actual future temperature realization fluctuates within this tube. Therefore, we denote the set of predicted future process noise realizations by

where w is the nominal prediction and W denotes the uncertainty tube. And we state our assumption:

Assumption 1: w and W are known.

To make the controller less conservative, we consider a predictive control input with a linear feedback from process noise

where the feedback control K is a decision variable in our predictive control problem. In particular, u reflects the nominal control inputs while feedback K adapts the control input based on the actual realization of the process noise. Based on the prediction problem (9), we can state a bi-level robust predictive control problem for LTI systems:

The upper level problem sends u pred and the feedback control law K to the lower level problem, the lower level problem returns the corresponding set of predictors g(w pred ) with respect to a given w pred , which accordingly gives the output trajectory predictions y pred for that w pred . In turn, the robust constraint in the upper level ensures that input and output constraints are satisfied for all w pred in the considered set w ⊕ W. In conclusion, the upper level problem optimizes u pred and K based on the prediction given by the lower level problem. The bi-level problem (11) is hard to solve numerically, because the objective in the lower level problem is nonconvex. 1 To address this issue, we state a looser, but convex, Wasserstein upper bound in the following corollary.

Proof: In the inequality (d) of Equation (8), we have tr(G) ≥ 0 asG is positive semi-definite. Therefore, we give a convex Wasserstein distance upper bound as

Bringing everything together, we have a numerically tractable robust data-driven predictive control problem for LTI 1 To ensure a composition of two convex functions is convex, the outer convex function needs be non-decreasing [13, Chapter 3.2.4] . To see the nonconvexity in (11), one can plot the following function

systems:

where E g = t 2 init tr(Σ v )I nc and the lower level problem drops the constant term t 2 init tr(Σ v ) in its objective for the sake of clarity. The main benefit of the proposed predictive control problem (12) is that it resolves the issue discussed in Section II by decoupling the trajectory prediction from control decisionmaking into a bi-level optimization problem, which guarantees that the control is chosen with a reliable trajectory prediction.

To derive a tractable single level problem, we denote

The bi-level problem (12) can be effectively solved via the following reformulation. Lemma 3: The following single-level robust optimization problem is equivalent to the bi-level problem (15) .

Proof: Note that the uncertain lower level problem in (15) is strongly convex and therefore can be equivalently represented by its KKT system [19, Chapter 4] . By replacing σ l by H L,init (y d )g l − y init , the Lagrangian of the lower-level problem is

where κ is the dual variable of the equality constraint. Based on this, we have the stationary condition of the KKT system

By recalling the primal feasibility condition

we get the uncertain KKT equation in (13a), which concludes the proof.

On top of the single-level problem (13), we further enforce causality on the decision variable K through a lower-block triangular structure [33, Chapter 5.1],

In particular, causality means that the i-th step of the future process noise can only change the events happening later than it, which only includes u pred,i+1:n h . Remark 3: When the feasible sets U, Y are polytopic, the robust optimization problem (13) can be solved by a standard dualization procedure [6] .

The single level problem stated in Lemma 3 uses an optimality condition to generate trajectory prediction. A similar idea is applied in subspace predictive control [26] , which identifies an ARX model online. These two methods are different: the analysis carried out in subspace predictive control falls into the category of the least square analysis of over-determined linear equations [29] , which particularly assumes that H L,init (y d ) is noise-free [28] . The proposed controller does not have this assumption, and the lower level problem in (15) can also optimize an under-determined equation system.

In this section, we will generalize the controller (12) to time-varying dynamics (5) heuristically. We first summarize the optimization problem solved in our novel adaptive robust controller, whose details will be elaborated later accordingly.

The following optimal control problem will optimize the u pred and the feedback K under a receding horizon scheme, min y pred u pred ,K J(y pred , u pred )

g ∈ arg min

where E g is replaced by a diagonal matrix diag(η g,1 , . . . , η g,nc ) and the weight sequence {η g,i } nc i=1 is decreasing. The feasible set of control inputŨ in (15b), the perturbationw, and the uncertainty setW in (15a) are determined by the working mode, whose details will be given in Section IV-A.

In general, we apply two heuristics to adapt the LTI controller in (12) to the time-varying dynamics (5) . The modifications are summarized as follows:

• Modification of the lower level objective: Note that the objective function in the lower level problem in (12) consists of a penalty on σ 2 and a penalty on g 2 . Recall that the i-th entry of g, g i , is the weight of the i-th column in H and H L,init (y d ) used for prediction. Thus, an inhomogeneous penalty on g can be used to model our preference of using recent data for trajectory prediction, and the decreasing diagonal elements in E g reflect this preference. • Adaptation of the measured dataset: The data are updated online to capture the latest dynamics from the system. In particular, Hankel matrices are updated by appending new input/output measurements on the right side of the Hankel matrices and by discarding old data on the left side. Remark 5: The single-level reformulation in Lemma 3 can be applied in this adaptive problem by replacing H with

In this part, we will discuss the selection of theŨ andW in (13) . Recall that persistent excitation is the key assumption required for Willems' fundamental lemma to apply. However, the update of Hankel matrices may cause a loss of persistent excitation as it is not explicitly considered in the predictive control problem (13) . For example, in building control, if the outdoor temperature and/or solar irradiation are near the building's equilibrium point, no extra heating/cooling is needed if the controller is designed to minimize energy consumption. In this case, the control input will be 0 for a long time, leading to a loss of persistent excitation. To accommodate this issue, we introduce a robust active excitation scheme, which perturbs the control input applied at time i by a random excitation signal

where u pred,1|i is the first element of u pred determined by a predictive control problem (13) solved at time i. In this decomposition, term (a) is determined by the predictive control problem (13) with some specific choice ofŨ andW (see Algorithm (1) below), and the term (b) is a bounded excitation input, which is unknown to the decision process of u pred,1|i . More specifically, from the viewpoint of u pred , u e is an uncontrolled, but measurable, process noise and the underlying time-varying process dynamics therefore becomes

where the excitation input is randomly sampled from a userdefined compact set as u e ∈ U e ⊂ U. The process noisew is accordingly augmented to [w pred , u e ] , and the uncertainty setW in (15a) is augmented to (w ⊕ W) × U e . Meanwhile, due to an extra excitation signal in (16), the feasible set of the control inputŨ is tightened to U U e . In practice, this active excitation mechanism sacrifices the flexibility of the control input for data quality. If not necessary, we should setŨ = U andW = W to generate control inputs instead of using the active excitation scheme. We summarize the general algorithm of the proposed controller in Algorithm 1, which automatically selects the uncertainty setW and input feasible setŨ. In this algorithm, the problem (13) is first solved without active excitation. If its solution expects a loss of persistent excitation, the problem (13) is re-solved considering the active excitation. If checking the satisfaction of persistent excitation by the rank condition results in too high a computational cost, this condition can be replaced by other heuristic conditions. In building control, loss of persistent excitation mainly happens when the control input is constant, and thus the persistence excitation condition can be replaced by checking whether the nominal input is a constant sequence.

Remark 6: In general, generating a persistently excited control input while considering control performance is challenging, as the persistent excitation condition depends on the rank of H L (u d ), which turns the optimization problem into a challenging non-smooth non-convex optimization [35] . It is noteworthy that [25] also perturbs the nominal control input to guarantee persistent excitation, however, their result has no guarantee of robust constraint satisfaction.

Remark 7: Due to the causality constraints in (14), the matrix K in (15b) cannot instantaneously counteract the excitation signal u e with a K = [K w , −I], which is noncausal.

In our proposed adaptive robust controller, the Hankel matrices are updated online with the real-time measurement of u i , y i , w i (Section IV and Algorithm 1). Meanwhile, a numerically efficient reformulation of the robust problem (13) requires an explicit evaluation of matrix inversion M −1 in (13a) at each update. More specifically, when the feasible sets U, Y and the uncertainty setW are polytopic or ellipsoidal, the dualization/explicit upper bound of the robust inequality constraint depends on the matrix inversion M −1 . However, the computational cost of M −1 is O((n c + n r ) 3 ) [43] with n r the number of rows in the matrix H, which is roughly cubic in the size of the data set and it is therefore not scalable. Thus, we propose to apply two linear algebraic techniques to resolve this computational bottleneck.

Notice that the dual variable κ in the lower problem does not affect the upper level problem. For the sake of compactness, we denote M 1,1 := H L,init (y d ) H L,init (y d ) + E g . By matrix inversion of a block-structured matrix, we have

We can therefore replace the constraint (13a) by With the aforementioned modification, the computational cost is lowered to O(n 3 c ) with only the inversion of M 1,1 . We further lower the computational cost by the Woodbury matrix identity,

As E g is diagonal with a simple and explicit inversion, the major computation cost happens at the inversion of a size m matrix in term (a), where m := n y * t init . Thus, the computational cost of M −1 1,1 and M t is lowered to O(m 3 ), which is fixed and independent of the number of columns in the Hankel matrices (i.e. roughly the size of the data-set).

Remark 8: Note that the KKT matrix is usually illconditioned [41, Chapter 16] , replacing the full matrix inversion in (13a) with the proposed techniques can improve the numerical stability, because only the matrix inversion of E g and the term (a) in (17) are evaluated, and these two matrices are well-conditioned.

In this part, we will first demonstrate that our proposed scheme shows comparable performance against some modelbased methods in both LTI and time-varying linear systems.

It should be noted that we are not claiming that the datadriven approach outperforms all model-based approaches, as it is definitely possible to tune a better model based method, such as considering the uncertainty of the identified parameters or using a more complex estimator/controller. We only aim to show that the proposed approach has comparable performance against a model-based method, but without requiring a model. We therefore select a standard model-based controller design pipeline that we believe is reasonable, and we compare this standard scheme against our proposed scheme in this example. The considered second order system is:

where the unknown measurement noise v of two different standard deviations are considered to show the reliability of the proposed scheme. We remind the reader that only the measurement y i is available to both schemes and y i is unknown to both schemes. In the 'standard' scheme, the model is identified by a subspace identification algorithm and the state estimation is done by a Kalman filter 2 . Following this, a robust model predictive controller with linear feedback [33] is used with the standard procedure to generate control inputs 3 min u pred ,Kw y pred max w pred J(y pred , u pred )

where the parameters of A id , B id , E id , C id , D id come from system identification and x 0 is estimated with a Kalman filter. Note that the feedback control law K w is optimized by (18) . Other components, such as J(·, ·), n h and U, Y, W, are identical to those used in the proposed robust data-driven scheme. In particular, we use −10 ≤ u pred ≤ 10, −2 ≤ y pred ≤ 0.5 and −1 ≤ w pred ≤ 1. The loss function is

where y ref is the tracking reference and the prediction horizon is set to n h = 10. Meanwhile, we set t init = 4 in the proposed robust scheme. Finally, to ensure a better model identification result, the standard scheme uses a dataset larger than the one used for the proposed scheme. In particular, 200 data points generated by a PRBS excitation signal is used to identify the model used in (18) , while the proposed scheme only used the first 60 points of this dataset. Two experiments with different levels of measurement noise are considered, whose results are shown in Figure 1 4 In the test of Figure 1 (a) , the standard deviation of the measurement noise is 0.01, which corresponds to a maximal signal to noise ratio of 500 in the data 5 . In this case, the averaged 10-step prediction accuracy 6 of the identified state space model is 99.1%. From Figure 1 (a) , we can see that the proposed scheme shows comparable performance against the model based scheme. In the test of Figure 1 (b) , the standard deviation of the process noise reaches 0.2, leading to a minimal signal to noise ratio of 25. In this case, the averaged 10step prediction accuracy of the identified state space model is 71.1%. Note that this accuracy is not low because of the large measurement noise, which will still cause a prediction error of roughly 20% even when a perfect model is used. While both controllers violate the constraints more frequently in this case, our proposed scheme offers a better constraint satisfaction guarantee. The constraint violation statistics are summarized in Table I , and we observe that our proposed controller offers much better constraint satisfaction. Meanwhile, due to our discussion in Remark 1, the proposed scheme is certainty equivalent and still also encountered a few constraint violations in both tests. We believe that the constraint violation in the model-based method is caused by the inaccuracy of the identified model and can be resolved by methods such as constraint tightening, but there is currently no systematic way of achieving this. Finally, it is noteworthy that the measurement noise used in Figure 1(a) is realistic, while the second test in Figure 1(b) is intended to be a stress test of the proposed control scheme, in order to show the robustness of the proposed scheme to unknown measurement noise.

In the second test, we consider the following time-varying system

.

Our proposed scheme is compared against a model based adaptive method, where a recursive least square (RLS) estimator updates the parameter of an ARX model:

The model estimated by RLS is used in the following robust MPC problem:

where the parameters θ y , θ u and θ w are updated by the RLS estimator. The other settings are the same as the previous experiment. In particular, the RLS estimator has a forgetting factor of 0.95 and it is initialized by the data used to build the Hankel matrices of the proposed scheme. Two experiments with different levels of measurement noise standard deviation are conducted. The results are shown in Figure 2 and the constraint violation statistics are given in Table II . We can see that both approaches perform the tracking task properly, and the proposed scheme shows comparable performance against the RLS-MPC approach. One major observation is that the proposed scheme shows better constraint satisfaction against the RLS-MPC method. It is possible to consider the uncertainty generated by the RLS estimator to improve the robustness of the model-based approach, however, it turns out to be a non-convex robust optimization problem and there is no standard approach to solve this problem. 

The building control experiment is conducted on an entire building, which is a freestanding 600m 2 single-zone building on the EPFL campus, called the Polydome. It is regularly used for lectures/exams and accommodates up to 200 people ( Figure 3 ). In the presented experiments, the indoor temperature is the output y, the active electrical power consumption of the HVAC system is the input u, and the weather conditions (outdoor temperature and solar radiation) are the process noise w = [w 1 ; w 2 ]. The sampling period is 15 minutes and the structure of the control system is depicted in Figure 4 , where the arrows indicate the direction of data transmission. The system consists of five main components • Sensors: Four Z-wave FIBARO DOOR/WINDOW SENSOR V2 are put in different locations in the Polydome to measure the indoor temperature (path (a)). Every five minutes, the temperature measurements are sent to the database through a wireless Z-wave network (path (c)).

The average value of the four measurements is used as the indoor temperature. The active power consumption of the HVAC system is measured via an EMU 3-phase power meter [1] (path (b)). • Database: We use INFLUXDB 1.3.7 [2] to log the timeseries data, which records the measurements from sensors (path (c)), the control input computed from the PC (path (d)) and the historical weather (path (f )). • Weather API: We use TOMORROW.IO [3] , which provides both historical and current measurements of solar radiation and outdoor temperature to the database (path (f )). It also provides the forecast of solar radiation and outdoor temperature to the PC to solve the predictive control problem (path (g)). • Controller: The controller is implemented in MATLAB, interfacing YALMIP, which fetches historical data from the database to build/update the Hankel matrices (path (d)), and acquires weather forecasts from the weather API (path (g)). Fig. 4 . Structure of the building control system

The HVAC system is shipped with an internal hierarchical controller, which includes:

• Mode scheduler: The scheduler determines whether the HVAC is in heating or cooling mode, and we are not authorized to access this scheduler. • Temperature controller: The indoor temperature is controlled by a bang-bang controller that compares the setpoint temperature and the return-air (indoor) temperature with a dead-band of 1°C. For example, in the heating mode, if the return-air temperature is 1°C lower than the set-point temperature, the heating will turn on and run at full power of 8.4 kW until the set-point is reached. Vice versa for cooling, except that the electric power is 7 kW .

To map this controller to our proposed controller, we applied the following strategies:

• As the cooling and heating modes of the HVAC system show different responses, two different I/O datasets for different modes are maintained/updated independently. The controller monitors the mode of the HVAC system and deploys the corresponding I/O dataset to build the proposed controller. • Recall that the input used in our proposed scheme is electrical power consumption. To achieve a desired power consumption, we convert this desired consumption to a set-point sequence: For example, if the HVAC is in heating mode, and a non-zero desired power consumption is planned, the controller will turn on the heating by giving a set-point that is 2°C above the return-air temperature, until the desired energy (P el * T s ) is reached (path (b) in Figure 4 ). Then, the heating is turned off by setting the set-point to the return-air temperature.

In our bi-level predictive control scheme, the Hankel matrices for both heating and cooling modes are built from 200 data-points, with t init = 10 and a prediction horizon n h = 10. The controller minimizes electrical power consumption with the following loss function

To better distinguish the heating and cooling modes in our plots, we use a positive input value for the heating mode and a negative input for the cooling mode. We further enforce the following input constraint to model the maximal power consumption of the heating/cooling.

Note that the HVAC unit consumes a constant 2.4 kW of electricity for ventilation, even without heating or cooling. The aforementioned input constraint excludes this basic ventilation power. The parameter E g in (13) is set by MATLAB command DIAG(LINSPACE(0.2,0.02,n c )). The uncertainty set of the weather forecasts is estimated from an analysis of historical data as

In this section, we describe three experiments that were conducted from May 2021 to June 2021. In particular, the first experiment shows the necessity of robust control and the second experiment shows the adaptivity to mode switching. The second one runs a 20-day experiment to show the adaptivity and reliability of the proposed scheme and the final one runs a 4-day experiment to compare the proposed scheme with the default controller. Meanwhile, recall that we use a negative control input to represent cooling and a positive control input for heating, accordingly, and show the control input and system output (indoor temperature) within the same plot to better show the response from input to output.

1) Experiment 1: The first experiment includes two parts: a non-robust version of the proposed scheme (i.e. W = {0}, K = O) and a robust version with the uncertainty tube given in (20) . In this test, we consider a time-varying indoor temperature constraint with respect to office hours, which is relaxed during the night and is tightened to ensure occupant comfort during office hours.

21°C ≤ y i ≤ 26°C, from 8 a.m. to 6 p.m.

The non-robust experiment was conducted on 14 th May 2021 and the result is plotted in Figure 5 . The HVAC system was in heating mode throughout this experiment, and we can observe that the heat pump is "off" until it hits the lower bound at around 4 A.M.. Later, it started pre-heating the room to satisfy the office hours temperature constraint at around 6 A.M. However, we can still observe frequent but small constraint violations from 8 A.M. to 10 A.M., which then lead to overheating after 10 A.M.

In comparison, the robust controller effectively handled these issues in an experiment conducted on 25 th May 2021. The result is shown in Figure 6 , where we used the same time varying indoor temperature constraint. We can observe that the robust controller safely protects the system from violating the lower bound through the whole test, and it also successfully pre-heated the building to fit the time-varying indoor temperature constraint. The performance deterioration that occurred to the non-robust controller after 10 A.M. was avoided as well, where the controller smoothly turned off the heating without unnecessary overheating.

2) Experiment 2: The second experiment is a pilot test to validate the adaptivity to the mode switching and the necessity of active excitation (Algorithm 1). The experiment was conducted from 28 th to 29 th May 2021, with the result shown in Figure 7 , where the indoor temperature of this experiment is bounded within

This change of heating/cooling mode is depicted as a positive/negative control input and a red/blue shaded region in Figure 7 . However, it is noteworthy that the system was in cooling mode on the second evening. If there was no active excitation, the cooling should be off through the night to minimize energy consumption, and the Hankel matrices In conclusion, the controller successfully carried out the task of energy minimization in this experiment. In particular, when there is no need for heating/cooling, such as during the second evening, only active excitation took effect to maintain persistency of excitation. The heating/cooling also takes effect to robustly maintain the indoor temperature within the constraints.

3) Experiment 3: This experiment was planned to validate the long-term reliability and adaptivity of the proposed controller. The experiment ran continuously for 20 days from 10 th June 2021 to 30 th June 2021. The statistics of this experiment is summarized in Table III , and we only plot results from the 20 th 12:00 to the 30 th 12:00 in Figure 8 to Figure 9 due to the space limit 7 .

Note that the weather varies a lot throughout this experiment, with the outdoor temperature even once surpassing 29°C Second experiment in Polydome: two-day running by the proposed robust data-driven control on the 20 th and also once dropping below 15°C on the 26 th . Therefore, the experiment shows the adaptivity of the proposed controller to weather variation. Moreover, the cooling mode dominated the whole experiment, with only a few days of heating at night. The proposed controller gives a long-term guarantee of temperature constraint satisfaction (see Table III ), while updating the Hankel matrices constantly. Regarding the data update, the active excitation scheme (Algorithm 1) also occasionally took effect to ensure persistency of excitation, such as 00:00 -6:00 A.M. on 17 th June. Fig. 11 . Comparison of the proposed robust controller and default controller: outdoor temperature and solar radiation 4) Experiment 4: Finally, we compare the proposed robust DeePC scheme with the default controller (Section. VI-A), which regulates the supply air temperature based on the return air temperature. A fixed setpoint at 22 • C is given to the default controller in order to ensure the occupants' comfort throughout the day. It is worth noting that the default controller is a benchmark controller widely used in the building control community [5] , [27] , [53] .

The experiment with the proposed controller was conducted from 23 rd July 2021 to 26 th July 2021, and the one with the default controller ran from 7 th July 2021 to 10 th July 2021. To ensure a fair comparison, the weather conditions for these two experiments were similar, as shown in Figure 11 . The indoor temperature and electrical power consumption are plotted in Figure 10 , alongside the statistics of these two experiments summarized in Table IV .

From Table IV and Figure 10 , more constraint violation is observed from the default controller than the proposed controller. One major underlying reason is that the default controller runs without the knowledge of a weather forecast and only cooled down the building when the return air temperature reached 23 • C. We believe that is the reason accounting for the constraint violation in the day-1 experiment of the default controller. If the default controller could have predicted a high solar radiation and turned the cooling on constantly, the temperature constraint violation should have been avoided.

Besides the benefit of robust constraint satisfaction, the proposed controller is also more power efficient than the default controller under similar weather conditions, which in particular consumed 18.4% less electricity than the default controller (Table IV ). Note that, to maintain the data quality, the proposed controller ran the active excitation scheme (Algorithm 1) regularly from 00:00 to 9:00. Thus, there is still the possibility to further improve the energy efficiency of the proposed controller. 

In this work, we propose a robust bi-level data-driven adaptive-predictive building controller based on the Willems' fundamental lemma. The proposed scheme performs comparably to a classical model-based approach in a numerical simulation. The practical viability is shown by deploying the proposed scheme to a conference building on the EPFL campus, where, without extra modeling effort, our proposed scheme improve 18% energy efficiency and robustly ensures occupant comfort.

Modeling techniques used in building hvac control systems: A review. Renewable and sustainable energy reviews

Reducing transient and steady state electricity consumption in hvac using learningbased model-predictive control

Robust optimization

A trajectory-based framework for data-driven system analysis and control

Robust datadriven state-feedback design

Robust constraint satisfaction in data-driven mpc

Data-driven model predictive control: closed-loop guarantees and experimental results

Data-based stabilization of unknown bilinear systems with guaranteed basin of attraction

Intelligent systems for building energy and occupant comfort optimization: A state of the art review and recommendations

Convex optimization

A survey of iterative learning control

Experimental analysis of data-driven control for a building heating system

Data-enabled predictive control: In the shallows of the deepc

Regularized and distributionally robust data-enabled predictive control

Formulas for data-driven control: Stabilization, optimality, and robustness

Foundations of bilevel programming

Data-enabled predictive control of autonomous energy systems

The fréchet distance between multivariate normal distributions

All you need to know about model predictive control for buildings

Dataenabled predictive control for quadcopters

Data-driven distributionally robust optimization using the wasserstein metric: Performance guarantees and tractable reformulations

Input perturbations for adaptive control and learning

Spc: Subspace predictive control. IFAC Proceedings Volumes

Optimal residential model predictive control energy management performance with pv microgeneration

Total least squares

Decentralized dataenabled predictive control for power system oscillation damping

From system level synthesis to robust closedloop data-enabled predictive control

Nonlinear data-enabled prediction and control

Minimax approaches to robust model predictive control

Online simultaneous state estimation and parameter adaptation for building predictive control

Persistently exciting model predictive control

On the linear quadratic data-driven control

Data-driven simulation and control

Groundhog day: Iterative learning for building temperature control

On-line building energy optimization using deep reinforcement learning

A matrix trace inequality

Numerical optimization

Data-driven internal model control of second-order discrete volterra systems

Reinforcement learning: An introduction

Open modbus/tcp specification

Adaptive-model predictive control of electronic expansion valves with adjustable setpoint for evaporator superheat minimization

A matrix finsler's lemma with applications to data-driven control

From noisy data to feedback controllers: non-conservative design via a matrix s-lemma

Introduction to mathematical systems theory: a behavioral approach

A note on persistency of excitation

A data-driven convex programming approach to worst-case robust tracking controller design

Iterative learning control in large scale hvac system

Model predictive control with adaptive machine-learning-based model for building energy efficiency and comfort optimization

Maximum likelihood estimation in data-driven modeling and control

On controllability and persistency of excitation in datadriven control: Extensions of willems' fundamental lemma