title: Learning Ergodic Averages in Chaotic Systems
authors: Huhn, Francisco; Magri, Luca
date: 2020-05-25
journal: Computational Science - ICCS 2020
DOI: 10.1007/978-3-030-50433-5_10

Abstract. We propose a physics-informed machine learning method to predict the time average of a chaotic attractor. The method is based on the hybrid echo state network (hESN). We assume that the system is ergodic, so the time average is equal to the ergodic average. Compared to conventional echo state networks (ESNs), which are purely data-driven, the hESN uses additional information from an incomplete, or imperfect, physical model. We evaluate the performance of the hESN and compare it to that of an ESN. The approach is demonstrated on a chaotic time-delayed thermoacoustic system, where the inclusion of a physical model significantly improves the accuracy of the prediction, reducing the relative error from 48% to 1%. This improvement is obtained at the low extra cost of solving a small number of ordinary differential equations that contain physical information. This framework shows the potential of combining machine learning techniques with prior physical knowledge to improve the prediction of time-averaged quantities in chaotic systems.

1 Introduction

In the past decade, there has been a proliferation of machine learning techniques applied in various fields, from spam filtering [7] to self-driving cars [3], including more recent physical applications in fluid dynamics [4, 6]. However, a major hurdle in applying machine learning to complex physical systems, such as those in fluid dynamics, is the high cost of generating data for training [6]. This cost can be mitigated by leveraging prior knowledge (e.g. physical laws), which can compensate for a small amount of training data. These approaches, called physics-informed machine learning, have been applied to various problems in fluid dynamics [4, 6]. For example, [5, 14] improve the predictability horizon of echo state networks by leveraging physical knowledge, which is enforced as a hard constraint in [5], without needing more data or neurons.

In this study, we use a hybrid echo state network (hESN) [14], originally proposed to time-accurately forecast the evolution of chaotic dynamical systems, to predict long-term time-averaged quantities, i.e., the ergodic averages. This is motivated by recent research on the optimization of chaotic multi-physics fluid dynamics problems, with applications to thermoacoustic instabilities [8]. The hESN is based on reservoir computing [10], in particular on conventional echo state networks (ESNs). ESNs have been shown to predict nonlinear and chaotic dynamics more accurately, and for a longer time horizon, than other deep learning algorithms [10]. However, we stress that the present study is not focused on the accurate prediction of the time evolution of the system, but rather of its ergodic averages, which are obtained by time averaging a long time series. (We implicitly assume that the system is ergodic, so the infinite-time average is equal to the ergodic average [2].) Here, the physical system under study is a prototypical time-delayed thermoacoustic system, whose chaotic dynamics have been analyzed and optimized in [8].

2 Echo State Networks

The ESN approach presented in [11] is used here.
The ESN is given an input signal u(n) ∈ R^{N_u}, from which it produces a prediction signal ŷ(n) ∈ R^{N_y} that should match the target signal y(n) ∈ R^{N_y}, where n is the discrete time index. The ESN is composed of a reservoir, which can be represented as a directed weighted graph with N_x nodes, called neurons, whose state at time n is given by the vector x(n) ∈ R^{N_x}. The reservoir is coupled to the input via an input-to-reservoir matrix, W_in ∈ R^{N_x × N_u}, such that its state evolves according to

x(n) = tanh(W_in u(n) + W x(n-1)),    (1)

where W ∈ R^{N_x × N_x} is the weighted adjacency matrix of the reservoir, i.e. W_ij is the weight of the edge from node j to node i, and the hyperbolic tangent, applied element-wise, is the activation function. Finally, the prediction is produced by a linear combination of the states of the neurons,

ŷ(n) = W_out x(n),    (2)

where W_out ∈ R^{N_y × N_x}. In this work, we are interested in dynamical system prediction, so the target at time step n is the input at time step n + 1, i.e. y(n) = u(n + 1) [14]. We wish to learn ergodic averages, given by

⟨J⟩ = lim_{T→∞} (1/T) ∫_0^T J(u(t)) dt,    (3)

where J is a cost functional, of a dynamical system governed by

u̇ = F(u),    (4)

where u ∈ R^{N_u} is the state vector and F is a nonlinear operator. The training data is obtained via numerical integration of Eq. (4), resulting in the time series {u(1), ..., u(N_t)}, where the samples are taken at equally spaced time intervals Δt and N_t is the length of the training data set.

In the conventional ESN approach, W_in and W are generated once and fixed; W is then re-scaled to have the desired spectral radius, ρ, to ensure that the network satisfies the echo state property [9]. Only W_out is trained, to minimize the mean squared error

E_d = (1/N_t) Σ_{n=1}^{N_t} ‖ŷ(n) − y(n)‖².    (5)

To avoid overfitting, we use ridge regularization, so the optimization problem is

min_{W_out} E_d + γ ‖W_out‖²,    (6)

where γ is the regularization factor. Because the prediction ŷ(n) is a linear combination of the reservoir state x(n), the optimal W_out can be obtained explicitly with

W_out = Y X^T (X X^T + γ I)^{−1},    (7)

where I is the identity matrix, and Y and X are the column-concatenations of the various time instants of the output data, y, and of the corresponding reservoir states, x, respectively. After the optimal W_out is found, the ESN can be used to predict the time evolution of the system. This is done by looping its output back to its input, i.e. u(n) = ŷ(n − 1) = W_out x(n − 1), which, on substitution into Eq. (1), results in

x(n) = tanh(W̃ x(n − 1)),    (8)

with W̃ = W + W_in W_out. Interestingly, because the hyperbolic tangent is an odd function, Eq. (8) is invariant under x → −x, and thus the autonomous ESN has a co-existing symmetric attractor. While this did not seem to be an issue in short-term prediction, such as in [5], it does pose a problem in the long-term prediction of statistical quantities, because the ESN, in its present form, cannot generate non-symmetric attractors. This symmetry must be broken for the network to reproduce general non-symmetric dynamical systems, which can be done by including biases [10]. However, the addition of a bias can make the reservoir prone to saturation (results not shown), i.e. x_i → ±1, and thus care needs to be taken in the choice of hyperparameters. In this paper, we break the symmetry by exploiting prior knowledge of the physics of the problem under investigation with a hybrid ESN.
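For concreteness, the training and prediction procedure of Eqs. (1)-(8) can be sketched as follows in Python/NumPy. This is our illustration, not the authors' code: the function names are ours, and details such as the omitted washout of the initial transient are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_esn(N_x, N_u, rho, sigma_in, density=0.03):
    # W_in and W are generated once and fixed (Sect. 2).
    W_in = rng.uniform(-sigma_in, sigma_in, size=(N_x, N_u))
    W = rng.uniform(-1.0, 1.0, size=(N_x, N_x))
    W[rng.random((N_x, N_x)) >= density] = 0.0        # keep ~3% non-zero entries
    W *= rho / np.max(np.abs(np.linalg.eigvals(W)))   # set spectral radius to rho
    return W_in, W

def train(W_in, W, U, Y, gamma=1e-7):
    # Teacher-forced run of Eq. (1), then ridge regression, Eq. (7).
    # (A washout of the initial transient is omitted for brevity.)
    N_x = W.shape[0]
    x = np.zeros(N_x)
    X = np.zeros((N_x, len(U)))
    for n in range(len(U)):
        x = np.tanh(W_in @ U[n] + W @ x)              # Eq. (1)
        X[:, n] = x
    W_out = (Y.T @ X.T) @ np.linalg.inv(X @ X.T + gamma * np.eye(N_x))  # Eq. (7)
    return W_out, x

def predict(W_in, W, W_out, x, n_steps):
    # Autonomous prediction: the output is looped back to the input, Eq. (8).
    # Requires N_u = N_y, which holds here because y(n) = u(n + 1).
    Y_pred = np.zeros((n_steps, W_out.shape[0]))
    for n in range(n_steps):
        x = np.tanh(W_in @ (W_out @ x) + W @ x)       # equivalent to tanh(W~ x)
        Y_pred[n] = W_out @ x
    return Y_pred
```

With the target y(n) = u(n + 1), training on a time series U of shape (N_t, N_u) reads W_out, x = train(W_in, W, U[:-1], U[1:]), after which predict(...) runs the network autonomously to accumulate long-time statistics.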
Fig. 1. Schematic of the hybrid echo state network. In training mode, the input of the network is the training data (switch is horizontal). In prediction mode, the input of the network is its output from the previous time step (switch is vertical).

The ESN's performance can be increased by incorporating physical knowledge during training [5] or during training and prediction [14]. This physical knowledge is usually available in the form of a reduced-order model (ROM) that can generate (imperfect) predictions. The authors of [5] introduced a physics-informed ESN (PI-ESN), which enforces the physics as a hard constraint through a physics loss term: the prediction is consistent with the physics, but the training requires nonlinear optimization. The authors of [14] introduced a hybrid echo state network (hESN), which incorporates incomplete physical knowledge by feeding the prediction of the physical model into the reservoir and into the output; the training requires ridge regression only. Here, we use an hESN (Fig. 1) because we are not interested in enforcing the physics as a hard constraint for an accurate short-term prediction [5]. In the hESN, similarly to the conventional ESN, the input is fed to the reservoir via the input layer W_in, but it is also fed to a physical model, which is usually a set of ordinary differential equations that approximately describe the system to be predicted. In this work, that model is a reduced-order model (ROM) of the full system. The output of the ROM is then fed both to the reservoir, via the input layer, and to the output of the network, via the output layer.

We use a prototypical time-delayed thermoacoustic system composed of a longitudinal acoustic cavity and a heat source modelled with a nonlinear time-delayed model [8, 12, 16], which has been used to optimize ergodic averages in [8] with a dynamical-systems approach. The non-dimensional governing equations are

∂u/∂t + ∂p/∂x = 0,    ∂p/∂t + ∂u/∂x + ζp − q̇ δ(x − x_f) = 0,    (9, 10)

where u, p, ζ and q̇ are the non-dimensional acoustic velocity, pressure, damping and heat-release rate, respectively, and δ is the Dirac delta. These equations are discretized by using N_g Galerkin modes, which results in a system of 2N_g oscillators that are nonlinearly coupled through the heat released by the heat source,

η̇_j = jπ μ_j,    μ̇_j = −jπ η_j − ζ_j μ_j − 2 q̇ sin(jπ x_f),    j = 1, ..., N_g,    (11)

where η_j and μ_j are the modal amplitudes of the acoustic velocity and pressure, respectively, x_f = 0.2 is the heat-source location, and ζ_j = 0.1 j + 0.06 j^{1/2} is the modal damping [8]. The heat-release rate, q̇, is given by the modified King's law [8],

q̇(t) = β ( |1 + u(x_f, t − τ)|^{1/2} − 1 ),    (12)

where β and τ are the heat-release intensity parameter and the time delay, respectively. With the nomenclature of Sect. 2, y(n) = (η_1; ...; η_{N_g}; μ_1; ...; μ_{N_g}). Using 10 Galerkin modes (N_g = 10), β = 7.0 and τ = 0.2 results in chaotic motion (Fig. 2), with the leading Lyapunov exponent λ_1 ≈ 0.12 [8]. (The leading Lyapunov exponent measures the rate of (exponential) separation of two nearby initial conditions, i.e. an initial separation ‖δu_0‖ grows asymptotically like ‖δu_0‖ e^{λ_1 t}.) However, for the same choice of parameter values, the solution with N_g = 1 is a limit cycle (i.e. a periodic solution). The echo state network is trained on data generated with N_g = 10, while the physical knowledge (ROM in Fig. 1) is generated with N_g = 1 only. We wish to predict the time average of the instantaneous acoustic energy, which is a relevant metric in the optimization of thermoacoustic systems [8].

The reservoir is composed of 100 units, a modest size, half of which receive their input from u, while the other half receive it from the output of the ROM, y_ROM. The entries of W_in are randomly generated from the uniform distribution unif(−σ_in, σ_in), with σ_in = 0.2. The matrix W is highly sparse, with only 3% of its entries non-zero, drawn from the uniform distribution unif(−1, 1). Finally, W is scaled such that its spectral radius, ρ, is 0.1 for the ESN and 0.3 for the hESN. The time step is Δt = 0.01.
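Concretely, one closed-loop step of the hESN can be sketched as follows. This is our reading of Fig. 1 and of the hybrid scheme of [14], not the authors' code: the function names and the concatenation layout are assumptions, and the ROM integrator is left abstract because the time-delayed q̇(t) requires a history buffer of u(x_f, t) over the past τ/Δt steps, which we omit for brevity.

```python
import numpy as np

def hesn_step(u, x, W_in, W, W_out, rom_step):
    """One closed-loop step of the hybrid ESN (sketch).

    u        : current network input, i.e. the previous prediction, shape (N_u,)
    x        : reservoir state, shape (N_x,)
    W_in     : input layer, shape (N_x, N_u + N_rom); its sparsity pattern sends
               u to one half of the reservoir and y_rom to the other half
    W_out    : output layer, shape (N_y, N_x + N_rom)
    rom_step : advances the N_g = 1 ROM, Eq. (11), by one time step from the
               current input and returns its prediction y_rom, shape (N_rom,)
    """
    y_rom = rom_step(u)                                       # imperfect physical prediction
    x = np.tanh(W_in @ np.concatenate([u, y_rom]) + W @ x)    # reservoir update, cf. Eq. (1)
    y = W_out @ np.concatenate([x, y_rom])                    # output sees reservoir and ROM
    return y, x
```

During training, u is the training data u(n) (switch horizontal in Fig. 1) and only W_out is computed, again by ridge regression; in prediction mode, u is the network's output from the previous step (switch vertical).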
The network is trained for N_t = 5000 time steps, which corresponds to 6 Lyapunov times, i.e. 6 λ_1^{−1}. The data is generated by integrating Eq. (11) in time with N_g = 10, resulting in N_u = N_y = 20. In the hESN, the ROM is obtained by integrating the same equations, but with N_g = 1 (one Galerkin mode only), unless otherwise stated. Ridge regression is performed with γ = 10^{−7}. The values of the hyperparameters are taken from the literature [5, 14] and refined with a grid search, which, while not the most efficient method, is well suited to architectures with few hyperparameters, such as this work's ESN.

On the one hand, Fig. 3a shows the instantaneous error of the first modes of the acoustic velocity and pressure, (η_1; μ_1), for the ESN, the hESN and the ROM: none of them can accurately predict the instantaneous state of the system. On the other hand, Fig. 3b shows the error of the prediction of the average acoustic energy. Once again, the ROM alone does a poor job of predicting the statistics of the system, with an error of 50%. This should not come as a surprise since, as discussed previously, the ROM does not even produce a chaotic solution. The ESN, trained on data only, performs marginally better, with an error of 48%. In contrast, the hESN predicts the time-averaged acoustic energy satisfactorily, with an error of about 7%. This is remarkable: both the ESN and the ROM on their own predict the average acoustic energy poorly, but when the ESN is combined with prior knowledge from the ROM, the prediction becomes significantly better. Moreover, while the hESN's error is still decreasing at the end of the prediction period, t = 250, which is 5 times the length of the training window, the errors of the ESN and the ROM level off much earlier, at a time similar to that of the training data. This result shows that complementing the ESN with a cheap physical model (with only 10% of the number of degrees of freedom of the full system) can greatly improve the accuracy of the predictions, with no need for more data or neurons.

Figure 3c shows the relative error as a function of the number of Galerkin modes in the ROM, which is a proxy for the quality of the model. For each N_g, we take the median over 16 reservoir realizations. As expected, as the quality of the model increases, so does the quality of the prediction, the effect being most noticeable from N_g = 1 to 4, after which the curve shows diminishing returns. The downside of increasing N_g is, obviously, the increase in computational cost. At N_g = 10, the ROM coincides with the full system. However, the error does not tend exactly to 0, because W_out cannot select the ROM's output alone (i.e. zero entries for the reservoir nodes), due to: (i) the regularization factor in the ridge regression, which penalizes large entries; and (ii) numerical error. This graph further strengthens the point made previously: cheap physical models can greatly improve the prediction of physical systems with data-driven techniques.
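The predicted statistic itself is computed directly from the network output. As a minimal sketch, assuming the usual definition of the instantaneous acoustic energy for this Galerkin discretization, E_ac(t) = (1/4) Σ_j (η_j² + μ_j²) (our assumption, consistent with the model of [8]), the finite-time estimate of the ergodic average of Eq. (3) is:

```python
import numpy as np

def mean_acoustic_energy(Y_pred, N_g=10):
    # Y_pred: (n_steps, 2*N_g) array of predictions, with columns ordered as
    # y(n) = (eta_1, ..., eta_Ng, mu_1, ..., mu_Ng), cf. Sect. 2.
    eta = Y_pred[:, :N_g]
    mu = Y_pred[:, N_g:]
    E_ac = 0.25 * np.sum(eta**2 + mu**2, axis=1)  # instantaneous acoustic energy
    return E_ac.mean()                            # finite-time estimate of Eq. (3)
```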
We stress that the optimal values of the hyperparameters for a certain set of physical parameters, e.g. (β_1, τ_1), might not be optimal for a different set, (β_2, τ_2). This should not be surprising, since different physical parameters result in different attractors. For example, Fig. 4 shows that changing the physical parameters from (β = 7.0, τ = 0.2) to (β = 6.0, τ = 0.3) changes the type of attractor from chaotic to limit cycle. For the hESN to predict the limit cycle, the value of σ_in must change from 0.2 to 0.03. Thus, if the hESN (or any deep learning technique in general) is to be used to predict the dynamics of various physical configurations (e.g. for the generation of a bifurcation diagram), it should be coupled with a robust method for the automatic selection of optimal hyperparameters [1], a promising candidate being Bayesian optimization [15, 17].

We propose the use of echo state networks informed with incomplete prior physical knowledge for the prediction of time-averaged cost functionals in chaotic dynamical systems. We apply this framework to chaotic acoustic oscillations, which are relevant to aeronautical propulsion. The inclusion of physical knowledge comes at a low cost and significantly improves the performance of conventional echo state networks, reducing the error from 48% to 1% without requiring additional data or neurons. This improvement is obtained at the low extra cost of solving a small number of ordinary differential equations that contain physical information. The proposed ESN can be exploited in the optimization of chaotic systems by accelerating computationally expensive shadowing methods [13]. In future work, (i) the performance of the hybrid echo state network should be compared against that of other physics-informed machine learning techniques; (ii) robust methods for hyperparameter search should be coupled with the network for a "hands-off", autonomous tool; and (iii) the technique should be applied to larger-scale problems, which is ongoing work. In summary, the proposed framework is able to learn the ergodic average of a fluid-dynamics system, which opens up new possibilities for the optimization of highly unsteady problems.

References

1. Practical recommendations for gradient-based training of deep architectures
2. Proof of the ergodic theorem
3. End to end learning for self-driving cars
4. Machine learning for fluid mechanics
5. Physics-informed echo state networks for chaotic systems forecasting
6. Turbulence modeling in the age of data
7. A review of machine learning approaches to spam filtering
8. Stability, sensitivity and optimisation of chaotic acoustic oscillations
9. Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless communication
10. Reservoir computing approaches to recurrent neural network training
11. A practical guide to applying echo state networks
12. A reduced-order model for the onset of combustion instability: physical mechanisms for intermittency and precursors
13. Sensitivity analysis on chaotic dynamical systems by non-intrusive least squares shadowing (NILSS)
14. Hybrid forecasting of chaotic processes: using machine learning in conjunction with a knowledge-based model
15. Efficient optimization of echo state networks for time series datasets
16. Data assimilation in a nonlinear time-delayed dynamical system with Lagrangian optimization
17. Bayesian optimization of hyper-parameters in reservoir computing