Modeling Systems with Machine Learning based Differential Equations

Pedro Garcia

2021-09-09

The prediction of behavior in dynamical systems is frequently subject to the design of models. When a time series obtained from observing the system is available, the task can be performed by designing the model from these observations without additional assumptions, or by assuming a preconceived structure for the model, with the help of additional information about the system. In the second case, it is a question of adequately combining theory with observations and subsequently optimizing the mixture. In this work, we propose the design of time-continuous models of dynamical systems, as solutions of differential equations, from non-uniformly sampled or noisy observations, using machine learning techniques. The performance of the strategy is shown with both several simulated data sets and experimental data from the Hare-Lynx population and the Coronavirus 2019 outbreak. Our results suggest that this approach to modeling systems can be a useful technique in the case of synthetic or experimental data.

It is a known fact that research at the intersection of two scientific disciplines can produce remarkable results for both. Physics and Machine Learning are an example of this. The use of machine learning techniques in the characterization of systems or in the prediction of their behavior has been a constant in recent years [3, 4]. In return, physics has collaborated in explaining, from a statistical mechanics point of view, the training mechanism of neural networks [1], among many other contributions. In the particular case of subareas such as Dynamical Systems and Neural Networks, there are developments that are already notable, and although many strategies can be inscribed in this intersection, we are interested in the mixed strategies between differential equation theory and neural networks, which suggest that there exists a duality between differential equations and many deep neural networks that is worth trying to understand. With the idea of establishing some order in this type of results, we will divide them into two classes: (i) the strategies that use neural networks to solve differential equations [14, 13, 16] and (ii) the methodologies that tell us how modern differential equation solvers can simplify the architecture of many neural networks [12, 6]. Particularly, in the work of Chen et al. [6], a hybrid of neural network and differential equation is defined, the Neural Ordinary Differential Equation (Neural ODE), with striking properties including excellent accuracy and greatly reduced memory requirements. On the second of those classes we will focus our results and comparisons. This association between machine learning and differential equations is very likely to be beneficial for physics and other scientific disciplines since, from the 17th century to our times, differential equations, or their discrete versions, have been the most popular way of representing dynamical systems. Particularly in physics, the label of "most popular" belongs to initial value problems. This is the starting point of our proposal.

Consider an initial value problem for a general ordinary differential equation:

$$\dot{x}(t) = f(x(t), t; \theta), \qquad x(t_0) = x_0, \qquad (1)$$

where θ is a vector of parameters. A problem that appears frequently, in many areas, when systems are represented in this way consists of: given x(t_0), calculate x(t_1), i.e., an initial value problem, whose formal solution is:

$$x(t_1) = x(t_0) + \int_{t_0}^{t_1} f(x(\tau), \tau; \theta)\, d\tau. \qquad (2)$$

In the case in which f cannot be integrated analytically, an approximation to the solution consists of numerically solving the previous integral. In general, regardless of the numerical method used, the numerical solution of (2) can be written as a one-step recursion:

$$x(t_{n+1}) = x(t_n) + \delta t_n\, \Phi(x(t_n), t_n; \theta), \qquad (3)$$

where the increment function Φ depends on the integration scheme; the simplest choice, Φ = f,

$$x(t_{n+1}) = x(t_n) + \delta t_n\, f(x(t_n), t_n; \theta), \qquad (4)$$

corresponds to the Euler method.
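For concreteness, a numerical solution of the form (3)-(4) is what any standard ODE integrator provides. The following minimal sketch uses scipy.integrate.solve_ivp on an illustrative vector field with illustrative parameter values (both are our choices for the example, not systems from this paper):

    from scipy.integrate import solve_ivp

    # Illustrative f(x, t; theta): a damped linear oscillator.
    def f(t, x, theta):
        a, b = theta
        return [x[1], -a * x[0] - b * x[1]]

    theta = (1.0, 0.1)
    sol = solve_ivp(f, t_span=(0.0, 10.0), y0=[1.0, 0.0],
                    args=(theta,), dense_output=True)

    # With dense_output=True the solver returns a continuous interpolant,
    # playing the role of the formal solution (2): x(t1) for any t1 in the interval.
    x_t1 = sol.sol(2.5)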
Among all these works we will focus, to motivate ours, on the results of He [12] and Chen [6], where it is shown that a neural network can be designed to account for the rate of change of the system when it goes from state x(t_n) to state x(t_{n+1}). The neural network architecture that results from this idea is known as ResNet and is the prehistory of the work [6], where the authors introduce a new family of deep neural network models that, instead of specifying a discrete sequence of hidden layers, parameterize the derivative of the hidden state using a neural network. Both approaches involve the approximation of the integral in (2) with a particular integration scheme, within some neural network training scheme. With this in mind, we propose to design a continuous model from discrete observations, using strategies from machine learning regression (different from neural networks), to provide an efficient way to use noisy, non-uniformly sampled data to determine a reliable continuous-time model. Non-uniformly sampled data can appear when sampling at the Nyquist frequency is not efficient [7] or when there is incompleteness due to difficulties in data collection [9], and continuous-time models are useful in system identification [2], forecasting [9] and control of such systems. Continuous-time system identification from non-uniformly sampled data has also been considered using B-spline functions [11], Kalman filters [8], Box-Jenkins models [5] and the expectation-maximization algorithm [8]. In our case, we have used a kernel-based regression technique to approximate f, and in order to show how we develop the aforementioned idea, this paper is organized as follows: in section 2 we set up the problem of nonlinear modeling, in section 3 we sketch the solution, and in section 4 we show examples of the performance of the proposed method. Finally, in section 5 we summarize our observations and give conclusions.

We set our problem as the design of a machine learning regression version f̃ of f in (2), that is, of a model

$$\dot{x} = \tilde{f}(x), \qquad x(t_0) = x_0, \qquad (5)$$

constructed from a time series {x(t_n)}_{n=1}^{N}, with x ∈ ℝ^n, non-uniformly sampled from the observation of the unknown system (2), such that finite segments of the trajectories of (2) are kept as close as possible to segments, of the same length, of trajectories of (5). Once we have chosen the space of functions of which f̃ is an element, and defined a norm in that space, it is possible [10] to set up an optimization problem from the error functional:

$$L = \sum_{n=1}^{N-1} \left\| x(t_{n+1}) - x(t_n) - \delta t_n\, \tilde{f}(x(t_n)) \right\|^2, \qquad \delta t_n = t_{n+1} - t_n, \qquad (6)$$

where, in contrast with the usual assumption in regression techniques, the signal or time series is not supposed to be uniformly sampled in time. The error representation (6) assumes a discrete version of (5), using the Euler scheme as in (4), that we write as:

$$x(t_{n+1}) = x(t_n) + \delta t_n\, \tilde{f}(x(t_n)).$$

As an alternative to the neural networks proposed in [12, 6] to approximate f̃, in this work a kernel-based regression scheme known as Kernel Ridge Regression [17] is used. This scheme, in addition to being conceptually simple, offers a low computational cost solution strategy for (5).
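In practice, minimizing (6) is a regression on targets built from consecutive samples; up to a reweighting by δt_n, it is equivalent to fitting the finite-difference estimates of f̃. A minimal sketch of this preprocessing step (the function and variable names are ours, for illustration):

    import numpy as np

    def euler_targets(t, x):
        """Build regression pairs from a possibly non-uniformly sampled
        trajectory: inputs x(t_n) and targets (x(t_{n+1}) - x(t_n)) / dt_n.

        t : (N,) array of sample times, x : (N, d) array of states.
        """
        dt = np.diff(t)[:, None]        # dt_n = t_{n+1} - t_n, non-uniform in general
        X = x[:-1]                      # inputs x(t_n)
        Y = (x[1:] - x[:-1]) / dt       # finite-difference estimates of f~(x(t_n))
        return X, Y

The pairs (X, Y) can then be passed to any regression scheme; the kernel method described next is one choice.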
The proposed kernel method [17] belongs to a class of machine learning algorithms used for many different inferential tasks and application areas. Inference models trained with the kernel approach obtain new representations of the training patterns by embedding the observations into a new feature space where, using techniques from optimization, mathematical analysis and statistics, one can extrapolate the properties required for inference on new data. Without loss of generality, it can be assumed that f̃ belongs to a Hilbert space H, so that it can be written as:

$$\tilde{f}(x) = \sum_n \alpha_n \varphi_n(x), \qquad (7)$$

where {φ_n} is a basis of H and {α_n} is the set of coefficients to fit. Given the cost functional (6) and the approximation (7), and absorbing the factor δt_j into the definition of the increments δx_j = x(t_{j+1}) − x(t_j), the quality of our approximation, for an orthonormal basis, will look like:

$$L = \sum_j \left\| \delta x_j - \sum_n \alpha_n \varphi_n(x_j) \right\|^2.$$

If we want to determine the set {α_n} such that this functional is minimal, one way is to satisfy ∂L/∂α_k = 0, so:

$$\sum_j \varphi_k(x_j)\, \delta x_j = \sum_n \alpha_n \sum_j \varphi_n(x_j)\, \varphi_k(x_j).$$

From the orthogonality of the set {φ_n} we have:

$$\sum_j \varphi_n(x_j)\, \varphi_k(x_j) = \delta_{nk},$$

so that:

$$\alpha_k = \sum_j \delta x_j\, \varphi_k(x_j).$$

Thus, our approximation can be written as:

$$\tilde{f}(x) = \sum_n \sum_j \delta x_j\, \varphi_n(x_j)\, \varphi_n(x),$$

or alternatively:

$$\tilde{f}(x) = \sum_j \delta x_j \sum_n \varphi_n(x_j)\, \varphi_n(x).$$

Let us call:

$$K(x_i, x) = \sum_n \varphi_n(x_i)\, \varphi_n(x),$$

so that f̃ can be represented as:

$$\tilde{f}(x) = \sum_i \omega_i K(x_i, x), \qquad (8)$$

where ω_i = δx_i and K is a reproducing kernel of the space H. If we had infinite data, the sum (8) could be carried out exactly. If not, it is possible to use a regression scheme that estimates the ω_i, using a particular kernel. It is worth noting here that the nonlinear approximation problem for f̃ is transformed into a linear one, i.e., the determination of the ω_i, using only elements x_i of the original space. Although there are many more alternatives, here we choose K as the Gaussian kernel,

$$K(x_i, x) = \exp\left( -\frac{\| x - x_i \|^2}{2 \sigma^2} \right). \qquad (9)$$

Once K is chosen, the nonlinear regression problem for f̃ becomes a linear regression problem, i.e., the determination of the ω_i. Although this last task can be done using several on-line algorithms, here we use Ridge regression. This scheme is theoretically simple to interpret [10] and very easy to implement. Algorithm 1 shows the implementation of the strategy shown in Figure 1. This is a very general strategy, the only condition being that the regression scheme can be stated as an on-line scheme, i.e., one where the data elements are presented to the method one at a time.

Algorithm 1: Kernel Ordinary Differential Equation
    Initialize variables: ω and δt.
    while stopping criterion not met do
        for n = 1 to N do
            Observe input x_n.
            Predict output y_{n+1} ← ODESolver(x_n, t_n, t_{n+1}, ω).
            Observe true output x_{n+1}.
            Update the solution ω based on L(e_n), with e_n = x_{n+1} − y_{n+1}.
        end for
    end while
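This pipeline has a compact batch realization; Algorithm 1 is on-line, but for illustration the on-line update can be replaced by scikit-learn's KernelRidge, whose "rbf" kernel is exactly the Gaussian kernel (9). The sketch below fits f̃ on the Euler targets of the previous section and integrates the fitted model to generate an orbit; function names and hyperparameter values are our illustrative assumptions, not the author's original code:

    import numpy as np
    from scipy.integrate import solve_ivp
    from sklearn.kernel_ridge import KernelRidge

    def fit_vector_field(t, x, alpha=1e-3, gamma=10.0):
        """Fit f~ by Kernel Ridge Regression with a Gaussian (RBF) kernel."""
        dt = np.diff(t)[:, None]
        X, Y = x[:-1], (x[1:] - x[:-1]) / dt     # Euler targets, as in (6)
        model = KernelRidge(kernel="rbf", alpha=alpha, gamma=gamma)
        model.fit(X, Y)                          # linear problem for the weights
        return model

    def model_orbit(model, x0, t_eval):
        """Integrate the fitted model (5) from x0 over the times t_eval."""
        rhs = lambda t, x: model.predict(x[None, :])[0]
        sol = solve_ivp(rhs, (t_eval[0], t_eval[-1]), x0, t_eval=t_eval)
        return sol.y.T

Replacing the batch fit with a recursive ridge update recovers the on-line character of Algorithm 1.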
In order to illustrate how general the strategy is, we will use several dynamical systems: linear, nonlinear periodic and chaotic, in two and three dimensions. In all these cases, the training data come from numerical simulations of the systems. Additionally, we present results for two of those cases using real data. In all examples, M samples of the evolution of the system are considered using a sampling rate δt, and a random sample of N data points is taken from this data. The resulting time series is finally standardized. The capacity of generalization of the fitted model is qualitatively shown, in all cases, as an orbit generated from the same initial condition using the fitted model. In the case of two-dimensional systems, we also generate, using the fitted model, a phase portrait. There, it is shown how the flow of many initial conditions converges approximately to the observed orbit. In the case of simulated data, the results are displayed in the form of two figures: for uniformly sampled and non-uniformly sampled data. We show the performance of the method using uniformly and non-uniformly sampled data from: a planar linear system, as in [6]; the Lotka-Volterra system; the Susceptible-Infected-Recovered (SIR) model; and Chua's chaotic system.

The first example is a linear system with an asymptotically stable fixed point at x = 0, with parameters α = 1, ν = 4, γ = −2 and δ = 2. Here we use as training data N = 100 points obtained by the 4th-order Runge-Kutta method.

The Lotka-Volterra equations, also known as the predator-prey equations, are a pair of first-order nonlinear differential equations, frequently used to describe the dynamics of biological systems in which two species interact, one as a predator and the other as prey. The populations change through time according to the pair of equations:

$$\dot{x} = \alpha x - \beta x y, \qquad \dot{y} = \delta x y - \gamma y,$$

with α = 0.1, β = 0.02, δ = 0.01 and γ = 0.3. Here we use as training data N = 500 points obtained by the 4th-order Runge-Kutta method. Figure 3 compares the observed and predicted orbit, from the same initial condition, and shows the phase portrait generated by the model in the case of this system.

The SIR model is one of the simplest compartmental models, and many models are derivatives of this basic form. The model consists of three state variables: S, the number of susceptible individuals; I, the number of infectious individuals; and R, the number of removed (and immune) or deceased individuals. It is assumed that the number of deaths is negligible with respect to the total population. This last compartment may also be called "recovered" or "resistant". This model is reasonably predictive for infectious diseases that are transmitted from human to human, and where recovery confers lasting resistance, such as measles, mumps and rubella. Figure 4 compares the observed and predicted orbit from the same initial condition.

Figure 4: SIR model. The big light-gray points show the simulated data and the black points the orbit generated by the model from the first element of the data. The left figure shows the result using uniformly sampled data and the right figure the same with non-uniformly sampled data.

Chua's system is an ODE system,

$$\dot{x} = \alpha\,(y - x - h(x)), \qquad \dot{y} = x - y + z, \qquad \dot{z} = -\beta y - \gamma z,$$

with $h(x) = m_1 x + \tfrac{1}{2}(m_0 - m_1)(|x + 1| - |x - 1|)$, representing a simple electronic circuit that exhibits classic chaotic behavior. Here we have chosen a set of parameter values which give chaotic solutions: α = 9.35159085, β = 14.790319805, γ = 0.016073965, m_0 = −1.138411196, m_1 = −0.722451121. This system presents a strong sensitivity to initial conditions, which makes long-term predictions impossible, as well as structural instability, in the sense that small changes in the parameters lead to totally dissimilar evolutions. For these reasons, it constitutes a good benchmark for the validation of models.
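As an illustration of how the synthetic training sets described above can be generated, the following sketch simulates the Lotka-Volterra system with the quoted parameters, draws a random non-uniform subsample of N points from M dense samples, standardizes the series and adds observational noise. The sizes, initial condition and noise level are our illustrative choices, not values reported in the paper:

    import numpy as np
    from scipy.integrate import solve_ivp

    rng = np.random.default_rng(0)

    def lotka_volterra(t, z, a=0.1, b=0.02, d=0.01, g=0.3):
        x, y = z
        return [a * x - b * x * y, d * x * y - g * y]

    # Dense simulation of M samples, then a random non-uniform subsample of N points.
    M, N = 5000, 500
    t_dense = np.linspace(0.0, 200.0, M)
    sol = solve_ivp(lotka_volterra, (0.0, 200.0), [30.0, 4.0], t_eval=t_dense)

    idx = np.sort(rng.choice(M, size=N, replace=False))
    t, x = t_dense[idx], sol.y.T[idx]

    # Standardize, as described above, and add observational noise.
    x = (x - x.mean(axis=0)) / x.std(axis=0)
    x_noisy = x + 0.01 * rng.standard_normal(x.shape)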
To show the performance of this strategy in the case of real (noisy) data, we have chosen two data sets: one, the Hare-Lynx data, a set whose characterization is, for many reasons, very important; and another that stands out for its current relevance, the data associated with the COVID-19 epidemic.

The first time series is given by the numerical fluctuations in the populations of Canadian lynx (Lynx canadensis) and snowshoe hare (Lepus americanus) caught and then purchased by the Hudson's Bay Company in Canada from 1910 to 1935 for the American fur market [15]. Figure 6 compares the observed and predicted orbit, from the same initial condition, and shows the phase portrait generated by the model in the case of this system.

The second data set is in CSV format, updated daily at https://github.com/CSSEGISandData/COVID-19, maintained by the Johns Hopkins University Center for Systems Science and Engineering (CSSE). The data available has information about Confirmed, Recovered and Deaths; in our case we transform the data as: S = N_p − Confirmed, I = Confirmed − Recovered − Deaths and R = Recovered + Deaths, where N_p is the population of the country in each case. Figure 7 compares the observed and predicted orbit from the same initial condition.

We have presented a strategy to design continuous-time models from data. These models allow approximately predicting the state of the system at any time (within a finite interval) and, as far as we know, this is the first time that a continuous model has been designed using kernel methods. To summarize our findings, we highlight two aspects of the performance of the proposed strategy: the idea is simple, and so is the algorithm; and the dynamics of the system is approximated from data that may contain noise or be non-uniformly sampled. The strategy allows incorporating any method of integration into any on-line regression scheme for the model parameters, such as Lasso regression or Gaussian processes. Finally, we believe that it is possible to incorporate additional information, besides the data, to improve the performance of the model, as in physics-informed neural networks [16].

Figure 7: SIR models from experimental data. The two columns of figures contain the results of the modeling for data from 8 countries. In the first column the results correspond to: Chile, France, Mexico and Spain, and in the second one to: Colombia, Japan, Russia and USA. The big gray points show the observed data and the black points the orbit generated by the model from the first element of the data.

References
[1] Statistical mechanics of deep learning.
[2] System identification algorithm for nonuniformly sampled data.
[3] The power of machine learning.
[4] Machine learning and the physical sciences.
[5] Identification of continuous-time transfer function models from non-uniformly sampled data in presence of colored noise.
[6] Neural ordinary differential equations.
[7] Compressed sensing for bioelectric signals: A review.
[8] Reconstruction of continuous-time systems from their nonuniformly sampled discrete-time systems.
[9] Estimating the epidemic risk using non-uniformly sampled contact data.
[10] Haar basis and nonlinear modeling of complex systems.
[11] Frequency domain identification of continuous-time output error models.
[12] A model of two nonlinear coupled oscillators for the study of heartbeat dynamics.
[13] DGM: A deep learning algorithm for solving partial differential equations.
[14] Artificial neural networks for solving ordinary and partial differential equations.
[15] Fluctuations in the Numbers of the Varying Hare (Lepus Americanus).
[16] Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations.
[17] Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond.