key: cord-0266018-p7jv79ae
authors: Lee, Kookjin; Parish, Eric J.
title: Parameterized Neural Ordinary Differential Equations: Applications to Computational Physics Problems
date: 2020-10-28
DOI: 10.1098/rspa.2021.0162
sha: 812f9a18c9768d24536bdd784d08f82336a8f28a
doc_id: 266018
cord_uid: p7jv79ae

Abstract: This work proposes an extension of neural ordinary differential equations (NODEs) that introduces an additional set of ODE input parameters. This extension allows NODEs to learn multiple dynamics specified by the input parameter instances. Our extension is inspired by the concept of parameterized ordinary differential equations, which are widely investigated in computational science and engineering contexts, where characteristics of the governing equations vary over the input parameters. We apply the proposed parameterized NODEs (PNODEs) to learning latent dynamics of complex dynamical processes that arise in computational physics, an essential component for enabling rapid numerical simulations for time-critical physics applications. For this, we propose an encoder-decoder-type framework, which models latent dynamics as PNODEs. We demonstrate the effectiveness of PNODEs on important benchmark problems from computational physics.

Numerical simulations of dynamical systems described by systems of ordinary differential equations (ODEs) play essential roles in various engineering and applied science applications; examples include predicting input/output responses, design, and optimization [55]. These ODEs and their solutions often depend on a set of input parameters, in which case they are denoted parameterized ODEs. Examples of such input parameters in the context of fluid dynamics include the Reynolds number and the Mach number. In many important scenarios, high-fidelity solutions of parameterized ODEs must be computed i) for many different input parameter instances (i.e., the many-query scenario) or ii) in real time for a new input parameter instance. A single run of a high-fidelity simulation, however, often requires fine spatiotemporal resolution; consequently, performing real-time or repeated runs of a high-fidelity simulation can be computationally prohibitive.

To mitigate this computational burden, many model-order reduction approaches have been proposed to replace costly high-fidelity simulations. The common goal of these approaches is to build a reduced dynamical model with lower complexity than that of the high-fidelity model, and to use the reduced model to compute approximate solutions for any new input parameter instance. In general, model-order reduction approaches consist of two components: i) a low-dimensional latent-dynamics model, whose computational complexity is very low, and ii) a (non)linear mapping that constructs high-dimensional approximate states (i.e., solutions) from the low-dimensional states obtained from the latent-dynamics model. In many studies, such models are constructed via data-driven techniques in the following steps: i) collect solutions of high-fidelity simulations for a set of training parameter instances, ii) build a parameterized surrogate model, and iii) fit the model by training on the data collected in step i). In the field of deep learning, similar efforts have been made for learning latent dynamics of various physical processes [33, 9, 47, 61, 20].
Neural ordinary differential equations (NODEs), a method for learning time-continuous dynamics in the form of a system of ordinary differential equations from data, comprise a particularly promising approach for learning latent dynamics of dynamical systems. NODEs have been studied in [71, 9, 26, 62, 42, 12, 22], and this body of work has demonstrated their ability to successfully learn latent dynamics and to be applied to downstream tasks [9, 61]. Because NODEs learn latent dynamics in the form of ODEs, they are a natural fit as a latent-dynamics model in reduced-order modeling of physical processes and have been applied to several computational physics problems, including turbulence modeling [53, 46] and future-state prediction in fluids problems [2]. As pointed out in [16, 8], however, NODEs learn a single set of network weights that fits a given training data set best. This results in an NODE model with limited expressivity and often leads to unnecessarily complex dynamics [18]. To overcome this shortcoming, we propose to extend NODEs with a set of input parameters that specify the dynamics of the NODE model, which leads to parameterized NODEs (PNODEs). With this simple extension, PNODEs can represent multiple trajectories such that the dynamics of each trajectory are characterized by the input parameter instance.

The main contributions of this paper are:
• an extension to NODEs that enables them to learn multiple trajectories with a single set of network weights; even for the same initial condition, the dynamics can differ for different input parameter instances,
• a framework for learning latent dynamics of parameterized ODEs arising in computational physics problems, and
• a demonstration of the effectiveness of the proposed framework on advection-dominated benchmark problems, a class of problems where classical linear latent-dynamics learning methods (e.g., principal component analysis) often fail to learn accurately [40].

Classical reduced-order modeling. Classical reduced-order modeling (ROM) techniques rely heavily on linear methods such as the proper orthogonal decomposition (POD) [30], which is analogous to principal component analysis [31], for constructing the mappings between a high-dimensional space and a low-dimensional space. These ROMs then identify the latent-dynamics model by executing a (linear) projection process on the high-dimensional equations, e.g., Galerkin projection [30] or least-squares Petrov-Galerkin projection [7, 6]. We refer readers to [3, 4, 48] for a complete survey of classical methods.

Physics-aware deep-learning-based reduced-order modeling. Recent work has extended classical ROMs by replacing proper orthogonal decomposition with nonlinear dimension-reduction techniques emerging from deep learning [34, 27, 20, 40, 41, 36]. These approaches operate by identifying a nonlinear mapping (via, e.g., convolutional autoencoders) and subsequently identifying the latent dynamics as certain residual-minimization problems [20, 40, 41, 36], which are defined on the latent space and derived from the governing equations. In [34, 27], the latent dynamics are identified by simply projecting the governing equations using the encoder, which may lead to kinematically inconsistent dynamics. Another class of physics-aware methods includes explicitly modeling time-integration schemes [51, 63, 73, 21], adaptive basis selection [60], and adding stability/structure-preserving constraints to the latent dynamics [17, 28].
We emphasize that our approach is closely related to [63], where neural networks are trained to approximate the action of a first-order time-integration scheme applied to latent dynamics and, at each time step, the neural networks take a set of problem-specific parameters as well as the reduced state as input. Our approach can thus be seen as a time-continuous generalization of the approach in [63].

Purely data-driven deep-learning-based reduced-order modeling. Another approach to developing deep-learning-based ROMs is to learn both the nonlinear mappings and the latent dynamics in purely data-driven ways. Latent dynamics have been modeled as recurrent neural networks with long short-term memory (LSTM) units along with linear POD mappings [70, 56, 46] or nonlinear mappings constructed via (convolutional) autoencoders [23, 72, 45, 66]. In [53, 46], latent dynamics and nonlinear mappings are modeled as neural ODEs and autoencoders, respectively; in [49, 43, 65, 47], autoencoders are used to learn approximate invariant subspaces of the Koopman operator. Relatedly, there have been studies on learning direct mappings, via, e.g., a neural network, from problem-specific parameters to either latent states or approximate solution states [64, 19, 58, 10, 35, 69], where the latent states are computed using an autoencoder or linear POD.

Enhancing NODEs. Augmented NODEs [16] extend NODEs by augmenting the hidden state variables with additional state variables, which allows NODEs to learn dynamics using the additional dimensions and, consequently, to have increased expressivity. ANODE [22] discretizes the integration range into a fixed number of steps (i.e., checkpoints) to mitigate numerical instability in the backward pass of NODEs; ACA [76] further extends this approach by adopting an adaptive-stepsize solver in the backward pass. ANODEV2 [75] proposes a coupled system of neural ODEs, where both the hidden state variables and the network weights are allowed to evolve over time and their dynamics are approximated as neural networks. Neural optimal control [8] formulates an NODE model as a controlled dynamical system and infers the optimal control via an encoder network; this formulation results in an NODE that adjusts the dynamics for different input data. Moreover, improved training strategies for NODEs have been studied in [18], and an extension using spectral-element discretizations of NODEs has been proposed in [54].

Neural ODEs (NODEs) are a family of deep neural network models that parameterize the time-continuous dynamics of hidden states using a system of ODEs:

dz(t)/dt = f_Θ(z(t), t), (3.1)

where z(t) is a time-continuous representation of a hidden state, f_Θ is a parameterized velocity function, which defines the dynamics of hidden states over time, and Θ is a set of neural network weights. Given the initial condition z(0) (i.e., the input), the hidden state at any time index t, z(t), can be obtained by solving the initial value problem (IVP) (3.1). To solve the IVP, a black-box differential equation solver can be employed, and the hidden states can be computed with the desired accuracy:

z(t_1), . . . , z(t_{n_t}) = ODESolve(z(0), f_Θ, t_1, . . . , t_{n_t}).

In the backward pass, as proposed in [9], gradients are computed by solving another system of ODEs derived using the adjoint sensitivity method [52]; this allows memory-efficient training of NODE models. As pointed out in [16, 8], an NODE model learns a single dynamics for the entire data distribution and thus results in a model with limited expressivity. To resolve this, we propose a simple but powerful extension of neural ODEs.
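For concreteness, the forward pass just described can be sketched in a few lines using the torchdiffeq package released with [9]; this is a minimal, illustrative example (the module and variable names are ours), not the authors' implementation:

```python
# Minimal NODE forward pass: learn dz/dt = f_Theta(z, t) and evaluate z(t)
# at requested time indices with a black-box adaptive solver (ODESolve).
import torch
import torch.nn as nn
from torchdiffeq import odeint  # differentiable ODE solvers from [9]

class ODEFunc(nn.Module):
    """Velocity function f_Theta: a small MLP defining dz/dt."""
    def __init__(self, latent_dim=5, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ELU(),
            nn.Linear(hidden_dim, latent_dim),
        )

    def forward(self, t, z):  # torchdiffeq passes (t, z)
        return self.net(z)

f_theta = ODEFunc()
z0 = torch.randn(16, 5)                        # batch of initial states z(0)
t = torch.linspace(0.0, 1.0, 20)               # time indices t_1, ..., t_nt
z_t = odeint(f_theta, z0, t, method='dopri5')  # shape (20, 16, 5)
```

Gradients with respect to the weights can be obtained either by backpropagating through the solver or, memory-efficiently, with the adjoint variant (odeint_adjoint in the same package).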
We refer to this extension as "parameterized neural ODEs" (PNODEs):

dz(t)/dt = f_Θ(z(t), t; µ), z(0; µ) = z_0(µ),

with a parameterized initial condition z_0(µ), where µ = [µ_1, . . . , µ_{n_µ}] ∈ D ⊂ R^{n_µ} denotes the problem-specific input parameters. Inspired by the concept of "parameterized ODEs", where the ODEs depend on input parameters, this simple extension allows NODEs to have multiple latent trajectories that depend on the input parameters. The extension requires only minimal modifications to the definition of the velocity function f_Θ and can be trained/deployed using the same mathematical machinery developed for NODEs in the forward pass (i.e., a black-box ODE solver) and the backward pass (i.e., the adjoint-sensitivity method). In practice, f_Θ is approximated as a neural network that takes z and µ as inputs and produces dz/dt as an output.

We now investigate PNODEs within the context of performing model reduction of computational physics problems. We start by formally introducing the full-order model that we seek to reduce. We then describe our proposed framework, which uses PNODEs (or NODEs) as the reduced-order (latent-dynamics) model.

The full-order model (FOM) corresponds to a parameterized system of ordinary differential equations (ODEs):

du/dt = f(u, t; µ), u(0; µ) = u_0(µ), (5.1)

where u(t; µ), u : [0, T] × D → R^N, denotes the state, which is implicitly defined as the solution to the system of ODEs. Here, µ ∈ D denotes the ODE parameters that characterize physical properties (e.g., boundary conditions, forcing terms), D ⊂ R^{n_µ} denotes the parameter space, where n_µ is the number of parameters, and T denotes the final time. The initial state is specified by the parameterized initial condition u_0 : D → R^N; f : R^N × [0, T] × D → R^N denotes the velocity, and du/dt denotes the differentiation of u with respect to time t. Solving (5.1) requires the application of an ODE solver, whose computational complexity rapidly grows with the number of degrees of freedom N (e.g., N ~ 10^7 for many practical problems in computational physics).

Reduced-order modeling mitigates the high cost associated with solving the FOM by operating on reduced computational models that comprise i) a (non)linear mapping that constructs high-dimensional states from reduced states and ii) a low-dimensional latent-dynamics model for the reduced states. We denote the mapping from the reduced states to the high-dimensional states as d : R^p → R^N, and denote the latent-dynamics model as the system of parameterized ODEs:

dû/dt = f̂(û, t; µ), û(0; µ) = û_0(µ), (5.2)

where û(t; µ), û : [0, T] × D → R^p, denotes the reduced state, a low-dimensional representation of the high-dimensional state (i.e., p ≪ N). Analogously, û_0(µ), û_0 : D → R^p, denotes the reduced parameterized initial condition, and f̂(û, t; µ), f̂ : R^p × [0, T] × D → R^p, denotes the reduced velocity. The objective of the ROM is to learn both a nonlinear mapping and a latent-dynamics model such that the ROM generates accurate approximations to the full-order model solution, i.e., d(û) ≈ u. The aim here is to learn the latent dynamics with the PNODE: find a set of network weights Θ such that

dû/dt = f̂_Θ(û, t; µ), û(0; µ) = û_0(µ),

where f̂_Θ : R^p × [0, T] × D → R^p denotes the reduced velocity, i.e., we model the ROM (Eq. (5.2)) as a PNODE.
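In code, the change relative to a plain NODE is minimal: the velocity network simply receives µ alongside z. A sketch under the same assumptions as the previous example (the concatenation-based design is one natural choice, not necessarily the authors' exact architecture):

```python
# PNODE velocity function f_Theta(z, t; mu): mu is held constant in time and
# concatenated to the hidden state, so a single set of weights Theta can
# represent a family of dynamics indexed by mu.
import torch
import torch.nn as nn
from torchdiffeq import odeint

class PNODEFunc(nn.Module):
    def __init__(self, latent_dim=5, mu_dim=2, hidden_dim=64):
        super().__init__()
        self.mu = None  # set per batch/trajectory before calling odeint
        self.net = nn.Sequential(
            nn.Linear(latent_dim + mu_dim, hidden_dim), nn.ELU(),
            nn.Linear(hidden_dim, latent_dim),
        )

    def forward(self, t, z):
        mu = self.mu.expand(z.shape[0], -1)   # broadcast mu over the batch
        return self.net(torch.cat([z, mu], dim=-1))

f_theta = PNODEFunc()
z0 = torch.zeros(1, 5)                        # the same initial state ...
t = torch.linspace(0.0, 1.0, 10)
f_theta.mu = torch.tensor([[4.25, 0.015]])    # e.g., Burgers' (mu_1, mu_2)
z_a = odeint(f_theta, z0, t)
f_theta.mu = torch.tensor([[5.50, 0.030]])    # ... under different dynamics
z_b = odeint(f_theta, z0, t)                  # z_a != z_b in general
```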
To achieve this goal, we propose a framework in which, besides a latent-dynamics model described by the PNODE, two additional functions are required: i) an encoder, which maps a high-dimensional initial state u_0(µ) to a reduced initial state û_0(µ), and ii) a decoder, which maps a set of reduced states û_k, k = 1, . . . , n_t, to a set of high-dimensional approximate states ũ_k, k = 1, . . . , n_t.

[Figure 1: The forward pass of the proposed framework: i) the encoder (red arrow), which provides a reduced initial state to the PNODE, ii) solving the PNODE (or NODE) with the initial state, which results in a set of reduced states, and iii) the decoder (blue arrows), which maps the reduced states to high-dimensional approximate states.]

We approximate these functions with two neural networks, h_enc(·; θ_enc) : R^N → R^p and h_dec(·; θ_dec) : R^p → R^N. With these neural networks defined, the forward pass of the framework can be described as:
1. encode a reduced initial state from the given initial condition: û_0(µ) = h_enc(u_0(µ); θ_enc),
2. solve the system of ODEs defined by the PNODE (or NODE): û_1, . . . , û_{n_t} = ODESolve(û_0(µ), f̂_Θ, t_1, . . . , t_{n_t}; µ),
3. decode the set of reduced states to a set of high-dimensional approximate states: ũ_k = h_dec(û_k; θ_dec), k = 1, . . . , n_t, and
4. compute a loss function L(ũ_1, . . . , ũ_{n_t}, u_1, . . . , u_{n_t}).

Figure 1 illustrates the computational graph of the forward pass in the proposed framework. We emphasize that the proposed framework takes only the initial states from the training data and the problem-specific ODE parameters µ as inputs. PNODEs can still learn multiple trajectories, characterized by the ODE parameters, even if the same initial states are given for different ODE parameters, which is not achievable with NODEs. Furthermore, the proposed framework is significantly simpler than the common neural-network settings for NODEs when they are used to learn latent dynamics: the sequence-to-sequence architectures as in [9, 61, 74, 45, 46] require that (part of) a sequence be fed into an encoder network to produce a context vector, which is then fed into the NODE decoder network as an initial condition.

In the following, we apply the proposed framework to learning latent dynamics of parameterized dynamical systems from computational physics problems. We then demonstrate the effectiveness of the proposed framework with results of numerical experiments performed on these benchmark problems.

To train the proposed framework, we collect snapshots of reference solutions by solving the FOM for pre-specified training parameter instances µ ∈ D_train ≡ {µ^k_train}_{k=1}^{n_train} ⊂ D. This collection results in a third-order solution tensor U (with modes corresponding to the state, time, and training parameter instances), where n_t is the number of time steps. The mode-2 unfolding [38] of the solution tensor U gives matrices U(µ^k_train) = [u_0(µ^k_train), u_1(µ^k_train), . . . , u_{n_t}(µ^k_train)], k = 1, . . . , n_train, where each matrix consists of the FOM solution snapshots for µ^k_train and the first column corresponds to the initial condition u_0(µ^k_train). Among the collected solution snapshots, only the first columns of U(µ^k_train), k = 1, . . . , n_train (i.e., the initial conditions) are fed into the framework; the rest of the solution snapshots are used to compute the loss function. Assuming the FOM arises from a spatially discretized partial differential equation, the total number of degrees of freedom N can be written as N = n_u × n_1 × · · · × n_{n_d}, where n_u is the number of different types of solution variables (e.g., chemical species), n_i denotes the number of grid points along the i-th spatial dimension, and n_d denotes the number of spatial dimensions of the partial differential equation.
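A schematic of steps 1-4 (not the authors' implementation; h_enc, h_dec, and f_theta denote the encoder, decoder, and PNODE velocity networks, assumed to follow the interfaces sketched earlier):

```python
# Forward pass of the framework: encode the initial state, integrate the
# PNODE in the latent space, decode, and compute the MSE loss.
import torch
import torch.nn.functional as F
from torchdiffeq import odeint

def forward_pass(h_enc, h_dec, f_theta, u0, mu, t, u_ref):
    """u0: (batch, N); mu: (batch, n_mu); t: (n_t,); u_ref: (n_t, batch, N)."""
    z0 = h_enc(u0)                     # step 1: reduced initial state (batch, p)
    f_theta.mu = mu                    # dynamics specified by the ODE parameters
    z = odeint(f_theta, z0, t)         # step 2: reduced states (n_t, batch, p)
    u_tilde = h_dec(z.reshape(-1, z.shape[-1]))         # step 3: decode ...
    u_tilde = u_tilde.reshape(len(t), u0.shape[0], -1)  # ... pointwise in time
    return F.mse_loss(u_tilde, u_ref)  # step 4: loss against the snapshots
```

Note that only u0 and mu enter the networks; all later snapshots appear solely in the loss, matching the description above.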
Note that this spatially distributed data representation is analogous to multi-channel images (i.e., n_u corresponds to the number of channels); as such, we utilize (transposed) convolutional layers [39, 24] in our encoder and decoder. In the experiments, we employ convolutional encoders and transposed-convolutional decoders. The encoder consists of four convolutional layers, followed by one fully-connected layer, and the decoder consists of one fully-connected layer, followed by four transposed-convolutional layers. To decrease/increase the spatial dimension, we employ strides larger than one, but we do not use pooling layers. For the nonlinear activation functions, we use ELU [13] after each (transposed) convolutional layer and fully-connected layer, with the exception of the output layer (i.e., no activation at the output layer). Moreover, we employ NODE and PNODE for learning latent dynamics and model f̂_Θ as fully-connected layers; for the nonlinear activation functions, we again use ELU. Lastly, for ODESolve, we use the Dormand-Prince method [15], which is provided in the software package of [9]. Our implementation reuses many parts of the software package used in [61], which is written in PyTorch. The details of the configurations will be presented for each of the benchmark problems.

For training, we set the loss function as the mean squared error and optimize the network weights (Θ, θ_enc, θ_dec) using Adamax, a variant of Adam [37], with an initial learning rate of 1e-2. At each epoch of training, the loss function is evaluated on the validation set, and the best-performing network weights on the validation set are chosen to evaluate the model on the test data. The details of the training/validation/testing splits will be given in the description of each experiment. As the performance-evaluation metric, we measure errors of the approximate solutions with respect to the reference solutions for testing parameter instances µ^k_test. We use the relative ℓ²-norm of the error:

‖Ũ(µ^k_test) − U(µ^k_test)‖_F / ‖U(µ^k_test)‖_F, (6.1)

where ‖·‖_F denotes the Frobenius norm and Ũ(µ^k_test) denotes the matrix of approximate solution snapshots.

The first benchmark problem is a parameterized one-dimensional inviscid Burgers' equation, which models simplified nonlinear fluid dynamics and demonstrates the propagation of a shock. The governing partial differential equation is

∂w(x, t; µ)/∂t + ∂f(w(x, t; µ))/∂x = 0.02 e^{µ_2 x}, f(w) = 0.5 w², (6.2)

for (x, t) ∈ [0, 100] × [0, 35]. The boundary condition w(0, t; µ) = µ_1 is imposed on the left boundary (x = 0), and the initial condition is set by w(x, 0; µ) = 1. Thus, the problem is characterized by a single variable w (i.e., n_u = 1) and two parameters (µ_1, µ_2) (i.e., n_µ = 2), which correspond to the Dirichlet boundary condition at x = 0 and the forcing term, respectively. Following [59], discretizing Eq. (6.2) with Godunov's scheme and 256 control volumes results in a system of parameterized ODEs (the FOM) with N = n_u n_1 = 256. We then solve the FOM using the backward-Euler scheme with a uniform time step ∆t = 0.07, which results in n_t = 500, for each µ^k_train ∈ D_train. Figure 2 depicts snapshots of reference solutions for the parameter instance (µ_1, µ_2) = (4.25, 0.015) at times t ∈ {7.77, 11.7, 19.5, 23.3, 27.2, 35.0}, illustrating the discontinuity (shock) moving from left to right as time proceeds. For the numerical experiments in this subsection, we use the network described in Table 1.
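As a concrete illustration for the Burgers' problem (n_u = 1 channel, n_1 = 256 cells), the following sketch follows the stated layer counts, strides, ELU activations, and linear output layer; the channel counts and kernel sizes are our assumptions and need not match Table 1:

```python
# Sketch of the convolutional encoder and transposed-convolutional decoder
# for 1D, single-channel snapshots of length 256; latent dimension p.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, p=5):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size=4, stride=2, padding=1), nn.ELU(),   # 256 -> 128
            nn.Conv1d(8, 16, kernel_size=4, stride=2, padding=1), nn.ELU(),  # 128 -> 64
            nn.Conv1d(16, 32, kernel_size=4, stride=2, padding=1), nn.ELU(), # 64 -> 32
            nn.Conv1d(32, 64, kernel_size=4, stride=2, padding=1), nn.ELU(), # 32 -> 16
        )
        self.fc = nn.Linear(64 * 16, p)  # no activation at the output layer

    def forward(self, u):                # u: (batch, 1, 256)
        return self.fc(self.conv(u).flatten(1))

class Decoder(nn.Module):
    def __init__(self, p=5):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(p, 64 * 16), nn.ELU())
        self.deconv = nn.Sequential(
            nn.ConvTranspose1d(64, 32, 4, 2, 1), nn.ELU(),  # 16 -> 32
            nn.ConvTranspose1d(32, 16, 4, 2, 1), nn.ELU(),  # 32 -> 64
            nn.ConvTranspose1d(16, 8, 4, 2, 1), nn.ELU(),   # 64 -> 128
            nn.ConvTranspose1d(8, 1, 4, 2, 1),              # 128 -> 256, linear output
        )

    def forward(self, z):                # z: (batch, p)
        return self.deconv(self.fc(z).view(-1, 64, 16))
```

Encoder maps a (batch, 1, 256) snapshot to a p-dimensional latent state, and Decoder inverts the shape transformations.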
6.3.1. Reconstruction: approximating a single trajectory with latent-dynamics learning. In this experiment, we consider a single training/testing parameter instance µ^1_train = µ^1_test = (µ^1_1, µ^1_2) = (4.25, 0.015) and test both NODE and PNODE for latent-dynamics modeling. We set the reduced dimension as p = 5 and the maximum number of epochs as 20,000. Figure 3 depicts snapshots of the reference solutions and the approximate solutions computed using the framework with NODE (Figure 3a) and with PNODE (Figure 3b).

We now consider two multi-parameter scenarios. In the first scenario (Scenario 1), we vary the first parameter (the boundary condition) and consider 4 training parameter instances, 2 validation parameter instances, and 2 test parameter instances; the parameter instances are collected as shown in Figure 4a. We train the framework with NODE and PNODE with the same set of hyperparameters and a maximum of 50,000 epochs. Again, the reduced dimension is set to p = 5. Figures 5a-5b depict snapshots of reference solutions and approximate solutions using NODE and PNODE. Both NODE and PNODE learn the boundary condition (i.e., 4.67 at x = 0) accurately. For NODE, however, this is only because the testing boundary condition lies exactly in the middle of the two validating boundary conditions (and also in the middle of the four training boundary conditions): minimizing the mean squared error results in the NODE learning a single trajectory whose boundary condition is exactly the middle of the two validating boundary conditions, 4.389 and 4.944. Moreover, as NODE learns a single trajectory that minimizes the MSE, it fails to learn the correct dynamics and produces poor approximate solutions as time proceeds. As opposed to NODE, PNODE accurately approximates solutions up to the final time. Table 2 (second row) shows the relative ℓ²-errors (Eq. (6.1)) for both NODE and PNODE.

Continuing from the previous experiment, we test the second testing parameter instance, D_test = {(5.22, 0.015)}, which is located outside D_train (i.e., next to µ^(7) in Figure 4a). The results are shown in Figures 5c-5d: the NODE again learns only a single trajectory whose boundary condition lies in the middle of the validating parameter instances, whereas PNODE accurately produces approximate solutions for the new testing parameter instance. Table 2 (third row) reports the relative errors.

Next, in the second scenario (Scenario 2), we vary both parameters µ_1 and µ_2 as shown in Figure 4b; the sets of training, validating, and testing parameter instances correspond to those depicted there. We have tested the set of testing parameter instances, and Table 3 reports the relative errors: PNODE achieves sub-1% errors in most cases, whereas NODE produces errors of around 10% in most cases. The 1.7% error of NODE for µ^1_test is achieved only because that testing parameter instance is located in the middle of the validating parameter instances (and the training parameter instances).

Study on effective latent dimension. We have further tested the framework with NODE and PNODE for varying latent dimensions p ∈ {1, 2, 3, 4, 5} with the same hyperparameters described in Table 1, a maximum of 50,000 epochs, and the same training/validating/testing split shown in Figure 4b. For all testing parameter instances, the dimension of the latent states only marginally affects the performance of the NODEs.
We believe this is because NODE learns dynamics that minimize the MSE over the four validating parameter instances in D_val regardless of the latent dimension. On the other hand, decreasing the latent dimension below three (p < 3) negatively affects the performance of the PNODEs for all testing parameter instances. Nevertheless, even with a latent dimension of one (p = 1), PNODE still outperforms NODE for all testing parameter instances; with p = 2, PNODE starts to produce almost order-of-magnitude more accurate approximate solutions than NODE does. Moreover, we observe that, for the given training data, training strategy, and hyperparameters, increasing the latent dimension beyond p = 2 only marginally improves the accuracy of the solutions, which agrees with the observations made in [41]; in some cases, a neural network with larger p (i.e., p > 5) requires more epochs to reach the level of training accuracy that is achieved by a neural network with smaller p (i.e., p ∈ {3, 4, 5}).

Study on varying training/validating data sets. We now assess the performance of the proposed framework for different settings of the training and validating data sets. This experiment illustrates the dependence of the framework on the amount of training data as well as on the settings of the training/validation/testing splits. To this end, we have trained and tested the framework with PNODE on three sets of parameter-instance samplings shown in Figure 7, where the first set (Figure 7a) is equivalent to the set in Figure 4b. While all three sets share the same testing parameter instances, Sets 2 and 3 (Figures 7b and 7c) are built incrementally upon Set 1 by adding training and validating parameter instances: compared to Set 1, Set 2 has two more training parameter instances {µ^(13), µ^(15)} and one more validating parameter instance µ^(14), as shown in Figure 7b, and Set 3 has four more training parameter instances {µ^(13), µ^(15), µ^(16), µ^(18)} and two more validating parameter instances {µ^(14), µ^(17)}, as shown in Figure 7c. We again use the same hyperparameters described in Table 1 with a maximum of 50,000 epochs on the training sets depicted in Figure 7, and Table 4 reports the accuracy of the approximate solutions computed at the testing parameter instances. Increasing the number of training/validating parameter instances has virtually no effect on the accuracy of the approximation measured at the testing parameter instance µ^(5): for the given network architecture (Table 1), increasing the amount of training/validating data does not significantly affect the performance of the framework there. On the other hand, increasing the number of training/validating parameter instances in the way shown in Figure 7 significantly improves the accuracy of the approximations for the other testing parameter instances {µ^(10), µ^(11), µ^(12)}. This set of experiments essentially illustrates that, for a given network architecture, more accurate approximations can be achieved for testing parameter instances that lie in between training/validating parameter instances (i.e., interpolation in the parameter space) than for those that lie outside of them (i.e., extrapolation in the parameter space).

Table 4: Prediction Scenario 2: the relative ℓ²-errors (Eq. (6.1)) of the approximate solutions for testing parameter instances computed using the framework with PNODE.
Each framework with PNODE is trained on the different training/validating sets depicted in Figure 7. [Surviving Table 4 entries: 1.0735 × 10⁻², 6.5199 × 10⁻³, 3.1217 × 10⁻³.]

The reaction model of a premixed H2-air flame at constant uniform pressure [5] is described by the equation

∂w(x, t; µ)/∂t = ∇ · (ν∇w(x, t; µ)) − v · ∇w(x, t; µ) + q(w(x, t; µ); µ), (6.3)

where, on the right-hand side, the first term is the diffusion term with the spatial gradient operator ∇ and the molecular diffusivity ν = 2 cm²·s⁻¹, the second term is the convective term with the constant and divergence-free velocity field v = [50 cm·s⁻¹, 0]^T, and the third term is the reactive term with the reaction source term q. The solution w corresponds to the thermo-chemical composition vector w = [w_T, w_{H2}, w_{O2}, w_{H2O}]^T, where w_T denotes the temperature and w_{H2}, w_{O2}, w_{H2O} denote the mass fractions of the chemical species H2, O2, and H2O. The reaction source term is of Arrhenius type, defined as q(w; µ) = [q_T(w; µ), q_{H2}(w; µ), q_{O2}(w; µ), q_{H2O}(w; µ)]^T, where q_T(w; µ) = Q q_{H2O}(w; µ) and q_i(w; µ) = −ν_i (W_i/ρ) (ρ w_{H2}/W_{H2})^{ν_{H2}} (ρ w_{O2}/W_{O2})^{ν_{O2}} A exp(−E/(R w_T)) for i ∈ {H2, O2, H2O}. Here, (ν_{H2}, ν_{O2}, ν_{H2O}) = (2, 1, −2) denote stoichiometric coefficients, (W_{H2}, W_{O2}, W_{H2O}) = (2.016, 31.9, 18) denote molecular weights in units of g·mol⁻¹, ρ = 1.39 × 10⁻³ g·cm⁻³ denotes the density of the mixture, R = 8.314 J·mol⁻¹·K⁻¹ denotes the universal gas constant, and Q = 9800 K denotes the heat of the reaction. The problem has two input parameters (i.e., n_µ = 2), µ = (µ_1, µ_2) = (A, E), where A and E denote the pre-exponential factor and the activation energy, respectively. On the boundaries Γ_4, Γ_5, and Γ_6, the homogeneous Neumann condition is imposed, and the initial condition is set as w_T = 300 K and (w_{H2}, w_{O2}, w_{H2O}) = (0, 0, 0) (i.e., empty of chemical species).

For collecting data, we employ a finite-difference method with 64 × 32 uniform grid points (i.e., N = n_u × n_1 × n_2 = 4 × 64 × 32), the second-order backward differentiation formula (BDF2) with a uniform time step ∆t = 10⁻⁴, and a final time of 0.06 (i.e., n_t = 600). Figure 9 depicts snapshots of the reference solutions of each species for the training parameter instance (µ_1, µ_2) = (2.3375 × 10¹², 5.6255 × 10³). The species in this problem have different numeric scales: the magnitude of w_T is about four orders of magnitude larger than those of the other species (see Figure 9). To bring the values of each species into a common range [0, 1], we apply zero-one scaling to each species separately. Moreover, because the values of A and E are several orders of magnitude larger than those of the species, we scale the input parameters as well to match the scales of the chemical species: we simply divide the first parameter and the second parameter by 10¹³ and 10⁴, respectively. After these scaling operations, we train the framework with the hyperparameters specified in Table 5, where the input data consist of 2-dimensional fields with 4 channels, and the reduced dimension is again set as p = 5.

In this experiment, we vary two parameters: the pre-exponential factor µ_1 = A and the activation energy µ_2 = E. We consider parameter instances as depicted in Figure 10. Table 6 presents the relative ℓ²-errors of approximate solutions computed using NODE and PNODE for testing parameter instances in the predictive scenario. The first three rows in Table 6 correspond to the results for the testing parameter instances at the middle three red circles in Figure 10.
As expected, both NODE and PNODE work well for these testing parameter instances: NODE is expected to perform well here because the single trajectory that minimizes the MSE over the validating parameter instances is essentially the trajectory associated with the testing parameter instance µ^(8). As we consider testing parameter instances that are distant from µ^(8), however, we observe PNODE to be (significantly) more accurate than NODE. From these observations, the NODE model can be considered overfitted to a trajectory that minimizes the MSE. This overfitting can be mitigated to a certain extent by applying, e.g., early stopping; however, this cannot fundamentally fix the problem of the NODE (i.e., fitting a single trajectory to the entire data distribution).

For the third benchmark problem, we consider the quasi-one-dimensional Euler equations for modeling inviscid compressible flow in a one-dimensional converging-diverging nozzle with a continuously varying cross-sectional area [44]. The governing equations can be written as

∂(ρA)/∂t + ∂(ρuA)/∂x = 0,
∂(ρuA)/∂t + ∂((ρu² + p)A)/∂x = p dA/dx,
∂(eA)/∂t + ∂((e + p)uA)/∂x = 0,

with p = (γ − 1)ρε and e = ρε + ρu²/2. Here, ρ denotes density, u denotes velocity, p denotes pressure, ε denotes internal energy per unit mass, e denotes total energy density, γ denotes the specific heat ratio, and A(x) denotes the converging-diverging nozzle cross-sectional area. We consider a specific heat ratio of γ = 1.3, a specific gas constant of R = 355.4 m²/s²/K, a total temperature of T_total = 300 K, and a total pressure of p_total = 10⁶ N/m². The cross-sectional area A(x) is determined by a cubic spline interpolation over a set of prescribed points. Figure 11 depicts schematics of the converging-diverging nozzle determined by A(x), parameterized by the width of the middle cross-sectional area, µ. A perfect gas, which obeys the ideal gas law (i.e., p = ρRT), is assumed.

For the initial condition, the initial flow field is computed as follows: a zero pressure-gradient flow field is constructed via the isentropic relations in terms of the Mach number M and the speed of sound c, where a subscript m indicates the flow quantity at x = 0.5 m. The shock is located at x = 0.85 m, and the velocity across the shock, u_2, is computed by using the jump relations for a stationary shock and the perfect-gas equation of state. The velocity across the shock satisfies the quadratic equation

((γ + 1)/(2(γ − 1))) u_2² − (γ/(γ − 1)) (n/m) u_2 + h = 0,

where m = ρ_2 u_2 = ρ_1 u_1, n = ρ_2 u_2² + p_2 = ρ_1 u_1² + p_1, and h = (e_2 + p_2)/ρ_2 = (e_1 + p_1)/ρ_1. The subscripts 1 and 2 indicate quantities to the left and to the right of the shock, respectively. We consider a specific Mach number of M_m = 2.0.

For spatial discretization, we employ a finite-volume scheme with 128 equally spaced control volumes and fully implicit boundary conditions, which leads to N = n_u n_1 = 3 × 128 = 384. At each intercell face, Roe's flux-difference splitting method is used to compute the flux. For time discretization, we employ the backward-Euler scheme with a uniform time step ∆t = 10⁻³ and a final time of 0.6 (i.e., n_t = 600). Figure 12 depicts snapshots of the reference solutions.

The varying parameter of this problem is the width of the middle cross-sectional area, which determines the geometry of the spatial domain and, thus, determines the initial condition as well as the dynamics. Analogously to the previous two benchmark problems, we select 4 training parameter instances, 3 validating parameter instances, and 3 testing parameter instances (Figure 13). Again, we set the reduced dimension as p = 5.
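To illustrate the initialization procedure, the post-shock state can be recovered by solving the quadratic above for u_2 and back-substituting into the flux definitions; a small sketch with an arbitrary, assumed pre-shock state (not the paper's values):

```python
# Solve the stationary-shock jump relations: given the pre-shock state
# (subscript 1), find u2 as the non-trivial root of the quadratic, then
# recover rho2 and p2 from the conserved fluxes m and n.
import numpy as np

gamma = 1.3
rho1, u1, p1 = 1.0, 700.0, 1.0e5       # assumed supersonic pre-shock state (SI)
e1 = p1 / (gamma - 1.0) + 0.5 * rho1 * u1**2

m = rho1 * u1                           # mass flux, rho*u
n = rho1 * u1**2 + p1                   # momentum flux, rho*u^2 + p
h = (e1 + p1) / rho1                    # total enthalpy per unit mass

# ((gamma+1)/(2(gamma-1))) u^2 - (gamma/(gamma-1)) (n/m) u + h = 0;
# u1 is always one of the two roots, u2 is the other.
coeffs = [(gamma + 1.0) / (2.0 * (gamma - 1.0)),
          -gamma / (gamma - 1.0) * n / m,
          h]
roots = np.roots(coeffs)
u2 = roots[np.argmax(np.abs(roots - u1))]   # pick the root that is not u1
rho2, p2 = m / u2, n - m * u2               # post-shock density and pressure
print(f"u2 = {u2:.2f} m/s, rho2 = {rho2:.3f} kg/m^3, p2 = {p2:.3e} Pa")
```

For this supersonic pre-shock state, the computed post-shock state satisfies u_2 < u_1 and p_2 > p_1, as expected across a stationary shock.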
We train the framework with either NODE or PNODE for learning latent dynamics and test the framework in the predictive scenario (i.e., for unseen testing parameter instances, as shown in Figure 13); Figure 14 depicts the solution snapshots at t ∈ {0.1, 0.2, 0.3, 0.4, 0.5, 0.6}. We observe that PNODE yields moderate improvements over NODE, i.e., about a 20% decrease in the relative ℓ²-norm of the error (Eq. (6.1)) for the testing parameter instances. The improvements are not as dramatic as those shown in the previous two benchmark problems. We believe this is because, in this problem setting, varying the input parameter results in fairly distinct initial conditions but does not induce significant variations in the dynamics; both the initial condition and the dynamics are parameterized by the same input parameter, the width of the middle cross-sectional area of the spatial domain.

Although PNODE outperforms NODE in all our benchmark problems, our general observation is that the benefits of using PNODE are most pronounced when the dynamics are parameterized and there is a single initial condition. We expect larger improvements in approximation accuracy over NODE when the dynamics vary significantly for different input parameters, for instance, in compartmental modeling (e.g., SIR and SEIR models) of infectious diseases such as the novel coronavirus (COVID-19) [1, 68], where the transmission dynamics are greatly affected by model parameters determined by, e.g., quarantine policy and social distancing. Other potential applications of PNODEs include modeling i) the response of a quantity of interest of parameterized partial differential equations and ii) the errors of reduced-order models of dynamical systems [50].

Our approach shares the same limitation as other data-driven ROM approaches: it does not guarantee preservation of important physical properties such as conservation. This is a particularly challenging issue, but there have been recent advances in deep-learning approaches for enforcing conservation laws (e.g., enforcing conservation laws in subdomains [40], hyperbolic conservation laws [57], Hamiltonian mechanics [25, 67], symplectic structures [11, 32], Lagrangian mechanics [14], and metriplectic structure [29]), and we believe that adapting/extending the ideas of these approaches could mitigate this limitation of data-driven ROM approaches.

In this study, we proposed a parameterized extension of neural ODEs and a novel framework for reduced-order modeling of complex numerical simulations of computational physics problems. Our simple extension allows neural ODE models to learn multiple complex trajectories. This extension overcomes the main drawback of neural ODEs, namely that only a single set of dynamics is learned for the entire data distribution. We have demonstrated the effectiveness of parameterized neural ODEs on several benchmark problems from computational fluid dynamics, and have shown that the proposed method outperforms neural ODEs.

This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the United States Government. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S.
Department of Energy's National Nuclear Security Administration under contract DE-NA0003525.

References

[1] A multi-risk SIR model with optimally targeted lockdown, tech. rep.
[2] Learning dynamical systems from partial observations
[3] A survey of projection-based model reduction methods for parametric dynamical systems
[4] Model Reduction and Approximation: Theory and Algorithms
[5] Projection-based model reduction for reacting flows
[6] Galerkin v. least-squares Petrov-Galerkin projection in nonlinear model reduction
[7] The GNAT method for nonlinear model reduction: effective implementation and application to computational fluid dynamics and turbulent flows
[8] Neural optimal control for representation learning
[9] Neural ordinary differential equations, in Advances in Neural Information Processing Systems
[10] Physics-informed machine learning for reduced-order modeling of nonlinear problems
[11] Symplectic recurrent neural networks
[12] NAIS-Net: Stable deep networks from non-autonomous differential equations
[13] Fast and accurate deep network learning by exponential linear units (ELUs)
[14] Lagrangian neural networks
[15] A family of embedded Runge-Kutta formulae
[16] Augmented neural ODEs
[17] Physics-informed autoencoders for Lyapunov-stable fluid flow prediction
[18] How to train your neural ODE
[19] A comprehensive deep learning-based approach to reduced order modeling of nonlinear time-dependent parametrized PDEs
[20] Latent-space dynamics for reduced deformable simulation
[21] Modeling the dynamics of PDE systems with physics-constrained deep auto-regressive networks
[22] ANODE: Unconditionally accurate memory-efficient gradients for neural ODEs
[23] Deep convolutional recurrent autoencoders for learning low-dimensional feature dynamics of fluid systems
[24] Deep Learning
[25] Hamiltonian neural networks
[26] Stable architectures for deep neural networks, Inverse Problems
[27] A deep learning framework for model reduction of dynamical systems
[28] Deep learning of thermodynamics-aware reduced-order models from data
[29] Structure-preserving neural networks
[30] Turbulence, Coherent Structures, Dynamical Systems and Symmetry
[31] Analysis of a complex of statistical variables into principal components
[32] SympNets: Intrinsic structure-preserving symplectic networks for identifying Hamiltonian systems
[33] Deep variational Bayes filters: Unsupervised learning of state space models from raw data
[34] Nonlinear model reduction by deep autoencoder of noise response data
[35] A non-intrusive multifidelity method for the reduced order modeling of nonlinear problems
[36] A fast and accurate physics-informed neural network reduced order model with shallow masked autoencoder
[37] Adam: A method for stochastic optimization
[38] Tensor decompositions and applications, SIAM Review
[39] Deep learning
[40] Deep conservation: A latent dynamics model for exact satisfaction of physical conservation laws
[41] Model reduction of dynamical systems on nonlinear manifolds using deep convolutional autoencoders
[42] Beyond finite layer neural networks: Bridging deep architectures and numerical differential equations
[43] Deep learning for universal linear embeddings of nonlinear dynamics
[44] Numerical computation of compressible and viscous flow
[45] Reduced-order modeling of advection-dominated systems with recurrent neural networks and convolutional autoencoders
[46] Time-series learning of latent-space dynamics for reduced-order model closure
[47] Deep dynamical modeling and control of unsteady fluid flows
[48] Reduced basis methods: Success, limitations and future challenges
[49] Linearly-recurrent autoencoder networks for learning dynamics
[50] Time-series machine-learning error models for approximate solutions to parameterized dynamical systems
[51] A deep learning enabler for nonintrusive reduced order modeling of fluid flows
[52] The mathematical theory of optimal processes
[53] Turbulence forecasting via neural ODE
[54] SNODE: Spectral discretization of neural ODEs for system identification
[55] Reduced Basis Methods for Partial Differential Equations: An Introduction
[56] Nonintrusive reduced order modeling framework for quasigeostrophic turbulence
[57] Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations
[58] Machine learning for nonintrusive model order reduction of the parametric inviscid transonic flow past an airfoil
[59] A trajectory piecewise-linear approach to model order reduction of nonlinear dynamical systems
[60] Depth separation for reduced deep networks in nonlinear model reduction: Distilling shock waves in nonlinear hyperbolic problems
[61] Latent ODEs for irregularly-sampled time series
[62] Deep neural networks motivated by partial differential equations
[63] An artificial neural network framework for reduced order modeling of transient flows
[64] Projection-based model reduction: Formulations for physics-based machine learning
[65] Learning Koopman invariant subspaces for dynamic mode decomposition
[66] Enabling nonlinear manifold projection reduced-order models by extending convolutional neural networks to unstructured data
[67] Hamiltonian generative networks
[68] Phase-adjusted estimation of the number of coronavirus disease 2019 cases in Wuhan, China
[69] Non-intrusive reduced order modeling of unsteady flows using artificial neural networks with application to a combustion problem
[70] Model identification of reduced order fluid dynamics systems using deep learning
[71] A proposal on machine learning via dynamical systems
[72] Latent space physics: Towards learning the temporal evolution of fluid flow
[73] Non-intrusive inference reduced order model for fluids using deep multistep neural network, Mathematics
[74] ODE2VAE: Deep generative second order ODEs with Bayesian neural networks
[75] ANODEV2: A coupled neural ODE framework
[76] Adaptive checkpoint adjoint method for gradient estimation in neural ODE