# Parameterized neural ordinary differential equations: applications to computational physics problems

## Abstract

This work proposes an extension of neural ordinary differential equations (NODEs) by introducing an additional set of ODE input parameters to NODEs. This extension allows NODEs to learn multiple dynamics specified by the input parameter instances. Our extension is inspired by the concept of parameterized ODEs, which are widely investigated in computational science and engineering contexts, where characteristics of the governing equations vary over the input parameters. We apply the proposed parameterized NODEs (PNODEs) for learning latent dynamics of complex dynamical processes that arise in computational physics, which is an essential component for enabling rapid numerical simulations for time-critical physics applications. For this, we propose an encoder–decoder-type framework, which models latent dynamics as PNODEs. We demonstrate the effectiveness of PNODEs on benchmark problems from computational physics.

### 1. Introduction

Numerical simulations of dynamical systems described by systems of ordinary differential equations (ODEs) play essential roles in various engineering and applied science applications. Such systems of ODEs often arise from spatial discretization of time-dependent partial differential equations; examples include predicting input/output responses, design and optimization [1]. These ODEs and their solutions often depend on a set of *input parameters*, and such ODEs are denoted as *parameterized ODEs*. Examples of such input parameters within the context of fluid dynamics include Reynolds number and Mach number. In many important scenarios, *high-fidelity* solutions of parameterized ODEs are required to be computed (i) for many different input parameter instances (i.e. a *many-query* scenario) or (ii) in *real time* on a new input parameter instance. A single run of a high-fidelity simulation, however, is often computationally expensive (e.g. when the ODE comprises a semi-discretized partial differential equation with a fine spatio-temporal resolution). Consequently, performing real-time or multiple runs of a high-fidelity simulation can be computationally prohibitive.

To mitigate this computational burden, many data-driven surrogate modelling approaches have been proposed to replace costly high-fidelity simulations. The common goal of these approaches is to build a latent-dynamics model with lower complexity than that of the high-fidelity model, and to use the reduced model to compute approximate solutions for any new input parameter instance. In general, model-order reduction approaches consist of two components: (i) a low-dimensional latent-dynamics model, where the computational complexity is very low, and (ii) a (non)linear mapping that constructs high-dimensional approximate states (i.e. solutions) from the low-dimensional states obtained from the latent-dynamics model. In many studies, such models are constructed via *data-driven* techniques in the following steps: (i) collect solutions of high-fidelity simulations for a set of training parameter instances, (ii) build a parameterized surrogate model, and (iii) fit the model by training with the data collected from step (i).

In the field of deep learning, similar efforts have been made for learning latent dynamics of various physical processes [2–6]. Neural ordinary differential equations (NODEs), a method of learning time-continuous dynamics in the form of a system of ODEs from data, comprise a particularly promising approach for learning latent dynamics of dynamical systems. NODEs have been studied in [3,7–12], and this body of work has demonstrated their ability to successfully learn latent dynamics and to be applied to downstream tasks [3,5].

Because NODEs learn latent dynamics in the form of ODEs, they are a natural fit as the latent-dynamics model in data-driven surrogate modelling of physical processes, and they have been applied to several computational physics problems, including turbulence modelling [13,14] and future state predictions in fluids problems [15]. As pointed out in [16,17], however, NODEs learn a single set of network weights that best fits a given training dataset. This results in a NODE model with limited expressibility and often leads to unnecessarily complex dynamics [18]. To overcome this shortcoming, we propose to extend NODEs with a set of input parameters that specify the dynamics of the NODE model, which leads to parameterized NODEs (PNODEs). With this simple extension, PNODEs can represent multiple trajectories such that the dynamics of each trajectory is characterized by its input parameter instance.

The main contributions of this paper are (i) an extension to NODEs that enables them to learn multiple trajectories with a single set of network weights; even for the same initial condition, the dynamics can be different for different input parameter instances, (ii) a framework for learning latent dynamics of parameterized ODEs arising in computational physics problems, and (iii) a demonstration of the effectiveness of the proposed framework with advection-dominated benchmark problems, which are a class of problems where classical linear latent-dynamics learning methods (e.g. principal component analysis) often fail to learn accurately [19].

### 2. Neural ordinary differential equations

NODEs are a family of deep neural network models that parameterize the time-continuous dynamics of hidden states using a system of ODEs,
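In symbols, a NODE models the hidden state as the solution of $\mathrm{d}\mathit{u}/\mathrm{d}t={\hat{\mathit{f}}}_{\mathrm{\Theta}}(\mathit{u},t)$, where the right-hand side is a neural network with weights $\mathrm{\Theta}$. The following is a minimal illustrative sketch, not the authors' implementation: it uses a small numpy multilayer perceptron as ${\hat{\mathit{f}}}_{\mathrm{\Theta}}$ and a fixed-step RK4 integrator in place of the adaptive solvers typically used, and all sizes are arbitrary.

```python
import numpy as np

def make_mlp(sizes, rng):
    """Random feed-forward network used as the ODE right-hand side f_Theta."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """Apply the network; tanh on all but the last layer."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x

def node_forward(params, u0, t_grid):
    """Integrate du/dt = f_Theta(u) over t_grid with classical RK4."""
    u, out = u0, [u0]
    for t0, t1 in zip(t_grid[:-1], t_grid[1:]):
        h = t1 - t0
        k1 = mlp(params, u)
        k2 = mlp(params, u + 0.5 * h * k1)
        k3 = mlp(params, u + 0.5 * h * k2)
        k4 = mlp(params, u + h * k3)
        u = u + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        out.append(u)
    return np.stack(out)

rng = np.random.default_rng(0)
f_theta = make_mlp([2, 16, 2], rng)  # hidden-state dimension 2 (illustrative)
traj = node_forward(f_theta, np.array([1.0, 0.0]), np.linspace(0.0, 1.0, 11))
print(traj.shape)  # (11, 2)
```

In practice the weights are trained so that the integrated trajectory matches data; the sketch only shows the forward integration that defines the model.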

### 3. Parameterized neural ordinary differential equations

To resolve this, we propose a simple but powerful extension of NODEs. We refer to this extension as PNODEs,
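The extension can be made concrete with a short sketch: the right-hand side now also receives the ODE parameters $\mathit{\mu}$, here simply concatenated with the latent state, so a single set of weights represents a family of dynamics indexed by $\mathit{\mu}$. This is an illustrative sketch with arbitrary sizes, not the authors' implementation.

```python
import numpy as np

def mlp(params, x):
    """Feed-forward network; tanh on all but the last layer."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x

def pnode_forward(params, u0, mu, t_grid):
    """Integrate du/dt = f_Theta(u; mu) with RK4. The same weights are shared
    across parameter instances; mu selects which dynamics is realized."""
    f = lambda v: mlp(params, np.concatenate([v, mu]))
    u, out = u0, [u0]
    for t0, t1 in zip(t_grid[:-1], t_grid[1:]):
        h = t1 - t0
        k1 = f(u)
        k2 = f(u + 0.5 * h * k1)
        k3 = f(u + 0.5 * h * k2)
        k4 = f(u + h * k3)
        u = u + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        out.append(u)
    return np.stack(out)

rng = np.random.default_rng(1)
p, n_mu = 2, 1  # latent dimension and number of ODE parameters (illustrative)
sizes = [p + n_mu, 16, p]
params = [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

u0, t = np.array([1.0, 0.0]), np.linspace(0.0, 1.0, 11)
traj_a = pnode_forward(params, u0, np.array([4.25]), t)
traj_b = pnode_forward(params, u0, np.array([5.22]), t)
# Same initial condition, same weights, different dynamics:
print(np.allclose(traj_a, traj_b))  # False (generically)
```

A plain NODE, by contrast, would return identical trajectories for both calls, since nothing but the initial condition distinguishes them.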

### 4. Application to data-driven surrogate modelling

We now investigate PNODEs within the context of developing data-driven surrogate models of computational physics problems. To this end, we consider systems that can be described by a parameterized system of ODEs, which we denote by the full-order model (FOM),

For many applications, numerically simulating the system (4.1) and extracting QoIs can be computationally expensive; this is the case, for example, when the state space of the dynamical system is large. For analyses that require an ensemble of trajectories of the forward model (e.g. uncertainty quantification, optimization), directly simulating the system (4.1) can become a computational bottleneck. This motivates the development of low-cost surrogate models. Here, we use PNODE to develop data-driven surrogate models to alleviate this computational burden. To this end, we assume access to a collection of QoI observations obtained via simulation of the system (4.1) at selected training parameter instances, i.e. we assume access to the dataset

In what follows, we employ PNODE in conjunction with an encoder–decoder framework to learn surrogate models. In the framework, PNODE is employed to evolve a set of *latent variables* in time, and a decoder is then employed to map these latent variables to the QoIs. We now outline this framework.

#### (a) PNODE encoder–decoder framework

For $t\in [0,T]$, $\mathit{\mu}\in \mathcal{D}$, we propose to approximate the QoIs as

An important aspect of the framework is now emphasized. By introducing a set of latent variables, we have decoupled the dimensionality of observed variables from the dimensionality of the latent dynamics. This enables fine-grained trade-offs between computational cost and model complexity, and is critical in several circumstances. For example, if the dimensionality of the observed variables is large, we can develop a cheap surrogate model by ensuring that the latent dimension is small (i.e. $p\ll {N}_{s}$). On the other hand, if the dimensionality of the observed variables is small, as is the case when the observable is scalar-valued, we can still develop a surrogate with enough complexity to capture the underlying dynamics by setting $p>{N}_{s}$.

#### (b) Initialization of the latent states

Simulating the latent dynamics (4.5) requires specification of an initial condition, $\hat{\mathit{u}}(0,\mathit{\mu})$. Various techniques exist in the literature to achieve this; here, we employ an encoder framework. We assume access to the initial state for the training parameter instances, ${\mathcal{S}}_{0}=\{\mathit{u}(0,\mathit{\mu})\,|\,\mathit{\mu}\in {\mathcal{D}}_{\text{train}}\}$, and we then employ an encoder for the initial state, i.e.

#### (c) Learning framework

With all these components defined, the forward pass of the framework can be described as

(i) encode a reduced initial state from the given initial condition: ${\hat{\mathit{u}}}^{0}({\mathit{\mu}}^{i})={h}_{\text{enc}}({\mathit{u}}^{0}({\mathit{\mu}}^{i});{\theta}_{\text{enc}})$,

(ii) solve the system of ODEs defined by PNODE (or NODE):
$$\begin{array}{rl}& {\hat{\mathit{u}}}_{i}^{1},\dots ,{\hat{\mathit{u}}}_{i}^{{n}_{t}}=\text{ODESolve}({\hat{\mathit{u}}}^{0}({\mathit{\mu}}^{i}),{\hat{\mathit{f}}}_{\mathrm{\Theta}},{\mathit{\mu}}^{i},{t}_{1},\dots ,{t}_{{n}_{t}})\phantom{\rule{1em}{0ex}}(\text{PNODE})\\ \text{or}\phantom{\rule{1em}{0ex}}& {\hat{\mathit{u}}}_{i}^{1},\dots ,{\hat{\mathit{u}}}_{i}^{{n}_{t}}=\text{ODESolve}({\hat{\mathit{u}}}^{0}({\mathit{\mu}}^{i}),{\hat{\mathit{f}}}_{\mathrm{\Theta}},{t}_{1},\dots ,{t}_{{n}_{t}})\phantom{\rule{1em}{0ex}}(\text{NODE}),\end{array}$$

(iii) decode the set of reduced states to a set of approximate observables: ${\stackrel{~}{\mathit{s}}}_{i}^{k}={h}_{\text{dec}}({\hat{\mathit{u}}}_{i}^{k};{\theta}_{\text{dec}})$, $k=1,\dots ,{n}_{t}$, and

(iv) compute sensitivities of the loss function $L({\stackrel{~}{\mathit{s}}}_{i}^{1},\dots ,{\stackrel{~}{\mathit{s}}}_{i}^{{n}_{t}},{\mathit{s}}_{i}^{1},\dots ,{\mathit{s}}_{i}^{{n}_{t}})$ and update the weights.

Here, the subscript $i$ denotes the $i$th problem-specific parameter instance in the training set, and the final loss is computed as $\sum _{i=1}^{{n}_{\text{train}}}L({\stackrel{~}{\mathit{s}}}_{i}^{1},\dots ,{\stackrel{~}{\mathit{s}}}_{i}^{{n}_{t}},{\mathit{s}}_{i}^{1},\dots ,{\mathit{s}}_{i}^{{n}_{t}})$, where ${n}_{\text{train}}$ is the total number of parameter instances in the training set. Figure 1 illustrates the computational graph of the forward pass in the proposed framework. We emphasize that the proposed framework takes only the initial states from the training data and the problem-specific ODE parameters $\mathit{\mu}$ as inputs. Even when the same initial state is given for different ODE parameters, PNODEs can still learn multiple trajectories, characterized by those parameters, which is not achievable with NODEs. Furthermore, the proposed framework is significantly simpler than the sequence-to-sequence architectures commonly used when NODEs learn latent dynamics [3,5,14,21,22], which require that (part of) a sequence be fed into an encoder network to produce a context vector, which is then fed into the NODE decoder network as an initial condition.
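The four steps of the forward pass can be sketched end to end. The sketch below is purely illustrative: it uses single dense layers for the encoder, decoder and ${\hat{\mathit{f}}}_{\mathrm{\Theta}}$, a forward-Euler $\text{ODESolve}$, arbitrary sizes, and omits the backward pass.

```python
import numpy as np

rng = np.random.default_rng(2)
N_s, p, n_mu, n_t = 8, 3, 2, 5  # observable dim, latent dim, #params, #steps (illustrative)

def dense(shape):
    return rng.standard_normal(shape) * 0.1

W_enc, W_f, W_dec = dense((N_s, p)), dense((p + n_mu, p)), dense((p, N_s))

def forward(u0, mu, t_grid):
    # (i) encode the initial condition to a latent initial state
    u_hat = np.tanh(u0 @ W_enc)
    s_tilde = []
    # (ii) solve the latent ODE defined by the PNODE (forward Euler here)
    for t0, t1 in zip(t_grid[:-1], t_grid[1:]):
        rhs = np.tanh(np.concatenate([u_hat, mu]) @ W_f)
        u_hat = u_hat + (t1 - t0) * rhs
        # (iii) decode each latent state to an approximate observable
        s_tilde.append(u_hat @ W_dec)
    return np.stack(s_tilde)

def mse_loss(s_tilde, s_ref):
    # (iv) loss over all time instances; gradients/weight updates omitted here
    return float(np.mean((s_tilde - s_ref) ** 2))

u0, mu = np.ones(N_s), np.array([4.25, 0.015])
s_tilde = forward(u0, mu, np.linspace(0.0, 0.5, n_t + 1))
print(s_tilde.shape)  # (5, 8)
```

In training, step (iv) would backpropagate through all three components (or use the adjoint method) to update $(\mathrm{\Theta},{\theta}_{\text{enc}},{\theta}_{\text{dec}})$ jointly.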

### 5. Numerical experiments

In the following, we apply the proposed framework for learning latent dynamics of parameterized dynamics from computational physics problems. The proposed framework is trained on solutions of parameterized ODEs for a certain set of training input parameter instances, and then is tested to predict either full states or QoIs of parameterized ODEs on an unseen test input parameter instance. We demonstrate the effectiveness of the proposed framework on four benchmark problems: the one-dimensional Burgers equation with forcing, a two-dimensional chemically reacting flow, the quasi-one-dimensional Euler equations and the two-dimensional shallow water equations.

#### (a) Data collection

For training, we collect snapshots of the QoIs by solving the FOM for pre-specified training parameter instances $\mathit{\mu}\in {\mathcal{D}}_{\text{train}}\equiv {\{{\mathit{\mu}}_{\text{train}}^{k}\}}_{k=1}^{{n}_{\text{train}}}\subset \mathcal{D}$. This collection results in a tensor

Assuming that the FOM arises from a spatially discretized partial differential equation, the total degrees of freedom $N$ can be defined as $N={n}_{u}\times {n}_{1}\times \cdots \times {n}_{{n}_{\text{dim}}}$, where ${n}_{u}$ is the number of different types of solution variables (e.g. chemical species), ${n}_{\text{dim}}$ denotes the number of spatial dimensions of the partial differential equation and ${n}_{k}$, $k=1,\dots ,{n}_{\text{dim}}$ denotes the number of degrees of freedom in each spatial dimension. Note that this spatially distributed data representation is analogous to multi-channel images (i.e. ${n}_{u}$ corresponds to the number of channels); as such we use (transposed) convolutional layers [24,25] in our encoder and decoder for state prediction. For QoI prediction, we use fully connected layers for the decoder.
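For concreteness, the snapshot data can be arranged as a tensor whose per-time-instance slices are ${n}_{u}$-channel "images". The sizes below mirror problem 2 later in the paper, except that ${n}_{t}$ is shortened for this sketch.

```python
import numpy as np

# Illustrative sizes: n_train parameter instances, n_t time instances,
# n_u solution variables ("channels") and an n_1 x n_2 spatial grid.
# n_t is shortened from the 600 used in problem 2.
n_train, n_t, n_u, n_1, n_2 = 4, 10, 4, 64, 32

snapshots = np.zeros((n_train, n_t, n_u, n_1, n_2))  # collected FOM solutions
N = n_u * n_1 * n_2                                  # total degrees of freedom
print(N)  # 8192

# One time instance of one trajectory is an n_u-channel "image", so a
# convolutional encoder can consume it directly:
frame = snapshots[0, 0]
print(frame.shape)  # (4, 64, 32)
```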

#### (b) Network architectures, training and testing

In the experiments, we employ NODE and PNODE for learning latent dynamics and model ${\hat{\mathit{f}}}_{\mathrm{\Theta}}$ as a feedforward neural network with fully connected layers. For our autoencoder, we employ two different configurations depending on the QoI. For the case where the QoI is the full state (i.e. $\mathit{g}=\text{identity}$), we employ convolutional encoders and transposed-convolutional decoders. The encoder consists of four convolutional layers followed by one fully connected layer, and the decoder consists of one fully connected layer followed by four transposed-convolutional layers. To decrease/increase the spatial dimension, we employ strides larger than 1, but we do not use pooling layers. In the case where the QoI is scalar valued, we employ the same encoder, but the decoder is a fully connected network. We employ exponential linear units [26] for all activation functions, with the exception of the output layer of the decoder (i.e. no activation at the output layer). Lastly, for $\text{ODESolve}$, we use the Dormand–Prince method [27], which is provided in the software package of [3]. Our implementation reuses many parts of the software package used in [5], which is written in

For training, we set the loss function as the mean squared error (MSE) and optimize the network weights $(\mathrm{\Theta},{\theta}_{\text{enc}},{\theta}_{\text{dec}})$ using Adamax, a variant of Adam [29], with an initial learning rate of ${10}^{-2}$. At each epoch, the loss function is evaluated on the validation set, and the network weights that perform best on the validation set are chosen for evaluating the model on the test data. Details of the training/validation/testing splits are given in the description of each experiment.
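The training loop just described, Adamax updates plus retention of the best-performing weights on the validation set, can be sketched on a toy objective. The quadratic losses below are illustrative stand-ins for the trajectory MSE; only the Adamax update rule and the validation-based selection mirror the text.

```python
import numpy as np

def grad_train(theta):  # gradient of a toy training loss ||theta - 1||^2
    return 2.0 * (theta - 1.0)

def val_loss(theta):    # toy validation loss with a slightly shifted optimum
    return float(np.sum((theta - 1.1) ** 2))

theta = np.zeros(3)
m, u_inf = np.zeros(3), np.zeros(3)
lr, b1, b2 = 1e-2, 0.9, 0.999       # initial learning rate 1e-2, as in the text
best_val, best_theta = np.inf, theta.copy()

for t in range(1, 501):
    g = grad_train(theta)
    m = b1 * m + (1 - b1) * g                   # first-moment estimate
    u_inf = np.maximum(b2 * u_inf, np.abs(g))   # infinity-norm second moment (Adamax)
    theta -= (lr / (1 - b1 ** t)) * m / (u_inf + 1e-8)
    v = val_loss(theta)
    if v < best_val:                            # keep the best weights on validation
        best_val, best_theta = v, theta.copy()

print(np.round(best_theta, 2))
```

The `best_theta` snapshot plays the role of the chosen network weights: the final iterate is never used directly, only the validation-best one.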

For the performance evaluation metrics, we measure errors of approximated solutions with respect to the reference solutions for testing parameter instances, ${\mathit{\mu}}_{\text{test}}^{k}$. We use the relative ${\ell}^{2}$-norm of the error,
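This metric, the relative ${\ell}^{2}$-norm of the error between the approximate and reference trajectories, can be computed as follows; the trajectory shapes in the sketch are illustrative.

```python
import numpy as np

def relative_l2_error(s_tilde, s_ref):
    """Relative l2-norm of the error between approximate and reference
    solutions, stacked over all time instances (a sketch of metric (5.1))."""
    return np.linalg.norm(s_tilde - s_ref) / np.linalg.norm(s_ref)

s_ref = np.ones((600, 384))          # e.g. an n_t x N reference trajectory
s_tilde = s_ref * 1.01               # uniform 1% pointwise overestimate
print(round(100 * relative_l2_error(s_tilde, s_ref), 2))  # 1.0 (% error)
```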

#### (c) Problem 1: one-dimensional inviscid Burgers’ equation

The first benchmark problem is a parameterized one-dimensional inviscid Burgers’ equation, which models simplified nonlinear fluid dynamics and demonstrates propagations of shocks. The governing system of partial differential equations is

For the numerical experiments in this subsection, we use the network described in table 1.

| encoder | | decoder | |
|---|---|---|---|
| conv-layer (4 layers) | | FC-layer (1 layer) | |
| $\kappa$ | [16, 8, 4, 4] | ${d}_{\text{in}}=p$, ${d}_{\text{out}}=128$ | |
| ${n}_{\kappa}$ | [8, 16, 32, 64] | trans-conv-layer (4 layers) | |
| $s$ | [2, 4, 4, 4] | $\kappa$ | [4, 4, 8, 16] |
| FC-layer (1 layer) | | ${n}_{\kappa}$ | [32, 16, 8, 1] |
| ${d}_{\text{in}}=128$, ${d}_{\text{out}}=p$ | | $s$ | [4, 4, 4, 2] |

##### (i) Reconstruction: approximating a single trajectory with latent-dynamics learning

In this experiment, we consider a single training/testing parameter instance ${\mathit{\mu}}_{\text{train}}^{1}={\mathit{\mu}}_{\text{test}}^{1}=({\mathit{\mu}}_{1}^{1},{\mathit{\mu}}_{2}^{1})=(4.25,0.015)$ and test both NODE and PNODE for latent-dynamics modelling. We set the reduced dimension^{1} as $p=5$ and the maximum number of epochs as 20 000. The relative errors (equation (5.1)) for NODE and PNODE are $2.6648\times {10}^{-3}$ and $2.6788\times {10}^{-3}$, respectively; the difference between the two errors is negligible.

##### (ii) Prediction: approximating an unseen trajectory for unseen parameter instances

We now consider two multi-parameter scenarios. In the first scenario (scenario 1), we vary the first parameter (boundary condition) and consider four training parameter instances, two validation parameter instances and two test parameter instances. The parameter instances are collected as shown in figure 3*a*: the training parameter instances correspond to ${\mathcal{D}}_{\text{train}}=\{(4.25+0.139k,0.015)\},k=0,2,4,6$, the validating parameter instances correspond to ${\mathcal{D}}_{\text{val}}=\{(4.25+0.139k,0.015)\},k=1,5$ and the testing parameter instances correspond to ${\mathcal{D}}_{\text{test}}=\{(4.67,0.015),(5.22,0.015)\}$. Note that the initial condition is identical for all parameter instances, i.e. ${\mathit{u}}^{0}(\mathit{\mu})=1$.
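For concreteness, the scenario 1 split can be generated directly from the values quoted above. Note that the text later quotes the second validating boundary condition as $4.944$, while the stated increment of $0.139$ gives $4.945$, which suggests the quoted increment is itself rounded.

```python
# Scenario 1 parameter split: only the boundary-condition parameter mu_1 varies,
# in increments of 0.139 from 4.25; mu_2 is fixed at 0.015 (values from the text).
mu2 = 0.015
D_train = [(round(4.25 + 0.139 * k, 3), mu2) for k in (0, 2, 4, 6)]
D_val = [(round(4.25 + 0.139 * k, 3), mu2) for k in (1, 5)]
D_test = [(4.67, 0.015), (5.22, 0.015)]

print(D_train)  # [(4.25, 0.015), (4.528, 0.015), (4.806, 0.015), (5.084, 0.015)]
print(D_val)    # [(4.389, 0.015), (4.945, 0.015)]
```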

We train the framework with NODE and PNODE with the same set of hyperparameters and a maximum of 50 000 epochs. Again, the reduced dimension is set to $p=5$. Figure 4*a*,*b* depicts snapshots of reference solutions and approximate solutions computed using NODE and PNODE. Both NODE and PNODE learn the boundary condition (i.e. 4.67 at $x=0$) accurately. For NODE, however, this is only because the testing boundary condition lies exactly in the middle of the two validating boundary conditions, $4.389$ and $4.944$ (and in the middle of the four training boundary conditions): minimizing the MSE leads NODE to learn a single trajectory whose boundary condition is this midpoint. Moreover, because NODE learns a single trajectory that minimizes the MSE, it fails to learn the correct dynamics and produces poor approximate solutions as time proceeds. In contrast, PNODE accurately approximates solutions up to the final time. Table 2 (second column) shows the relative ${\ell}^{2}$-errors (equation (5.1)) for both NODE and PNODE; PNODE yields significantly better predictions than NODE.

| | ${\mathit{\mu}}_{\text{test}}^{1}={\mathit{\mu}}^{(4)}$ | ${\mathit{\mu}}_{\text{test}}^{2}={\mathit{\mu}}^{(8)}$ |
|---|---|---|
| NODE | 43.057 | 157.40 |
| PNODE | 3.6547 | 5.6900 |

Continuing from the previous experiment, we test the second testing parameter instance, ${\mathit{\mu}}_{\text{test}}^{2}=(5.22,0.015)$, which is located outside ${\mathcal{D}}_{\text{train}}$ (i.e. next to ${\mathit{\mu}}^{(7)}$ in figure 3*a*). The results are shown in figure 4*c*,*d*: NODE again learns only a single trajectory whose boundary condition lies in the middle of the validating parameter instances, whereas PNODE accurately produces approximate solutions for the new testing parameter instance. Table 2 (third column) reports the relative errors, where PNODE is again seen to yield significant improvements over NODE.

Next, in the second scenario (scenario 2), we vary both parameters ${\mu}_{1}$ and ${\mu}_{2}$ as shown in figure 3*b*: the training, validating and testing parameter instances correspond to

We have tested the framework on the set of testing parameter instances; table 3 reports the relative errors. PNODE achieves less than 1% error in most cases, whereas NODE incurs errors of around 10% in most cases. The 1.7% error of NODE for ${\mathit{\mu}}_{\text{test}}^{1}$ is achieved only because this testing parameter instance is located in the middle of the validating (and training) parameter instances.

| | ${\mathit{\mu}}_{\text{test}}^{1}={\mathit{\mu}}^{(5)}$ | ${\mathit{\mu}}_{\text{test}}^{2}={\mathit{\mu}}^{(10)}$ | ${\mathit{\mu}}_{\text{test}}^{3}={\mathit{\mu}}^{(11)}$ | ${\mathit{\mu}}_{\text{test}}^{4}={\mathit{\mu}}^{(12)}$ |
|---|---|---|---|---|
| NODE | 17.422 | 107.13 | 89.229 | 123.77 |
| PNODE | 3.2672 | 7.7303 | 8.5650 | 10.735 |

*Study on effective latent dimension.* We have further tested the framework with NODE and PNODE for latent dimensions $p=\{1,2,3,4,5\}$, with the same hyperparameters as described in table 1, a maximum of 50 000 epochs and the same training/validating/testing split as shown in figure 3*b*. Figure 5 reports the results. For all testing parameter instances, the dimension of the latent states only marginally affects the performance of NODE. We believe this is because NODE learns dynamics that minimize the MSE over the validating parameter instances in ${\mathcal{D}}_{\text{val}}$ regardless of the latent dimension. On the other hand, decreasing the latent dimension below 3 ($p<3$) negatively affects the performance of PNODE for all testing parameter instances. Nevertheless, even with latent dimension $p=1$, PNODE still outperforms NODE on all testing parameter instances, and with $p=2$ PNODE begins to produce approximate solutions that are almost an order of magnitude more accurate than those of NODE. Moreover, we observe that, for the given training data, training strategy and hyperparameters, increasing the latent dimension beyond $p=2$ only marginally improves the accuracy of the solutions, which agrees with the observations made in [31]; in some cases, a neural network with larger $p$ (i.e. $p>5$) requires more epochs to reach the same training error achieved by a network with smaller $p$ (i.e. $p=\{3,4,5\}$).

*Study on varying training/validating datasets.* We now assess the performance of the proposed framework for different settings of training and validating datasets. This experiment illustrates the dependence of the framework on the amount of training data as well as settings of training/validation/testing splits. To this end, we have trained and tested the framework with PNODE on three sets of input parameter instance samplings, as shown in figure 6, where the first set in figure 6*a* is equivalent to the set in figure 3*b*. While all three sets share the testing parameter instances, sets 2 and 3 (figure 6*b*,*c*) are built incrementally upon set 1 by having additional training and validating parameter instances. Compared with set 1, set 2 has two more training parameter instances $\{{\mathit{\mu}}^{(13)},{\mathit{\mu}}^{(15)}\}$ and one more validating parameter instance ${\mathit{\mu}}^{(14)}$, as shown in figure 6*b*, and set 3 has four more training parameter instances $\{{\mathit{\mu}}^{(13)},{\mathit{\mu}}^{(15)},{\mathit{\mu}}^{(16)},{\mathit{\mu}}^{(18)}\}$ and two more validating parameter instances $\{{\mathit{\mu}}^{(14)},{\mathit{\mu}}^{(17)}\}$, as shown in figure 6*c*.

We again train the framework with PNODE on the training sets depicted in figure 6, employing the same hyperparameters as described in table 1 and a maximum of 50 000 epochs. Table 4 reports the accuracy of approximate solutions computed at the testing parameter instances. For the given network architecture (table 1), increasing the number of training/validating parameter instances has virtually no effect on the accuracy measured at the testing parameter instance ${\mathit{\mu}}^{(5)}$. On the other hand, it significantly improves the accuracy of the approximations for the other testing parameter instances $\{{\mathit{\mu}}^{(10)},{\mathit{\mu}}^{(11)},{\mathit{\mu}}^{(12)}\}$. This set of experiments essentially illustrates that more accurate approximations can be achieved for testing parameter instances that lie in between training/validating parameter instances (i.e. within the dotted boxes in figure 6) than for those that lie outside them.

| | ${\mathit{\mu}}_{\text{test}}^{1}={\mathit{\mu}}^{(5)}$ | ${\mathit{\mu}}_{\text{test}}^{2}={\mathit{\mu}}^{(10)}$ | ${\mathit{\mu}}_{\text{test}}^{3}={\mathit{\mu}}^{(11)}$ | ${\mathit{\mu}}_{\text{test}}^{4}={\mathit{\mu}}^{(12)}$ |
|---|---|---|---|---|
| PNODE—set 1 | 3.2672 | 7.7303 | 8.5650 | 10.735 |
| PNODE—set 2 | 3.3368 | 6.6086 | 3.7136 | 6.5199 |
| PNODE—set 3 | 3.3348 | 3.4409 | 3.1278 | 3.1217 |

#### (d) Problem 2: two-dimensional chemically reacting flow

The next example considers the reaction of a premixed ${H}_{2}$–air flame at constant uniform pressure [32], which is described by the set of partial differential equations

Figure 7 depicts the geometry of the spatial domain and the boundary conditions are set as:

— ${\Gamma}_{2}$: the inflow boundary with Dirichlet boundary conditions ${w}_{\text{T}}=950\hspace{0.17em}\text{K}$ and $({w}_{{\text{H}}_{2}},{w}_{{\text{O}}_{2}},{w}_{{\text{H}}_{2}\text{O}})=(0.0282,0.2259,0)$,

— ${\Gamma}_{1}$ and ${\Gamma}_{3}$: Dirichlet boundary conditions ${w}_{\text{T}}=300\hspace{0.17em}\text{K}$ and $({w}_{{\text{H}}_{2}},{w}_{{\text{O}}_{2}},{w}_{{\text{H}}_{2}\text{O}})=(0,0,0)$,

— ${\Gamma}_{4}$, ${\Gamma}_{5}$ and ${\Gamma}_{6}$: homogeneous Neumann conditions,

and the initial condition is set as ${w}_{\text{T}}=300\hspace{0.17em}\text{K}$ and $({w}_{{\text{H}}_{2}},{w}_{{\text{O}}_{2}},{w}_{{\text{H}}_{2}\text{O}})=(0,0,0)$ (i.e. empty of chemical species). To collect data, we employ a finite-difference method with $64\times 32$ uniform grid points (i.e. $N={n}_{u}\times {n}_{1}\times {n}_{2}=4\times 64\times 32$), the second-order backward Euler method with a uniform time step $\mathrm{\Delta}t={10}^{-4}$ and the final time $T=0.06$ (i.e. ${n}_{t}=600$). Figure 8 depicts snapshots of the reference solutions of each species for the training parameter instance $({\mu}_{1},{\mu}_{2})=(2.3375\times {10}^{12},5.6255\times {10}^{3})$.

##### (i) Data preprocessing and training

Each species in this problem has a different numeric scale: the magnitude of ${w}_{\text{T}}$ is about four orders of magnitude larger than those of the other species (figure 8). To bring the values of each species into a common range $[0,1]$, we apply 0–1 scaling to each species separately. Moreover, because the values of $A$ and $E$ are several orders of magnitude larger than those of the species, we scale the input parameters as well to match the scales of the chemical species: we simply divide the first and second parameters by ${10}^{13}$ and ${10}^{4}$, respectively.^{2}
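The preprocessing described above amounts to per-species min-max scaling plus a fixed rescaling of the input parameters. A sketch with synthetic data of roughly the right magnitudes (the data values themselves are fabricated for illustration):

```python
import numpy as np

def scale_01_per_species(snapshots):
    """0-1 (min-max) scaling applied to each solution variable separately.
    snapshots: array of shape (n_samples, n_u, n_1, n_2)."""
    out = np.empty_like(snapshots, dtype=float)
    for c in range(snapshots.shape[1]):
        lo, hi = snapshots[:, c].min(), snapshots[:, c].max()
        out[:, c] = (snapshots[:, c] - lo) / (hi - lo)
    return out

# Synthetic data: a "temperature" channel ~1e2-1e3 K and species channels ~1e-1.
rng = np.random.default_rng(3)
x = np.concatenate([rng.uniform(300, 950, (10, 1, 8, 8)),
                    rng.uniform(0.0, 0.23, (10, 3, 8, 8))], axis=1)
y = scale_01_per_species(x)
print(float(y.min()), float(y.max()))  # 0.0 1.0

# The input parameters are rescaled by the fixed factors from the text:
A, E = 2.3375e12, 5.6255e3
mu_scaled = (A / 1e13, E / 1e4)
```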

^{2}We have not investigated alternative strategies for scaling the parameter instances, as this is not the focus of this study.

##### (ii) Prediction: approximating an unseen trajectory for unseen parameter instances

In this experiment, we vary two parameters: the pre-exponential factor ${\mu}_{1}=A$ and the activation energy ${\mu}_{2}=E$. We consider parameter instances as depicted in figure 9: the sets of the training, validating and testing parameter instances correspond to

Given the training and validating parameter instances, we train the framework with the hyper-parameters specified in table 5, where the input data consist of two-dimensional data with four channels and the reduced dimension is again set as $p=5$.

| encoder | | decoder | |
|---|---|---|---|
| conv-layer (4 layers) | | FC-layer (1 layer) | |
| $\kappa$ | [16, 8, 4, 4] | ${d}_{\mathrm{in}}=p$, ${d}_{\mathrm{out}}=512$ | |
| ${n}_{\kappa}$ | [8, 16, 32, 64] | trans-conv-layer (4 layers) | |
| $s$ | [2, 2, 2, 2] | $\kappa$ | [4, 4, 8, 16] |
| FC-layer (1 layer) | | ${n}_{\kappa}$ | [32, 16, 8, 4] |
| ${d}_{\mathrm{in}}=512$, ${d}_{\mathrm{out}}=p$ | | $s$ | [2, 2, 2, 2] |

Table 6 presents the relative ${\ell}^{2}$-errors of approximate solutions computed using NODE and PNODE for testing parameter instances in the predictive scenario. The first three entries in table 6 correspond to the testing parameter instances at the middle three red circles in figure 9. As expected, both NODE and PNODE perform well for these testing parameter instances: NODE is expected to perform well here because the single trajectory that minimizes the MSE over the validating parameter instances would be the trajectory associated with the testing parameter instance ${\mathit{\mu}}^{(8)}$. For testing parameter instances that are distant from ${\mathit{\mu}}^{(8)}$, however, we observe PNODE to be significantly more accurate than NODE. From these observations, the NODE model can be considered overfitted to a single trajectory that minimizes the MSE. This overfitting can be mitigated to a certain extent by applying, for example, early stopping; however, this cannot fundamentally fix the limitation of NODE (i.e. fitting a single trajectory to all input data distributions).

| | ${\mathit{\mu}}_{\text{test}}^{1}={\mathit{\mu}}^{(7)}$ | ${\mathit{\mu}}_{\text{test}}^{2}={\mathit{\mu}}^{(8)}$ | ${\mathit{\mu}}_{\text{test}}^{3}={\mathit{\mu}}^{(9)}$ | ${\mathit{\mu}}_{\text{test}}^{4}={\mathit{\mu}}^{(4)}$ |
|---|---|---|---|---|
| NODE | 9.2823 | 3.3450 | 4.1516 | 40.835 |
| PNODE | 4.2993 | 4.6429 | 5.0617 | 5.6011 |

| | ${\mathit{\mu}}_{\text{test}}^{5}={\mathit{\mu}}^{(12)}$ | ${\mathit{\mu}}_{\text{test}}^{6}={\mathit{\mu}}^{(16)}$ | ${\mathit{\mu}}_{\text{test}}^{7}={\mathit{\mu}}^{(17)}$ | ${\mathit{\mu}}_{\text{test}}^{8}={\mathit{\mu}}^{(18)}$ |
|---|---|---|---|---|
| NODE | 34.767 | 59.410 | 54.553 | 74.881 |
| PNODE | 4.4133 | 12.935 | 11.785 | 24.660 |

##### (iii) Prediction: approximating a trajectory of a quantity of interest for unseen parameter instances

In this experiment, we design the decoder to produce a scalar-valued QoI at each time instance, instead of reconstructing the entire state, i.e. ${h}_{\text{dec}}:{\mathbb{R}}^{p}\to {\mathbb{R}}^{1}$. We consider two QoIs: the temperature and the mass fraction of ${\text{H}}_{2}\text{O}$ in the middle of the spatial domain, ${\mathit{x}}_{\text{QoI}}=(9\hspace{0.17em}\text{mm},4.5\hspace{0.17em}\text{mm})$. We again employ the same setting of the parameter instances as depicted in figure 9. For the network architecture, we consider the same encoder architecture as shown in table 5, which takes an initial state as an input and produces an initial latent state; as a decoder, we consider a dense neural network with four fully connected layers with $\{64,32,16,1\}$ neurons. The reduced dimension is set as $p=5$. For each QoI (temperature and the mass fraction of ${\text{H}}_{2}\text{O}$), we train a separate framework whose output attempts to match that QoI.

Figure 10 reports the reference trajectories (solid red lines) of the two QoIs, temperature and the mass fraction of ${\text{H}}_{2}\text{O}$, evaluated at the middle of the spatial domain over time for four unseen parameter instances $\{{\mathit{\mu}}^{(4)},{\mathit{\mu}}^{(8)},{\mathit{\mu}}^{(12)},{\mathit{\mu}}^{(18)}\}$ (see figure 9 for a graphical illustration of this choice). The approximate QoIs produced by PNODE and NODE are depicted by dashed blue lines and green dash-dotted lines, respectively. As observed in the previous experiments, NODE learns only a single trajectory, which minimizes the MSE loss given the training and validating data; the learned trajectory nearly coincides with the trajectory associated with ${\mathit{\mu}}^{(8)}$ (i.e. all four green lines are identical), because ${\mathit{\mu}}^{(8)}$ is located in the middle of the training and validating parameter instances. On the other hand, PNODE produces accurate approximations of both QoIs for multiple test parameter instances. Table 7 reports the relative ${\ell}^{2}$-errors of the approximate QoIs computed using NODE and PNODE and supports the same observations as figure 10.

| | | ${\mathit{\mu}}^{(4)}$ | ${\mathit{\mu}}^{(8)}$ | ${\mathit{\mu}}^{(12)}$ | ${\mathit{\mu}}^{(18)}$ |
|---|---|---|---|---|---|
| temperature | NODE | 38.930 | 2.4626 | 31.943 | 68.314 |
| | PNODE | 2.8002 | 1.9484 | 2.4375 | 22.152 |
| ${\text{H}}_{2}\text{O}$ | NODE | 74.765 | 4.4154 | 57.159 | 119.06 |
| | PNODE | 5.4625 | 3.3837 | 4.9756 | 41.195 |

#### (e) Problem 3: quasi-one-dimensional Euler equation

For the third benchmark problem, we consider the quasi-one-dimensional Euler equations for modelling inviscid compressible flow in a one-dimensional converging–diverging nozzle with a continuously varying cross-sectional area [33]. The system of the governing equations is

For spatial discretization, we employ a finite-volume scheme with 128 equally spaced control volumes and fully implicit boundary conditions, which leads to $N={n}_{u}{n}_{1}=3\times 128=384$. At each intercell face, the Roe flux difference vector-splitting method [35] is used to compute the flux. For time discretization, we employ the backward Euler scheme with a uniform time step $\mathrm{\Delta}t={10}^{-3}$ and the final time 0.6 (i.e. ${n}_{t}=600$). Figure 12 depicts the snapshots of reference solutions of Mach number $M(x)$ for the middle cross-sectional area $\mu =0.15$ at $t=\{0.1,0.2,0.3,0.4,0.5,0.6\}$.
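The backward Euler scheme used here is implicit: each time step requires solving for the new state. A minimal sketch on the scalar test equation $\mathrm{d}u/\mathrm{d}t=-\lambda u$, where the implicit update can be solved in closed form (the test equation and the value of $\lambda$ are illustrative assumptions; the paper's nonlinear system would need a Newton-type solve per step):

```python
import math

# backward Euler for du/dt = -lam*u: u_{n+1} = u_n + dt*(-lam*u_{n+1})
# => u_{n+1} = u_n / (1 + lam*dt); the implicit step is solved exactly here,
# whereas a general nonlinear f would require an iterative solve per step.
lam, dt, n_t = 2.0, 1e-3, 600   # dt and n_t match the setting in the text
u = 1.0
for _ in range(n_t):
    u = u / (1.0 + lam * dt)

exact = math.exp(-lam * dt * n_t)   # exact solution at final time t = 0.6
error = abs(u - exact)              # first-order accurate in dt
```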

The varying parameter of the problem is the width of the middle cross-sectional area, which determines the geometry of the spatial domain and, thus, determines the initial condition as well as the dynamics. Analogously to the previous two problems, we select four training parameter instances, three validating parameter instances and three testing parameter instances (figure 13),

With the neural network architecture specified in table 8, we train the framework either with NODE or with PNODE for learning latent dynamics and test the framework in the predictive scenario (i.e. for unseen testing parameter instances as shown in figure 13). Figure 14 depicts the solution snapshots at $t=\{0.1,0.23,0.46,0.6\}$. We observe that PNODE yields moderate improvements over NODE, i.e. about a 20% decrease in the relative ${\ell}^{2}$-norm of the error (5.1) for all four testing parameter instances (table 9). The improvements are not as dramatic as those in the previous two benchmark problems. We believe this is because, in this problem setting, varying the input parameter yields fairly distinct initial conditions but does not induce significant variations in the dynamics; both the initial condition and the dynamics are parameterized by the same input parameter, i.e. the width of the middle cross-sectional area of the spatial domain.
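The error metric reported in the tables can be sketched as follows, assuming equation (5.1) takes the standard form of the discrepancy norm divided by the reference norm, reported as a percentage over the stacked time instances:

```python
import numpy as np

def relative_l2_error(u_approx, u_ref):
    """Relative l2-error in percent: 100 * ||u_approx - u_ref|| / ||u_ref||.

    u_approx, u_ref: arrays stacking the solution at all time instances.
    """
    u_approx = np.asarray(u_approx, dtype=float)
    u_ref = np.asarray(u_ref, dtype=float)
    return 100.0 * np.linalg.norm(u_approx - u_ref) / np.linalg.norm(u_ref)

err = relative_l2_error([1.0, 2.0, 2.0], [1.0, 2.0, 3.0])
```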

| encoder | | decoder | |
|---|---|---|---|
| conv-layer (5 layers) | | FC-layer (1 layer) | |
| $\kappa$ | [16, 8, 4, 4, 4] | ${d}_{\mathrm{in}}=p$, ${d}_{\mathrm{out}}=512$ | |
| ${n}_{\kappa}$ | [16, 32, 64, 64, 128] | trans-conv-layer (5 layers) | |
| $s$ | [2, 2, 2, 2, 2] | $\kappa$ | [4, 4, 4, 8, 16] |
| FC-layer (1 layer) | | ${n}_{\kappa}$ | [64, 64, 32, 16, 3] |
| ${d}_{\mathrm{in}}=512$, ${d}_{\mathrm{out}}=p$ | | $s$ | [2, 2, 2, 2, 2] |

| | ${\mathit{\mu}}^{(3)}$ | ${\mathit{\mu}}^{(6)}$ | ${\mathit{\mu}}^{(9)}$ | ${\mathit{\mu}}^{(11)}$ |
|---|---|---|---|---|
| NODE | 5.1190 | 5.2753 | 5.9479 | 7.6278 |
| PNODE | 3.9600 | 4.1280 | 4.6102 | 6.1570 |

#### (f) Problem 4: parameterized shallow water equations

Lastly, we consider the parameterized shallow water equations with Coriolis forcing, given as

We again design the decoder to produce a scalar-valued QoI at each time instance, instead of reconstructing the entire state, i.e. ${h}_{\text{dec}}:{\mathbb{R}}^{p}\to {\mathbb{R}}^{1}$. The QoI is the height of the water surface in the middle of the spatial domain. Table 10 describes details of the network architecture: the encoder takes an initial state as an input and produces an initial latent state, and the decoder maps each computed latent state to the QoI at pre-specified time instances. The reduced dimension is set as $p=20$. Figure 15 depicts the setting of the parameter instances:

| encoder | | decoder | |
|---|---|---|---|
| conv-layer (5 layers) | | FC-layer (5 layers) | |
| $\kappa$ | [16, 8, 4, 4, 4] | FC 1 | ${d}_{\mathrm{in}}=p$, ${d}_{\mathrm{out}}=128$ |
| ${n}_{\kappa}$ | [8, 16, 32, 64, 64] | FC 2 | ${d}_{\mathrm{in}}=128$, ${d}_{\mathrm{out}}=64$ |
| $s$ | [2, 2, 2, 2, 2] | FC 3 | ${d}_{\mathrm{in}}=64$, ${d}_{\mathrm{out}}=32$ |
| FC-layer (1 layer) | | FC 4 | ${d}_{\mathrm{in}}=32$, ${d}_{\mathrm{out}}=16$ |
| ${d}_{\mathrm{in}}=1024$, ${d}_{\mathrm{out}}=p$ | | FC 5 | ${d}_{\mathrm{in}}=16$, ${d}_{\mathrm{out}}=1$ |

Figure 16 depicts the height profiles over time and table 11 reports the relative ${\ell}^{2}$-errors for eight test parameter instances. Both again confirm that PNODE outperforms NODE, producing accurate parameter-dependent height profiles (more than an order of magnitude more accurate as measured by relative ${\ell}^{2}$-errors), whereas NODE again learns a single dynamics and fails to capture the variation of the height profiles across input parameters.

| | ${\mathit{\mu}}_{\text{test}}^{1}={\mathit{\mu}}^{(8)}$ | ${\mathit{\mu}}_{\text{test}}^{2}={\mathit{\mu}}^{(9)}$ | ${\mathit{\mu}}_{\text{test}}^{3}={\mathit{\mu}}^{(14)}$ | ${\mathit{\mu}}_{\text{test}}^{4}={\mathit{\mu}}^{(18)}$ |
|---|---|---|---|---|
| NODE | 58.244 | 67.960 | 68.005 | 77.938 |
| PNODE | 0.97915 | 1.2295 | 1.3399 | 2.4462 |

| | ${\mathit{\mu}}_{\text{test}}^{5}={\mathit{\mu}}^{(19)}$ | ${\mathit{\mu}}_{\text{test}}^{6}={\mathit{\mu}}^{(27)}$ | ${\mathit{\mu}}_{\text{test}}^{7}={\mathit{\mu}}^{(28)}$ | ${\mathit{\mu}}_{\text{test}}^{8}={\mathit{\mu}}^{(29)}$ |
|---|---|---|---|---|
| NODE | 59.226 | 106.70 | 71.944 | 86.650 |
| PNODE | 1.0573 | 3.0140 | 3.8343 | 2.2901 |

#### (g) Discussion

Our general observation is that the benefits of using PNODE are more pronounced when the dynamics is parameterized, as observed in the results of the numerical experiments. Accordingly, we expect larger improvements in approximation accuracy over NODE when the dynamics varies significantly across input parameters. One such setting is compartmental modelling (e.g. susceptible–infectious–recovered (SIR) and susceptible–exposed–infectious–removed (SEIR) models) of infectious diseases such as the novel coronavirus disease 2019 (COVID-19) [36,37], where the transmission dynamics is greatly affected by model parameters that are determined by, for example, quarantine policy and social distancing. For the effect of the initial conditions on the solutions of SIR models, we refer readers to [38,39], where analytical solutions of SIR models are derived. Other potential applications of PNODEs include modelling the errors of surrogate models of dynamical systems [40].
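As a toy illustration of such parameter-dependent dynamics (not an experiment from the paper), a forward-Euler integration of the SIR model shows how the trajectory, and hence the final epidemic size, changes with the transmission rate $\beta$ and recovery rate $\gamma$, which play the role of the ODE input parameters $\mu$; all values below are illustrative:

```python
def sir_trajectory(beta, gamma, s0=0.99, i0=0.01, dt=0.1, n_steps=1000):
    """Forward-Euler integration of the SIR model
    ds/dt = -beta*s*i, di/dt = beta*s*i - gamma*i, dr/dt = gamma*i."""
    s, i, r = s0, i0, 0.0
    traj = [(s, i, r)]
    for _ in range(n_steps):
        ds = -beta * s * i
        di = beta * s * i - gamma * i
        dr = gamma * i
        s, i, r = s + dt * ds, i + dt * di, r + dt * dr
        traj.append((s, i, r))
    return traj

# a higher transmission rate yields a larger final epidemic size,
# i.e. the dynamics changes qualitatively with the parameter
final_r_low = sir_trajectory(beta=0.3, gamma=0.1)[-1][2]
final_r_high = sir_trajectory(beta=0.6, gamma=0.1)[-1][2]
```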

Our approach shares a limitation with other data-driven surrogate modelling approaches: it does not guarantee preservation of important physical properties such as conservation. This is a particularly challenging issue, but there have been recent advances in deep learning approaches for enforcing such structure (e.g. enforcing conservation laws in subdomains [19], hyperbolic conservation laws [41], Hamiltonian mechanics [42,43], symplectic structures [44,45], Lagrangian mechanics [46] and metriplectic structures [47]), and we believe that adapting or extending the ideas of these approaches could mitigate this limitation of data-driven surrogate modelling.

### 6. Related work

#### (a) Classical data-driven surrogate modelling

Data-driven surrogate modelling approaches have been extensively used and widely studied in computational science and engineering contexts. Among the most promising are reduced-order modelling (ROM) techniques, which rely heavily on linear methods such as proper orthogonal decomposition (POD) [48], the analogue of principal component analysis [49], for constructing the mappings between a high-dimensional space and a low-dimensional space. These ROMs then identify the latent-dynamics model by applying a (linear) projection process to the high-dimensional equations, e.g. Galerkin projection [48] or least-squares Petrov–Galerkin projection [50,51]. We refer readers to [52–54] for a comprehensive survey of classical methods.
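The POD construction mentioned above is standard practice and is typically computed via the singular value decomposition of a snapshot matrix; a minimal NumPy sketch (the toy data here are illustrative):

```python
import numpy as np

def pod_basis(snapshots, p):
    """Compute a rank-p POD basis from a snapshot matrix.

    snapshots: (N, n_t) array whose columns are solution states at
    sampled time/parameter instances; returns Phi, an (N, p) matrix
    with orthonormal columns spanning the dominant subspace.
    """
    U, S, _ = np.linalg.svd(snapshots, full_matrices=False)
    return U[:, :p]

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 30))  # toy snapshot matrix: N = 100, n_t = 30
Phi = pod_basis(X, p=5)
```

Latent states are then obtained by the linear projection $\hat{u} = \Phi^{\mathsf{T}} u$, in contrast to the nonlinear encoders used in this paper.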

#### (b) Physics-aware deep-learning-based surrogate modelling

Recent work has extended classical ROMs by replacing POD with nonlinear dimension reduction techniques emerging from deep learning [6,19,31,55–58]. These approaches operate by identifying a nonlinear mapping (via, for example, convolutional autoencoders) and subsequently identifying the latent dynamics as certain residual minimization problems [6,19,31,57,58], which are defined on the latent space and are derived from the governing equations. In [55,56], the latent dynamics is identified by simply projecting the governing equation using the encoder, which may lead to kinematically inconsistent dynamics.

Another class of physics-aware surrogate modelling methods include explicitly modelling time integration schemes [59–62], adaptive basis selection [63] and adding stability/structure-preserving constraints in the latent dynamics [64,65]. We emphasize that our approach is closely related to [60], where neural networks are trained to approximate the action of the first-order time-integration scheme applied to latent dynamics and, at each time step, a neural network takes a set of problem-specific parameters as well as the reduced state as an input. Thus, our approach can be seen as a time-continuous generalization of the approach in [60].

#### (c) Purely data-driven deep-learning-based surrogate modelling

Another approach for developing deep-learning-based surrogate models is to learn both nonlinear mappings and latent dynamics in purely data-driven ways. Latent dynamics is modelled as recurrent neural networks with long short-term memory (LSTM) units along with linear POD mappings [14,66,67] or nonlinear mappings constructed via (convolutional) autoencoders [22,68–70]. In [13,14], latent dynamics and nonlinear mappings are modelled as NODEs and autoencoders, respectively; in [4,71–73], autoencoders are used to learn approximate invariant subspaces of the Koopman operator. Relatedly, there have been studies on learning direct mappings via, for example, a neural network, from parameters (including time parameters) to either latent states or approximate solution states [74–78], where the latent states are computed by using autoencoders or linear POD.

#### (d) Enhancing NODE

Augmented NODEs [16] extend NODEs by appending additional state variables to the hidden state variables, which allows NODEs to learn dynamics using the additional dimensions and, consequently, increases expressivity. ANODE [12] discretizes the integration range into a fixed number of steps (i.e. checkpoints) to mitigate numerical instability in the backward pass of NODEs; ACA [79] further extends this approach by adopting an adaptive-stepsize solver in the backward pass. ANODEV2 [80] proposes a coupled system of NODEs, where both hidden state variables and network weights are allowed to evolve over time and their dynamics is approximated by neural networks. Neural optimal control [17] formulates NODEs as controlled dynamical systems and infers the optimal control via an encoder network; this formulation results in NODEs that adjust their dynamics for different input data. Moreover, improved training strategies for NODEs have been studied in [18], and an extension that uses spectral-element discretizations of NODEs has been proposed in [81].
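The augmentation idea of [16] can be sketched schematically: zero-valued dimensions are appended to the hidden state before integration, and the vector field is learned in the higher-dimensional space. The linear vector field and forward-Euler integrator below are illustrative placeholders for a learned network and an adaptive ODE solver:

```python
import numpy as np

def augment(z, n_aug):
    """Append n_aug zero-valued dimensions to the hidden state (ANODE-style)."""
    return np.concatenate([z, np.zeros(n_aug)])

def integrate_euler(f, z0, dt=0.01, n_steps=100):
    # forward-Euler stand-in for the ODE solver used inside a NODE
    z = z0.copy()
    for _ in range(n_steps):
        z = z + dt * f(z)
    return z

# toy linear vector field acting on the augmented space
rng = np.random.default_rng(1)
d, n_aug = 2, 3
A = rng.standard_normal((d + n_aug, d + n_aug)) * 0.1
f = lambda z: A @ z

z0 = augment(np.array([1.0, -1.0]), n_aug)
zT = integrate_euler(f, z0)
hidden = zT[:d]  # read off the original coordinates after integration
```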

### 7. Conclusion

In this study, we proposed a parameterized extension of NODEs and a novel framework for data-driven surrogate modelling of complex numerical simulations of computational physics problems. Our simple extension allows NODE models to learn multiple complex trajectories. This extension overcomes the main drawback of neural ODEs, namely that only a single set of dynamics is learned for the entire data distribution. We have demonstrated the effectiveness of parameterized NODEs on several benchmark problems from computational fluid dynamics, and have shown that the proposed method outperforms NODEs.

### Data accessibility

The code has been provided in the electronic supplementary material [82]. Some large files will be available upon request.

### Authors' contributions

K.L. participated in the design of the idea, performed the numerical experiments and drafted the manuscript. E.J.P. participated in the design of the idea and critically revised the manuscript. Both authors reviewed and gave final approval to the manuscript.

### Competing interests

We declare we have no competing interests.

### Funding

E.J.P. was funded under Sandia’s Advanced Simulation and Computing (ASC) Verification and Validation (V&V) Project/task no. 103723/05.30.02.

### Disclaimer

This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the United States Government. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract no. DE-NA0003525.

## Appendix A. Quasi-one-dimensional Euler equation: the initial condition

For the initial condition, the initial flow field is computed as follows. A zero pressure-gradient flow field is constructed via the isentropic relations

## Footnotes

1 For setting $p$, we follow the results of the study on the effective latent dimension shown in [19].

### References

- 1. Quarteroni A, Manzoni A, Negri F. 2015 **Reduced basis methods for partial differential equations: an introduction**, vol. 92. Berlin, Germany: Springer.
- 2. Karl M, Soelch M, Bayer J, van der Smagt P. 2017 Deep variational Bayes filters: unsupervised learning of state space models from raw data. In *Proc. 5th Int. Conf. on Learning Representations, ICLR 2017, Toulon, France, 24–26 April 2017*. La Jolla, CA: International Conference on Representation Learning.
- 3. Chen TQ, Rubanova Y, Bettencourt J, Duvenaud D. 2018 Neural ordinary differential equations. In *Proc. Advances in Neural Information Processing Systems 31: Annu. Conf. on Neural Information Processing Systems 2018, NeurIPS 2018, Montréal, Canada, 3–8 December 2018* (eds S Bengio, HM Wallach, H Larochelle, K Grauman, N Cesa-Bianchi, R Garnett), pp. 6572–6583. Neural Information Processing Systems Foundation.
- 4. Morton J, Jameson A, Kochenderfer MJ, Witherden FD. 2018 Deep dynamical modeling and control of unsteady fluid flows. In *Proc. Advances in Neural Information Processing Systems 31: Annu. Conf. on Neural Information Processing Systems 2018, NeurIPS 2018, Montréal, Canada, 3–8 December 2018* (eds S Bengio, HM Wallach, H Larochelle, K Grauman, N Cesa-Bianchi, R Garnett), pp. 9278–9288. Neural Information Processing Systems Foundation.
- 5. Rubanova Y, Chen TQ, Duvenaud D. 2019 Latent ordinary differential equations for irregularly-sampled time series. In *Proc. Advances in Neural Information Processing Systems 32: Annu. Conf. on Neural Information Processing Systems 2019, NeurIPS 2019, Vancouver, British Columbia, Canada, 8–14 December 2019* (eds HM Wallach, H Larochelle, A Beygelzimer, F d'Alché-Buc, EB Fox, R Garnett), pp. 5321–5331. Neural Information Processing Systems Foundation.
- 6. Fulton L, Modi V, Duvenaud D, Levin DIW, Jacobson A. 2019 Latent-space dynamics for reduced deformable simulation. **Comput. Graph. Forum** **38**, 379–391. (doi:10.1111/cgf.13645)
- 7. Weinan E. 2017 A proposal on machine learning via dynamical systems. **Commun. Math. Stat.** **5**, 1–11. (doi:10.1007/s40304-017-0103-z)
- 8. Haber E, Ruthotto L. 2017 Stable architectures for deep neural networks. **Inverse Prob.** **34**, 014004. (doi:10.1088/1361-6420/aa9a90)
- 9. Ruthotto L, Haber E. 2020 Deep neural networks motivated by partial differential equations. **J. Math. Imaging Vis.** **62**, 352–364. (doi:10.1007/s10851-019-00903-1)
- 10. Lu Y, Zhong A, Li Q, Dong B. 2018 Beyond finite layer neural networks: bridging deep architectures and numerical differential equations. In *Proc. of the 35th Int. Conf. on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, 10–15 July 2018*, *Proceedings of Machine Learning Research*, vol. 80 (eds JG Dy, A Krause), pp. 3282–3291. PMLR.
- 11. Ciccone M, Gallieri M, Masci J, Osendorfer C, Gomez FJ. 2018 NAIS-net: stable deep networks from non-autonomous differential equations. In *Proc. Advances in Neural Information Processing Systems 31: Annu. Conf. on Neural Information Processing Systems 2018, NeurIPS 2018, Montréal, Canada, 3–8 December 2018* (eds S Bengio, HM Wallach, H Larochelle, K Grauman, N Cesa-Bianchi, R Garnett), pp. 3029–3039. Neural Information Processing Systems Foundation.
- 12. Gholaminejad A, Keutzer K, Biros G. 2019 ANODE: unconditionally accurate memory-efficient gradients for neural ODEs. In *Proc. of the 28th Int. Joint Conf. on Artificial Intelligence, IJCAI 2019, Macao, China, 10–16 August 2019* (ed. S Kraus), pp. 730–736. International Joint Conferences on Artificial Intelligence.
- 13. Portwood GD *et al.* 2019 Turbulence forecasting via neural ODE. (http://arxiv.org/abs/1911.05180)
- 14. Maulik R, Mohan A, Lusch B, Madireddy S, Balaprakash P, Livescu D. 2020 Time-series learning of latent-space dynamics for reduced-order model closure. **Physica D** **405**, 132368. (doi:10.1016/j.physd.2020.132368)
- 15. Ayed I, de Bézenac E, Pajot A, Brajard J, Gallinari P. 2019 Learning dynamical systems from partial observations. (http://arxiv.org/abs/1902.11136)
- 16. Dupont E, Doucet A, Teh YW. 2019 Augmented neural ODEs. In *Proc. Advances in Neural Information Processing Systems 32: Annu. Conf. on Neural Information Processing Systems 2019, NeurIPS 2019, Vancouver, British Columbia, Canada, 8–14 December 2019* (eds HM Wallach, H Larochelle, A Beygelzimer, F d'Alché-Buc, EB Fox, R Garnett), pp. 3134–3144. Neural Information Processing Systems Foundation.
- 17. Chalvidal M, Ricci M, VanRullen R, Serre T. 2020 Neural optimal control for representation learning. (http://arxiv.org/abs/2006.09545)
- 18. Finlay C, Jacobsen J, Nurbekyan L, Oberman AM. 2020 How to train your neural ODE: the world of Jacobian and kinetic regularization. In *Proc. of the 37th Int. Conf. on Machine Learning, ICML 2020, 13–18 July 2020, Virtual Event*, *Proceedings of Machine Learning Research*, vol. 119, pp. 3154–3164. PMLR.
- 19. Lee K, Carlberg KT. 2021 Deep conservation: a latent-dynamics model for exact satisfaction of physical conservation laws. In *Proc. of the 35th AAAI Conf. on Artificial Intelligence, AAAI 2021, 33rd Conf. on Innovative Applications of Artificial Intelligence, IAAI 2021, 11th Symp. on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, 2–9 February 2021*, pp. 277–285. AAAI Press.
- 20. Pontryagin LS. 2018 **Mathematical theory of optimal processes**. London, UK: Routledge.
- 21. Yildiz C, Heinonen M, Lähdesmäki H. 2019 ODE2VAE: deep generative second order ODEs with Bayesian neural networks. In *Proc. Advances in Neural Information Processing Systems 32: Annu. Conf. on Neural Information Processing Systems 2019, NeurIPS 2019, Vancouver, British Columbia, Canada, 8–14 December 2019* (eds HM Wallach, H Larochelle, A Beygelzimer, F d'Alché-Buc, EB Fox, R Garnett), pp. 13412–13421. Neural Information Processing Systems Foundation.
- 22. Maulik R, Lusch B, Balaprakash P. 2021 Reduced-order modeling of advection-dominated systems with recurrent neural networks and convolutional autoencoders. **Phys. Fluids** **33**, 037106. (doi:10.1063/5.0039986)
- 23. Kolda TG, Bader BW. 2009 Tensor decompositions and applications. **SIAM Rev.** **51**, 455–500. (doi:10.1137/07070111X)
- 24. LeCun Y, Bengio Y, Hinton G. 2015 Deep learning. **Nature** **521**, 436–444. (doi:10.1038/nature14539)
- 25.
- 26. Clevert D, Unterthiner T, Hochreiter S. 2016 Fast and accurate deep network learning by exponential linear units (ELUs). In *Proc. 4th Int. Conf. on Learning Representations, ICLR 2016, San Juan, Puerto Rico, 2–4 May 2016, Conference Track Proceedings* (eds Y Bengio, Y LeCun). La Jolla, CA: International Conference on Representation Learning.
- 27. Dormand JR, Prince PJ. 1980 A family of embedded Runge–Kutta formulae. **J. Comput. Appl. Math.** **6**, 19–26. (doi:10.1016/0771-050X(80)90013-3)
- 28. Paszke A *et al.* 2019 PyTorch: an imperative style, high-performance deep learning library. In *Proc. Advances in Neural Information Processing Systems 32: Annu. Conf. on Neural Information Processing Systems 2019, NeurIPS 2019, Vancouver, British Columbia, Canada, 8–14 December 2019*. Neural Information Processing Systems Foundation.
- 29. Kingma DP, Ba J. 2015 Adam: a method for stochastic optimization. In *Proc. 3rd Int. Conf. on Learning Representations, ICLR 2015, San Diego, CA, USA, 7–9 May 2015, Conference Track Proceedings* (eds Y Bengio, Y LeCun). La Jolla, CA: International Conference on Representation Learning.
- 30. Rewienski M. 2003 A trajectory piecewise-linear approach to model order reduction of nonlinear dynamical systems. PhD thesis, Massachusetts Institute of Technology, Cambridge, MA, USA.
- 31. Lee K, Carlberg KT. 2020 Model reduction of dynamical systems on nonlinear manifolds using deep convolutional autoencoders. **J. Comput. Phys.** **404**, 108973. (doi:10.1016/j.jcp.2019.108973)
- 32. Buffoni M, Willcox K. 2010 Projection-based model reduction for reacting flows. In *Proc. 40th Fluid Dynamics Conf. and Exhibit, Chicago, IL, 28 June 2010–1 July 2010*, p. 5008. Reston, VA: American Institute of Aeronautics and Astronautics.
- 33. MacCormack RW. 2014 **Numerical computation of compressible and viscous flow**. Reston, VA: American Institute of Aeronautics and Astronautics, Inc.
- 34. Choi Y, Carlberg K. 2019 Space–time least-squares Petrov–Galerkin projection for nonlinear model reduction. **SIAM J. Sci. Comput.** **41**, A26–A58. (doi:10.1137/17M1120531)
- 35. Roe PL. 1981 Approximate Riemann solvers, parameter vectors, and difference schemes. **J. Comput. Phys.** **43**, 357–372. (doi:10.1016/0021-9991(81)90128-5)
- 36. Acemoglu D, Chernozhukov V, Werning I, Whinston MD. 2020 A multi-risk SIR model with optimally targeted lockdown. Technical report. National Bureau of Economic Research, Cambridge, MA, USA.
- 37. Wang H *et al.* 2020 Phase-adjusted estimation of the number of coronavirus disease 2019 cases in Wuhan, China. **Cell Discov.** **6**, 1–8. (doi:10.1038/s41421-020-0148-0)
- 38. Kröger M, Schlickeiser R. 2020 Analytical solution of the SIR-model for the temporal evolution of epidemics. Part A: time-independent reproduction factor. **J. Phys. A: Math. Theor.** **53**, 505601. (doi:10.1088/1751-8121/abc65d)
- 39. Schlickeiser R, Kröger M. 2021 Analytical solution of the SIR-model for the temporal evolution of epidemics: part B. Semi-time case. **J. Phys. A: Math. Theor.** **54**, 175601. (doi:10.1088/1751-8121/abed66)
- 40. Parish EJ, Carlberg KT. 2020 Time-series machine-learning error models for approximate solutions to parameterized dynamical systems. **Comput. Methods Appl. Mech. Eng.** **365**, 112990. (doi:10.1016/j.cma.2020.112990)
- 41. Raissi M, Perdikaris P, Karniadakis GE. 2019 Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. **J. Comput. Phys.** **378**, 686–707. (doi:10.1016/j.jcp.2018.10.045)
- 42. Greydanus S, Dzamba M, Yosinski J. 2019 Hamiltonian neural networks. In *Proc. Advances in Neural Information Processing Systems 32: Annu. Conf. on Neural Information Processing Systems 2019, NeurIPS 2019, Vancouver, British Columbia, Canada, 8–14 December 2019*. Neural Information Processing Systems Foundation.
- 43. Toth P, Rezende DJ, Jaegle A, Racanière S, Botev A, Higgins I. 2020 Hamiltonian generative networks. In *Proc. 8th Int. Conf. on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, 26–30 April 2020*. La Jolla, CA: International Conference on Representation Learning.
- 44. Chen Z, Zhang J, Arjovsky M, Bottou L. 2020 Symplectic recurrent neural networks. In *Proc. 8th Int. Conf. on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, 26–30 April 2020*. La Jolla, CA: International Conference on Representation Learning.
- 45. Jin P, Zhang Z, Zhu A, Tang Y, Karniadakis GE. 2020 SympNets: intrinsic structure-preserving symplectic networks for identifying Hamiltonian systems. **Neural Netw.** **132**, 166–179. (doi:10.1016/j.neunet.2020.08.017)
- 46. Cranmer M, Greydanus S, Hoyer S, Battaglia P, Spergel D, Ho S. 2020 Lagrangian neural networks. (http://arxiv.org/abs/2003.04630)
- 47. Hernandez Q, Badias A, González D, Chinesta F, Cueto E. 2021 Structure-preserving neural networks. **J. Comput. Phys.** **426**, 109950. (doi:10.1016/j.jcp.2020.109950)
- 48. Holmes P, Lumley JL, Berkooz G, Rowley CW. 2012 **Turbulence, coherent structures, dynamical systems and symmetry**. Cambridge, UK: Cambridge University Press.
- 49. Hotelling H. 1933 Analysis of a complex of statistical variables into principal components. **J. Educ. Psychol.** **24**, 417–441. (doi:10.1037/h0071325)
- 50. Carlberg K, Farhat C, Cortial J, Amsallem D. 2013 The GNAT method for nonlinear model reduction: effective implementation and application to computational fluid dynamics and turbulent flows. **J. Comput. Phys.** **242**, 623–647. (doi:10.1016/j.jcp.2013.02.028)
- 51. Carlberg K, Barone M, Antil H. 2017 Galerkin v. least-squares Petrov–Galerkin projection in nonlinear model reduction. **J. Comput. Phys.** **330**, 693–734. (doi:10.1016/j.jcp.2016.10.033)
- 52. Benner P, Gugercin S, Willcox K. 2015 A survey of projection-based model reduction methods for parametric dynamical systems. **SIAM Rev.** **57**, 483–531. (doi:10.1137/130932715)
- 53. Benner P, Ohlberger M, Cohen A, Willcox K. 2017 **Model reduction and approximation: theory and algorithms**. Philadelphia, PA: SIAM.
- 54. Ohlberger M, Rave S. 2016 Reduced basis methods: success, limitations and future challenges. In *Proc. of ALGORITMY, Vysoké Tatry - Podbanské, Slovakia, 14–18 March 2016* (eds A Handlovičová, D Ševčovič), pp. 1–12. Bratislava, Slovakia: Publishing House of Slovak University of Technology.
- 55. Kashima K. 2016 Nonlinear model reduction by deep autoencoder of noise response data. In *Proc. 55th IEEE Conf. on Decision and Control, CDC 2016, Las Vegas, NV, USA, 12–14 December 2016*, pp. 5750–5755. New York, NY: IEEE.
- 56. Hartman D, Mestha LK. 2017 A deep learning framework for model reduction of dynamical systems. In *Proc. IEEE Conf. on Control Technology and Applications, CCTA 2017, Mauna Lani Resort, HI, USA, 27–30 August 2017*, pp. 1917–1922. New York, NY: IEEE.
- 57. Kim Y, Choi Y, Widemann D, Zohdi T. 2020 A fast and accurate physics-informed neural network reduced order model with shallow masked autoencoder. (http://arxiv.org/abs/2009.11990)
- 58. Mojgani R. 2020 Reduced order modeling of convection-dominated flows, dimensionality reduction and stabilization. PhD thesis, University of Illinois at Urbana-Champaign, Champaign, IL, USA.
- 59. Pawar S, Rahman S, Vaddireddy H, San O, Rasheed A, Vedula P. 2019 A deep learning enabler for nonintrusive reduced order modeling of fluid flows. **Phys. Fluids** **31**, 085101. (doi:10.1063/1.5113494)
- 60. San O, Maulik R, Ahmed M. 2019 An artificial neural network framework for reduced order modeling of transient flows. **Commun. Nonlin. Sci. Numer. Simul.** **77**, 271–287. (doi:10.1016/j.cnsns.2019.04.025)
- 61. Xie X, Zhang G, Webster CG. 2019 Non-intrusive inference reduced order model for fluids using deep multistep neural network. **Mathematics** **7**, 757. (doi:10.3390/math7080757)
- 62. Geneva N, Zabaras N. 2020 Modeling the dynamics of PDE systems with physics-constrained deep auto-regressive networks. **J. Comput. Phys.** **403**, 109056. (doi:10.1016/j.jcp.2019.109056)
- 63. Rim D, Venturi L, Bruna J, Peherstorfer B. 2020 Depth separation for reduced deep networks in nonlinear model reduction: distilling shock waves in nonlinear hyperbolic problems. (http://arxiv.org/abs/2007.13977)
- 64. Erichson NB, Muehlebach M, Mahoney MW. 2019 Physics-informed autoencoders for Lyapunov-stable fluid flow prediction. (http://arxiv.org/abs/1905.10866)
- 65. Hernandez Q, Badias A, Gonzalez D, Chinesta F, Cueto E. 2020 Deep learning of thermodynamics-aware reduced-order models from data. (http://arxiv.org/abs/2007.03758)
- 66. Wang Z, Xiao D, Fang F, Govindan R, Pain CC, Guo Y. 2018 Model identification of reduced order fluid dynamics systems using deep learning. **Int. J. Numer. Methods Fluids** **86**, 255–268. (doi:10.1002/fld.4416)
- 67. Rahman SM, Pawar S, San O, Rasheed A, Iliescu T. 2019 Nonintrusive reduced order modeling framework for quasigeostrophic turbulence. **Phys. Rev. E** **100**, 053306. (doi:10.1103/PhysRevE.100.053306)
- 68. Gonzalez FJ, Balajewicz M. 2018 Deep convolutional recurrent autoencoders for learning low-dimensional feature dynamics of fluid systems. (http://arxiv.org/abs/1808.01346)
- 69. Wiewel S, Becher M, Thürey N. 2019 Latent space physics: towards learning the temporal evolution of fluid flow. **Comput. Graph. Forum** **38**, 71–82. (doi:10.1111/cgf.13620)
- 70. Tencer J, Potter K. 2020 Enabling nonlinear manifold projection reduced-order models by extending convolutional neural networks to unstructured data. (http://arxiv.org/abs/2006.06154)
- 71. Otto SE, Rowley CW. 2019 Linearly recurrent autoencoder networks for learning dynamics. **SIAM J. Appl. Dyn. Syst.** **18**, 558–593. (doi:10.1137/18M1177846)
- 72. Lusch B, Kutz JN, Brunton SL. 2017 Deep learning for universal linear embeddings of nonlinear dynamics. (http://arxiv.org/abs/1712.09707)
- 73. Takeishi N, Kawahara Y, Yairi T. 2017 Learning Koopman invariant subspaces for dynamic mode decomposition. In *Proc. Advances in Neural Information Processing Systems 30: Annu. Conf. on Neural Information Processing Systems 2017, Long Beach, CA, USA, 4–9 December 2017* (eds I Guyon, U von Luxburg, S Bengio, HM Wallach, R Fergus, SVN Vishwanathan, R Garnett), pp. 1130–1140. Neural Information Processing Systems Foundation.
- 74. Swischuk R, Mainini L, Peherstorfer B, Willcox K. 2019 Projection-based model reduction: formulations for physics-based machine learning. **Comput. Fluids** **179**, 704–717. (doi:10.1016/j.compfluid.2018.07.021)
- 75. Fresca S, Dedè L, Manzoni A. 2021 A comprehensive deep learning-based approach to reduced order modeling of nonlinear time-dependent parametrized PDEs. **J. Sci. Comput.** **87**, 61. (doi:10.1007/s10915-021-01462-7)
- 76. Renganathan SA, Maulik R, Rao V. 2020 Machine learning for nonintrusive model order reduction of the parametric inviscid transonic flow past an airfoil. **Phys. Fluids** **32**, 047110. (doi:10.1063/1.5144661)
- 77. Kast M, Guo M, Hesthaven JS. 2020 A non-intrusive multifidelity method for the reduced order modeling of nonlinear problems. **Comput. Methods Appl. Mech. Eng.** **364**, 112947. (doi:10.1016/j.cma.2020.112947)
- 78. Wang Q, Hesthaven JS, Ray D. 2019 Non-intrusive reduced order modeling of unsteady flows using artificial neural networks with application to a combustion problem. **J. Comput. Phys.** **384**, 289–307. (doi:10.1016/j.jcp.2019.01.031)
- 79. Zhuang J, Dvornek NC, Li X, Tatikonda S, Papademetris X, Duncan JS. 2020 Adaptive checkpoint adjoint method for gradient estimation in neural ODE. In *Proc. of the 37th Int. Conf. on Machine Learning, ICML 2020, 13–18 July 2020, Virtual Event*, *Proceedings of Machine Learning Research*, vol. 119, pp. 11639–11649. PMLR.
- 80. Zhang T, Yao Z, Gholami A, Gonzalez JE, Keutzer K, Mahoney MW, Biros G. 2019 ANODEV2: a coupled neural ODE framework. In *Proc. Advances in Neural Information Processing Systems 32: Annu. Conf. on Neural Information Processing Systems 2019, NeurIPS 2019, Vancouver, BC, Canada, 8–14 December 2019* (eds HM Wallach, H Larochelle, A Beygelzimer, F d'Alché-Buc, EB Fox, R Garnett), pp. 5152–5162. Neural Information Processing Systems Foundation.
- 81.
Quaglino A, Gallieri M, Masci J, Koutnık J . 2020 SNODE: spectral discretization of neural ODEs for system identification. In*Proc. 8th Int. Conf. on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, 26–30 April 2020*. La Jolla, CA: International Conference on Representation Learning. Google Scholar - 82.
Lee K, Parish EJ . 2021Code from: Parameterized neural ordinary differential equations: applications to computational physics problems .**The Royal Society. Dataset**. (https://doi.org/10.6084/m9.figshare.16557722.v1) Google Scholar