Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences

Parameterized neural ordinary differential equations: applications to computational physics problems

Kookjin Lee

School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, 699 South Mill Avenue, Tempe, AZ 85281, USA

Extreme-scale Data Science and Analytics Department, Sandia National Laboratories, 7011 East Avenue, MS 9159, Livermore, CA 94550, USA

[email protected]


and
Eric J. Parish

Extreme-scale Data Science and Analytics Department, Sandia National Laboratories, 7011 East Avenue, MS 9159, Livermore, CA 94550, USA


Published: https://doi.org/10.1098/rspa.2021.0162

    Abstract

    This work proposes an extension of neural ordinary differential equations (NODEs) by introducing an additional set of ODE input parameters to NODEs. This extension allows NODEs to learn multiple dynamics specified by the input parameter instances. Our extension is inspired by the concept of parameterized ODEs, which are widely investigated in computational science and engineering contexts, where characteristics of the governing equations vary over the input parameters. We apply the proposed parameterized NODEs (PNODEs) for learning latent dynamics of complex dynamical processes that arise in computational physics, which is an essential component for enabling rapid numerical simulations for time-critical physics applications. For this, we propose an encoder–decoder-type framework, which models latent dynamics as PNODEs. We demonstrate the effectiveness of PNODEs on benchmark problems from computational physics.

    1. Introduction

    Numerical simulations of dynamical systems described by systems of ordinary differential equations (ODEs) play essential roles in various engineering and applied science applications, such as predicting input/output responses, design and optimization [1]. Such systems of ODEs often arise from the spatial discretization of time-dependent partial differential equations. These ODEs and their solutions often depend on a set of input parameters, and such ODEs are referred to as parameterized ODEs. Examples of such input parameters within the context of fluid dynamics include the Reynolds number and the Mach number. In many important scenarios, high-fidelity solutions of parameterized ODEs must be computed (i) for many different input parameter instances (i.e. a many-query scenario) or (ii) in real time for a new input parameter instance. A single run of a high-fidelity simulation, however, is often computationally expensive (e.g. when the ODE comprises a semi-discretized partial differential equation with a fine spatio-temporal resolution). Consequently, performing real-time or multiple runs of a high-fidelity simulation can be computationally prohibitive.

    To mitigate this computational burden, many data-driven surrogate modelling approaches have been proposed to replace costly high-fidelity simulations. The common goal of these approaches is to build a latent-dynamics model with lower complexity than that of the high-fidelity model, and to use the reduced model to compute approximate solutions for any new input parameter instance. In general, model-order reduction approaches consist of two components: (i) a low-dimensional latent-dynamics model, where the computational complexity is very low, and (ii) a (non)linear mapping that constructs high-dimensional approximate states (i.e. solutions) from the low-dimensional states obtained from the latent-dynamics model. In many studies, such models are constructed via data-driven techniques in the following steps: (i) collect solutions of high-fidelity simulations for a set of training parameter instances, (ii) build a parameterized surrogate model, and (iii) fit the model by training with the data collected from step (i).

    In the field of deep learning, similar efforts have been made for learning latent dynamics of various physical processes [2–6]. Neural ordinary differential equations (NODEs), a method of learning time-continuous dynamics in the form of a system of ODEs from data, comprise a particularly promising approach for learning latent dynamics of dynamical systems. NODEs have been studied in [3,7–12], and this body of work has demonstrated their ability to successfully learn latent dynamics and to be applied to downstream tasks [3,5].

    Because NODEs learn latent dynamics in the form of ODEs, they are a natural fit for latent-dynamics modelling in data-driven surrogates of physical processes, and they have been applied to several computational physics problems, including turbulence modelling [13,14] and future state predictions in fluids problems [15]. As pointed out in [16,17], however, NODEs learn a single set of network weights that best fits a given training dataset. This results in a NODE model with limited expressivity and often leads to unnecessarily complex dynamics [18]. To overcome this shortcoming, we propose to extend NODEs with a set of input parameters that specify the dynamics of the NODE model, which leads to parameterized NODEs (PNODEs). With this simple extension, PNODEs can represent multiple trajectories such that the dynamics of each trajectory is characterized by its input parameter instance.

    The main contributions of this paper are (i) an extension to NODEs that enables them to learn multiple trajectories with a single set of network weights; even for the same initial condition, the dynamics can be different for different input parameter instances, (ii) a framework for learning latent dynamics of parameterized ODEs arising in computational physics problems, and (iii) a demonstration of the effectiveness of the proposed framework with advection-dominated benchmark problems, which are a class of problems where classical linear latent-dynamics learning methods (e.g. principal component analysis) often fail to learn accurately [19].

    2. Neural ordinary differential equations

    NODEs are a family of deep neural network models that parameterize the time-continuous dynamics of hidden states using a system of ODEs,

    $$\frac{\mathrm{d}\mathbf{z}(t)}{\mathrm{d}t} = f_{\Theta}(\mathbf{z}(t), t), \tag{2.1}$$
    where $\mathbf{z}(t)$ is a time-continuous representation of a hidden (latent) state, $f_{\Theta}$ is a parameterized velocity function that defines the dynamics of hidden states over time and $\Theta$ is a set of neural network weights. Given the initial condition $\mathbf{z}(0)$ (i.e. input), a hidden state at any time index $\mathbf{z}(t)$ can be obtained by solving the initial value problem (IVP) (2.1). To solve the IVP, a black-box differential equation solver can be employed and the hidden states can be computed with the desired accuracy:
    $$\mathbf{z}_1, \ldots, \mathbf{z}_{n_t} = \mathrm{ODESolve}(\mathbf{z}(0), f_{\Theta}, t_1, \ldots, t_{n_t}). \tag{2.2}$$
    In the backward pass, as proposed in [3], gradients are computed by solving another system of ODEs, which are derived using the adjoint sensitivity method [20]; this allows memory-efficient training of the NODE model. As pointed out in [16,17], a NODE model learns a single dynamics for the entire data distribution and, thus, results in a model with limited expressivity.
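    To make the construction concrete, the following minimal PyTorch sketch (an illustration only, not the implementation of [3]; the layer sizes and the fixed-step fourth-order Runge–Kutta loop standing in for the black-box ODESolve are assumptions) defines a velocity network $f_{\Theta}$ and integrates it forward in time:

```python
import torch
import torch.nn as nn

class ODEFunc(nn.Module):
    """Velocity function f_Theta(z, t): maps a latent state to its time derivative."""
    def __init__(self, latent_dim=5, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ELU(),
            nn.Linear(hidden_dim, latent_dim))

    def forward(self, t, z):
        return self.net(z)

def ode_solve(func, z0, ts):
    """Fixed-step fourth-order Runge-Kutta loop standing in for a black-box ODESolve.

    Returns the latent states at the requested time instances ts; the result is
    differentiable with respect to the weights of func via ordinary backpropagation."""
    zs, z, t = [z0], z0, ts[0]
    for t_next in ts[1:]:
        h = t_next - t
        k1 = func(t, z)
        k2 = func(t + 0.5 * h, z + 0.5 * h * k1)
        k3 = func(t + 0.5 * h, z + 0.5 * h * k2)
        k4 = func(t + h, z + h * k3)
        z = z + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        zs.append(z)
        t = t_next
    return torch.stack(zs)  # shape: (len(ts), batch, latent_dim)

# Integrate a batch of initial latent states z(0) over ten uniform steps.
f_theta = ODEFunc()
z0 = torch.randn(4, 5)
ts = torch.linspace(0.0, 1.0, 11)
z_traj = ode_solve(f_theta, z0, ts)
```

    In practice, an adaptive solver (e.g. Dormand–Prince, as used in §5) and the adjoint method of [3] would replace this explicit loop.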

    3. Parameterized neural ordinary differential equations

    To resolve this, we propose a simple but powerful extension of NODEs. We refer to this extension as PNODEs,

    $$\frac{\mathrm{d}\mathbf{z}(t;\boldsymbol{\mu})}{\mathrm{d}t} = f_{\Theta}(\mathbf{z}(t;\boldsymbol{\mu}), t; \boldsymbol{\mu}), \tag{3.1}$$
    with a parameterized initial condition $\mathbf{z}_0(\boldsymbol{\mu})$, where $\boldsymbol{\mu} = [\mu_1, \ldots, \mu_{n_\mu}] \in \mathcal{D} \subseteq \mathbb{R}^{n_\mu}$ denotes problem-specific input parameters. Inspired by the concept of parameterized ODEs, where the ODEs depend on the input parameters, this simple extension allows NODEs to have multiple latent trajectories that depend on the input parameters. This extension only requires minimal modifications in the definition of the velocity function $f_{\Theta}$ and can be trained/deployed by using the same mathematical machinery developed for NODEs in the forward pass (e.g. via a black-box ODE solver) and the backward pass (e.g. via standard backpropagation or via the adjoint-sensitivity method). In practice, $f_{\Theta}$ is approximated as a neural network which takes $\mathbf{z}$ and $\boldsymbol{\mu}$ as an input and then produces $\mathrm{d}\mathbf{z}/\mathrm{d}t$ as an output.
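    As a sketch of how small the modification is (again illustrative; the concatenation-based conditioning and the layer sizes are assumptions, and the ode_solve helper from the sketch above is reused), the velocity network simply receives $\boldsymbol{\mu}$ alongside the latent state, so a single set of weights $\Theta$ yields different dynamics for different parameter instances:

```python
import torch
import torch.nn as nn

class PNODEFunc(nn.Module):
    """Parameterized velocity f_Theta(z, t; mu): mu enters as an extra network input."""
    def __init__(self, latent_dim=5, param_dim=2, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + param_dim, hidden_dim), nn.ELU(),
            nn.Linear(hidden_dim, latent_dim))
        self.mu = None  # set before each solve; constant in time along a trajectory

    def forward(self, t, z):
        mu = self.mu.expand(z.shape[0], -1)          # broadcast mu over the batch
        return self.net(torch.cat([z, mu], dim=-1))  # dz/dt depends on both z and mu

# Same initial latent state, two parameter instances, two different trajectories.
f_theta = PNODEFunc()
z0 = torch.zeros(1, 5)
ts = torch.linspace(0.0, 1.0, 11)
f_theta.mu = torch.tensor([[4.25, 0.015]])
traj_a = ode_solve(f_theta, z0, ts)
f_theta.mu = torch.tensor([[5.22, 0.015]])
traj_b = ode_solve(f_theta, z0, ts)
```

    Even with an identical initial state, traj_a and traj_b differ because the velocity depends on $\boldsymbol{\mu}$, which is precisely what a plain NODE cannot represent.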

    4. Application to data-driven surrogate modelling

    We now investigate PNODEs within the context of developing data-driven surrogate models of computational physics problems. To this end, we consider systems that can be described by a parameterized system of ODEs, which we denote by the full-order model (FOM),

    $$\frac{\mathrm{d}\mathbf{u}}{\mathrm{d}t} = \mathbf{f}(\mathbf{u}, t; \boldsymbol{\mu}), \qquad \mathbf{u}(0; \boldsymbol{\mu}) = \mathbf{u}_0(\boldsymbol{\mu}), \tag{4.1}$$
    where $\mathbf{u}(t;\boldsymbol{\mu})$, $\mathbf{u}:[0,T]\times\mathcal{D}\to\mathbb{R}^{N}$, denotes the state, $\boldsymbol{\mu}\in\mathcal{D}$ denotes the ODE parameters that characterize physical properties (e.g. boundary conditions, forcing terms), $\mathcal{D}\subseteq\mathbb{R}^{n_\mu}$ denotes the parameter space, where $n_\mu$ is the number of parameters, and $T$ denotes the final time. The initial state is specified by the parameterized initial condition $\mathbf{u}_0(\boldsymbol{\mu})$, $\mathbf{u}_0:\mathcal{D}\to\mathbb{R}^{N}$, and $\mathbf{f}(\mathbf{u}(t;\boldsymbol{\mu}),t;\boldsymbol{\mu})$, $\mathbf{f}:\mathbb{R}^{N}\times[0,T]\times\mathcal{D}\to\mathbb{R}^{N}$, denotes the velocity. We additionally note that, for many analyses, we are often most interested in computing quantities of interest (QoIs) from the system (4.1),
    $$\mathbf{s}:(t,\boldsymbol{\mu}) \mapsto \mathbf{g}(\mathbf{u}(t,\boldsymbol{\mu})), \tag{4.2}$$
    where $\mathbf{s}(t,\boldsymbol{\mu})\in\mathbb{R}^{N_s}$ is the observable and $\mathbf{g}:\mathbb{R}^{N}\to\mathbb{R}^{N_s}$ denotes the QoI functional. In this work, we focus on two types of QoIs: (i) the QoI corresponds to the full state, i.e. $\mathbf{s}\equiv\mathbf{u}$, and (ii) QoIs that are scalar valued, i.e. $N_s=1$.

    For many applications, numerically simulating the system (4.1) and extracting QoIs can be computationally expensive; this is the case, for example, when the state space of the dynamical system is large. For analyses that require an ensemble of trajectories of the forward model (e.g. uncertainty quantification, optimization), directly simulating the system (4.1) can become a computational bottleneck. This motivates the development of low-cost surrogate models. Here, we use PNODE to develop data-driven surrogate models to alleviate this computational burden. To this end, we assume access to a collection of QoI observations obtained via simulation of the system (4.1) at selected training parameter instances, i.e. we assume access to the dataset

    $$\mathcal{S} = \{\, \mathbf{s}(\mathbf{u}(t,\boldsymbol{\mu})),\, t,\, \boldsymbol{\mu} \;|\; t\in\mathcal{T},\ \boldsymbol{\mu}\in\mathcal{D}_{\mathrm{train}} \,\}, \tag{4.3}$$
    where $\mathcal{T}=\{t^i\}_{i=1}^{n_t}$ with $t^i = t^{i-1} + \Delta t^i$ and $\mathcal{D}_{\mathrm{train}}=\{\boldsymbol{\mu}_{\mathrm{train}}^{i}\}_{i=1}^{n_{\mathrm{train}}}$ are the sets of time and parameter instances from which the samples are obtained, respectively. Additionally, $\Delta t^i$ is the $i$th time interval, which we assume to be uniform with size $\Delta t = \Delta t^i = T/n_t$ for simplicity.

    In what follows, we employ PNODE in conjunction with an encoder–decoder framework to learn surrogate models. In the framework, PNODE is employed to evolve a set of latent variables in time, and a decoder is then employed to map these latent variables to the QoIs. We now outline this framework.

    (a) PNODE encoder–decoder framework

    For t[0,T], μD, we propose to approximate the QoIs as

    $$\mathbf{s}(t,\boldsymbol{\mu}) \approx \tilde{\mathbf{s}}(t,\boldsymbol{\mu}) \equiv \mathbf{h}_{\mathrm{dec}}(\hat{\mathbf{u}}(t,\boldsymbol{\mu}); \boldsymbol{\theta}_{\mathrm{dec}}), \tag{4.4}$$
    where $\hat{\mathbf{u}}:[0,T]\times\mathcal{D}\to\mathbb{R}^{p}$ denotes a set of latent states, $\mathbf{h}_{\mathrm{dec}}:\mathbb{R}^{p}\to\mathbb{R}^{N_s}$ is a mapping from the latent states to the QoIs and $\boldsymbol{\theta}_{\mathrm{dec}}$ are network weights. We then use PNODE to model the latent dynamics, i.e.
    $$\dot{\hat{\mathbf{u}}} = \hat{\mathbf{f}}_{\Theta}(\hat{\mathbf{u}}, t; \boldsymbol{\mu}), \tag{4.5}$$
    where $\hat{\mathbf{f}}_{\Theta}:\mathbb{R}^{p}\times[0,T]\times\mathcal{D}\to\mathbb{R}^{p}$ denotes the velocity of the latent states and $\Theta$ are the network weights for PNODE. The goal is to find a set of parameters $(\Theta, \boldsymbol{\theta}_{\mathrm{dec}})$ that minimizes the error between approximate solutions obtained from (4.4) and the training data (4.3).

    An important aspect of the framework is now emphasized. By introducing a set of latent variables, we have decoupled the dimensionality of observed variables from the dimensionality of the latent dynamics. This enables fine-grained trade-offs between computational cost and model complexity, and is critical in several circumstances. For example, if the dimensionality of the observed variables is large, we can develop a cheap surrogate model by ensuring that the latent dimension is small (i.e. $p \ll N_s$). On the other hand, if the dimensionality of the observed variables is small, as is the case when the observable is scalar valued, we can still develop a surrogate with enough complexity to capture the underlying dynamics by setting $p > N_s$.

    (b) Initialization of the latent states

    Simulating the latent dynamics (4.5) requires specification of an initial condition, $\hat{\mathbf{u}}(0,\boldsymbol{\mu})$. Various techniques exist in the literature to achieve this, and here we employ an encoder framework. We assume access to the initial state for the training parameter instances, $\mathcal{S}_0 = \{\mathbf{u}(0,\boldsymbol{\mu}) \,|\, \boldsymbol{\mu} \in \mathcal{D}_{\mathrm{train}}\}$, and we then employ an encoder for the initial state, i.e.

    $$\hat{\mathbf{u}}(0,\boldsymbol{\mu}) = \mathbf{h}_{\mathrm{enc}}(\mathbf{u}(0,\boldsymbol{\mu}); \boldsymbol{\theta}_{\mathrm{enc}}),$$
    where $\mathbf{h}_{\mathrm{enc}}(\,\cdot\,; \boldsymbol{\theta}_{\mathrm{enc}}): \mathbb{R}^{N} \to \mathbb{R}^{p}$ is the encoder and $\boldsymbol{\theta}_{\mathrm{enc}}$ are the encoder parameters.

    (c) Learning framework

    With all these components defined, the forward pass of the framework can be described as

    (i)

    encode a reduced initial state from the given initial condition: $\hat{\mathbf{u}}_0(\boldsymbol{\mu}_i) = \mathbf{h}_{\mathrm{enc}}(\mathbf{u}_0(\boldsymbol{\mu}_i); \boldsymbol{\theta}_{\mathrm{enc}})$,

    (ii)

    solve a system of ODEs defined by PNODE (or NODE):

    $$\hat{\mathbf{u}}_i^1, \ldots, \hat{\mathbf{u}}_i^{n_t} = \mathrm{ODESolve}(\hat{\mathbf{u}}_0(\boldsymbol{\mu}_i), \hat{\mathbf{f}}_{\Theta}, \boldsymbol{\mu}_i, t_1, \ldots, t_{n_t}) \quad \text{(PNODE)}$$
    $$\big(\text{or}\ \ \hat{\mathbf{u}}_i^1, \ldots, \hat{\mathbf{u}}_i^{n_t} = \mathrm{ODESolve}(\hat{\mathbf{u}}_0(\boldsymbol{\mu}_i), \hat{\mathbf{f}}_{\Theta}, t_1, \ldots, t_{n_t}) \quad \text{(NODE)}\big),$$

    (iii)

    decode a set of reduced states to a set of approximate observables: $\tilde{\mathbf{s}}_i^k = \mathbf{h}_{\mathrm{dec}}(\hat{\mathbf{u}}_i^k; \boldsymbol{\theta}_{\mathrm{dec}})$, $k = 1, \ldots, n_t$, and

    (iv)

    compute sensitivities of the loss function $\mathcal{L}(\tilde{\mathbf{s}}_i^1, \ldots, \tilde{\mathbf{s}}_i^{n_t}, \mathbf{s}_i^1, \ldots, \mathbf{s}_i^{n_t})$ and update the weights.

    Here, the subscript $i$ denotes the $i$th problem-specific parameter instance in the training set, and the final loss is computed as $\sum_{i=1}^{n_{\mathrm{train}}} \mathcal{L}(\tilde{\mathbf{s}}_i^1, \ldots, \tilde{\mathbf{s}}_i^{n_t}, \mathbf{s}_i^1, \ldots, \mathbf{s}_i^{n_t})$, where $n_{\mathrm{train}}$ is the total number of parameter instances in the training set. Figure 1 illustrates the computational graph of the forward pass in the proposed framework. We emphasize that the proposed framework only takes the initial states from the training data and the problem-specific ODE parameters $\boldsymbol{\mu}$ as an input. PNODEs can still learn multiple trajectories, which are characterized by the ODE parameters, even if the same initial states are given for different ODE parameters, which is not achievable with NODEs. Furthermore, the proposed framework is significantly simpler than the common neural network settings for NODEs when they are used to learn latent dynamics, namely the sequence-to-sequence architectures as in [3,5,14,21,22], which require that a (part of a) sequence is fed into the encoder network to produce a context vector, which is then fed into the NODE decoder network as an initial condition.

    Figure 1.

    Figure 1. The forward pass of the proposed framework: (i) the encoder (red arrow, from data space to latent space), which provides a reduced initial state to the PNODE, (ii) solving PNODE (or NODE) with the initial state results in a set of reduced states, and (iii) the decoder (blue arrows, from latent space to data space), which maps the reduced states to approximate observables. For NODE, μ is excluded. (Online version in colour.)
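    The following sketch strings steps (i)–(iv) together for a single training parameter instance (an illustration only: the fully connected encoder and decoder are stand-ins for the convolutional architectures used in §5, and the PNODEFunc and ode_solve sketches above are reused):

```python
import torch
import torch.nn as nn

class PNODESurrogate(nn.Module):
    """Encoder -> PNODE latent dynamics -> decoder, following steps (i)-(iii)."""
    def __init__(self, state_dim, qoi_dim, latent_dim=5, param_dim=2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(state_dim, 128), nn.ELU(),
                                     nn.Linear(128, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ELU(),
                                     nn.Linear(128, qoi_dim))
        self.f_theta = PNODEFunc(latent_dim, param_dim)

    def forward(self, u0, mu, ts):
        z0 = self.encoder(u0)                       # (i) encode the initial state
        self.f_theta.mu = mu
        z_traj = ode_solve(self.f_theta, z0, ts)    # (ii) solve the PNODE in latent space
        return self.decoder(z_traj)                 # (iii) decode to approximate QoIs

# (iv) evaluate the loss against reference snapshots and backpropagate; an optimizer
# step (e.g. Adamax) on (Theta, theta_enc, theta_dec) completes a training iteration.
model = PNODESurrogate(state_dim=256, qoi_dim=256)
u0 = torch.randn(1, 256)                 # initial full state for one parameter instance
mu = torch.tensor([[4.25, 0.015]])
ts = torch.linspace(0.0, 35.0, 501)
s_ref = torch.randn(len(ts), 1, 256)     # placeholder reference snapshots
loss = nn.functional.mse_loss(model(u0, mu, ts), s_ref)
loss.backward()
```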

    5. Numerical experiments

    In the following, we apply the proposed framework for learning latent dynamics of parameterized dynamical systems arising in computational physics. The proposed framework is trained on solutions of parameterized ODEs for a set of training input parameter instances, and is then tested on predicting either full states or QoIs of the parameterized ODEs at unseen test input parameter instances. We demonstrate the effectiveness of the proposed framework on four benchmark problems: the one-dimensional Burgers’ equation with forcing, a two-dimensional chemically reacting flow, the quasi-one-dimensional Euler equations and the two-dimensional shallow water equations.

    (a) Data collection

    For training, we collect snapshots of the QoIs by solving the FOM for pre-specified training parameter instances $\boldsymbol{\mu} \in \mathcal{D}_{\mathrm{train}} \equiv \{\boldsymbol{\mu}_{\mathrm{train}}^{k}\}_{k=1}^{n_{\mathrm{train}}} \subset \mathcal{D}$. This collection results in a tensor

    $$\boldsymbol{\mathcal{S}} \in \mathbb{R}^{n_{\mathrm{train}} \times n_t \times N_s},$$
    where $n_t$ is the number of time steps. The mode-3 unfolding [23] of this tensor gives
    $$\mathbf{S}_{[3]} = [\mathbf{S}(\boldsymbol{\mu}_{\mathrm{train}}^{1}) \;\cdots\; \mathbf{S}(\boldsymbol{\mu}_{\mathrm{train}}^{n_{\mathrm{train}}})] \in \mathbb{R}^{N_s \times n_{\mathrm{train}} n_t},$$
    where $\mathbf{S}(\boldsymbol{\mu}_{\mathrm{train}}^{k}) \in \mathbb{R}^{N_s \times n_t}$ consists of the QoI snapshots for $\boldsymbol{\mu}_{\mathrm{train}}^{k}$. Additionally, as described in §5b, for initialization of the network we also collect the initial conditions into the snapshot matrix $\mathbf{S}_0 = [\mathbf{u}(0,\boldsymbol{\mu}_{\mathrm{train}}^{1}) \;\cdots\; \mathbf{u}(0,\boldsymbol{\mu}_{\mathrm{train}}^{n_{\mathrm{train}}})] \in \mathbb{R}^{N \times n_{\mathrm{train}}}$.

    Assuming that the FOM arises from a spatially discretized partial differential equation, the total degrees of freedom $N$ can be defined as $N = n_u \times n_1 \times \cdots \times n_{n_{\mathrm{dim}}}$, where $n_u$ is the number of different types of solution variables (e.g. chemical species), $n_{\mathrm{dim}}$ denotes the number of spatial dimensions of the partial differential equation and $n_k$, $k = 1, \ldots, n_{\mathrm{dim}}$, denotes the number of degrees of freedom in each spatial dimension. Note that this spatially distributed data representation is analogous to multi-channel images (i.e. $n_u$ corresponds to the number of channels); as such, we use (transposed) convolutional layers [24,25] in our encoder and decoder for state prediction. For QoI prediction, we use fully connected layers for the decoder.
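    As a small illustration of this data layout (a NumPy sketch with our own array names; the shapes correspond to the full-state QoI of the Burgers’ problem in §5c), the snapshot tensor, its mode-3 unfolding and the channel-wise reshaping used by the convolutional layers can be assembled as follows:

```python
import numpy as np

n_train, n_t, N_s = 4, 500, 256          # e.g. Burgers' equation with the full-state QoI
# S[k] holds the QoI snapshots for the k-th training parameter instance, one row per time step.
S = np.random.rand(n_train, n_t, N_s)    # snapshot tensor of shape (n_train, n_t, N_s)

# Mode-3 unfolding: concatenate the per-parameter snapshot matrices S(mu^k) column-wise,
# giving the N_s x (n_train * n_t) matrix [S(mu^1) ... S(mu^{n_train})].
S_unfolded = np.concatenate([S[k].T for k in range(n_train)], axis=1)
assert S_unfolded.shape == (N_s, n_train * n_t)

# For spatially distributed states, each snapshot is reshaped channel-wise so that the
# convolutional encoder/decoder sees an n_u-channel "image", e.g. (n_u, n_1) = (1, 256).
n_u, n_1 = 1, 256
S_channels = S.reshape(n_train, n_t, n_u, n_1)
```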

    (b) Network architectures, training and testing

    In the experiments, we employ NODE and PNODE for learning latent dynamics and model $\hat{\mathbf{f}}_{\Theta}$ as a feedforward neural network with fully connected layers. For our autoencoder, we employ two different configurations depending on the QoI. For the case where the QoI is the full state (i.e. $\mathbf{g} = \mathrm{identity}$), we employ convolutional encoders and transposed-convolutional decoders. The encoder consists of four convolutional layers, followed by one fully connected layer, and the decoder consists of one fully connected layer, followed by four transposed-convolutional layers. To decrease/increase the spatial dimension, we employ strides larger than 1, but we do not use pooling layers. In the case where the QoI is scalar valued, we employ the same encoder, but the decoder is a fully connected network. We employ exponential linear units [26] for all activation functions, with the exception of the output layer of the decoder (i.e. no activation at the output layer). Lastly, for ODESolve, we use the Dormand–Prince method [27], which is provided in the software package of [3]. Our implementation reuses many parts of the software package used in [5], which is written in PyTorch [28]. The details of the configurations will be presented in each of the benchmark problems.

    For training, we set the loss function to be the mean squared error (MSE) and optimize the network weights $(\Theta, \boldsymbol{\theta}_{\mathrm{enc}}, \boldsymbol{\theta}_{\mathrm{dec}})$ using Adamax, a variant of Adam [29], with an initial learning rate of $10^{-2}$. At each epoch of training, the loss function is evaluated on the validation set, and the best-performing network weights on the validation set are chosen to evaluate the model on the test data. The details of the training/validating/testing split will be given in the description of each experiment.
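    A minimal sketch of this training loop (assuming the PNODESurrogate sketch from §4c and lists of pre-assembled (initial state, parameter, reference trajectory) tuples; mini-batching and learning-rate schedules are omitted):

```python
import copy
import torch
import torch.nn.functional as F

def train(model, train_data, val_data, ts, n_epochs=20000, lr=1e-2):
    """train_data / val_data: lists of (u0, mu, s_ref) tuples, one per parameter instance."""
    opt = torch.optim.Adamax(model.parameters(), lr=lr)
    best_val, best_state = float("inf"), None
    for epoch in range(n_epochs):
        model.train()
        opt.zero_grad()
        # Final loss: sum of per-instance MSE losses over the training parameter instances.
        loss = sum(F.mse_loss(model(u0, mu, ts), s_ref) for u0, mu, s_ref in train_data)
        loss.backward()
        opt.step()
        # Keep the weights that perform best on the validation parameter instances.
        model.eval()
        with torch.no_grad():
            val = sum(F.mse_loss(model(u0, mu, ts), s_ref) for u0, mu, s_ref in val_data)
        if val.item() < best_val:
            best_val, best_state = val.item(), copy.deepcopy(model.state_dict())
    model.load_state_dict(best_state)
    return model
```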

    For the performance evaluation metrics, we measure errors of approximated solutions with respect to the reference solutions for testing parameter instances $\boldsymbol{\mu}_{\mathrm{test}}^{k}$. We use the relative 2-norm of the error,

    $$\frac{\|\mathbf{g}(\mathbf{U}(\boldsymbol{\mu}_{\mathrm{test}}^{k})) - \mathbf{g}(\tilde{\mathbf{U}}(\boldsymbol{\mu}_{\mathrm{test}}^{k}))\|_F}{\|\mathbf{g}(\mathbf{U}(\boldsymbol{\mu}_{\mathrm{test}}^{k}))\|_F}, \tag{5.1}$$
    where $\mathbf{U}(\boldsymbol{\mu}_{\mathrm{test}}^{k})$ denotes a collection of the output states and $\|\cdot\|_F$ denotes the Frobenius norm.
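    For reference, a small helper (our sketch) that evaluates (5.1) from arrays of reference and approximate outputs:

```python
import numpy as np

def relative_error(U_ref, U_approx):
    """Relative error of equation (5.1): Frobenius norm of the difference over that of the reference."""
    return np.linalg.norm(U_ref - U_approx) / np.linalg.norm(U_ref)

# Example: a perturbation of roughly 0.1% of the signal gives an error of order 1e-3.
U_ref = np.random.rand(256, 500)
U_approx = U_ref * (1.0 + 1e-3 * np.random.randn(256, 500))
print(relative_error(U_ref, U_approx))
```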

    (c) Problem 1: one-dimensional inviscid Burgers’ equation

    The first benchmark problem is a parameterized one-dimensional inviscid Burgers’ equation, which models simplified nonlinear fluid dynamics and demonstrates propagations of shocks. The governing system of partial differential equations is

    $$\frac{\partial w(x,t;\boldsymbol{\mu})}{\partial t} + \frac{\partial f(w(x,t;\boldsymbol{\mu}))}{\partial x} = 0.02\, e^{\mu_2 x}, \tag{5.2}$$
    where $f(w) = 0.5 w^2$, $x \in [0, 100]$ and $t \in [0, 35]$. The boundary condition $w(0,t;\boldsymbol{\mu}) = \mu_1$ is imposed on the left boundary ($x = 0$) and the initial condition is set by $w(x,0;\boldsymbol{\mu}) = 1$. Thus, the problem is characterized by a single variable $w$ (i.e. $n_u = 1$) and the two parameters $(\mu_1, \mu_2)$ (i.e. $n_\mu = 2$), which correspond to the Dirichlet boundary condition at $x = 0$ and the forcing term, respectively. Following [30], discretizing equation (5.2) with Godunov’s scheme with 256 control volumes results in a system of parameterized ODEs (FOM) with $N = n_u n_1 = 256$. We then solve the FOM using the backward-Euler scheme with a uniform time step $\Delta t = 0.07$, which results in $n_t = 500$ for each $\boldsymbol{\mu}_{\mathrm{train}}^{k} \in \mathcal{D}_{\mathrm{train}}$. Figure 2 depicts snapshots of reference solutions for the parameter instance $(\mu_1, \mu_2) = (4.25, 0.015)$ at times $t = \{7.77, 11.7, 19.5, 23.3, 27.2, 35.0\}$, illustrating the discontinuity (shock) moving from left to right as time proceeds.
    Figure 2.

    Figure 2. (a) t=7.77, (b) t=11.7, (c) t=19.5, (d) t=23.3, (e) t=27.2, (f) t=35.0.
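    For completeness, a sketch of the semi-discrete FOM velocity under one possible reading of this setup (exact Godunov flux for the convex Burgers’ flux, Dirichlet inflow at $x = 0$ and a simple extrapolation ghost cell at the outflow); the backward-Euler time integration is omitted:

```python
import numpy as np

n_cells, length = 256, 100.0
dx = length / n_cells
x = (np.arange(n_cells) + 0.5) * dx              # cell-centre coordinates

def godunov_flux(w_left, w_right):
    """Exact Godunov flux at an interface for the convex flux f(w) = 0.5 w^2."""
    return np.maximum(0.5 * np.maximum(w_left, 0.0) ** 2,
                      0.5 * np.minimum(w_right, 0.0) ** 2)

def fom_velocity(w, t, mu):
    """Semi-discrete right-hand side dw/dt for equation (5.2) on 256 control volumes."""
    mu1, mu2 = mu
    w_ext = np.concatenate(([mu1], w, [w[-1]]))   # Dirichlet inflow left, extrapolation right
    flux = godunov_flux(w_ext[:-1], w_ext[1:])    # fluxes at the n_cells + 1 interfaces
    return -(flux[1:] - flux[:-1]) / dx + 0.02 * np.exp(mu2 * x)

w0 = np.ones(n_cells)                             # initial condition w(x, 0; mu) = 1
dwdt0 = fom_velocity(w0, 0.0, (4.25, 0.015))
```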

    For the numerical experiments in this subsection, we use the network described in table 1.

    Table 1. Burgers’ equation. Network architecture: kernel filter length κ, number of kernel filters nκ and strides s at each (transposed) convolutional layer.

    encoder:
        conv-layer (4 layers): κ = [16, 8, 4, 4]; nκ = [8, 16, 32, 64]; s = [2, 4, 4, 4]
        FC-layer (1 layer): d_in = 128, d_out = p
    decoder:
        FC-layer (1 layer): d_in = p, d_out = 128
        trans-conv-layer (4 layers): κ = [4, 4, 8, 16]; nκ = [32, 16, 8, 1]; s = [4, 4, 4, 2]

    (i) Reconstruction: approximating a single trajectory with latent-dynamics learning

    In this experiment, we consider a single training/testing parameter instance $\boldsymbol{\mu}_{\mathrm{train}}^{1} = \boldsymbol{\mu}_{\mathrm{test}}^{1} = (\mu_1^1, \mu_2^1) = (4.25, 0.015)$ and test both NODE and PNODE for latent-dynamics modelling. We set the reduced dimension¹ to $p = 5$ and the maximum number of epochs to 20 000. The relative errors (equation (5.1)) for NODE and PNODE are $2.6648 \times 10^{-3}$ and $2.6788 \times 10^{-3}$, respectively; the difference between the two errors is negligible.

    (ii) Prediction: approximating an unseen trajectory for unseen parameter instances

    We now consider two multi-parameter scenarios. In the first scenario (scenario 1), we vary the first parameter (boundary condition) and consider four training parameter instances, two validation parameter instances and two test parameter instances. The parameter instances are collected as shown in figure 3a: the training parameter instances correspond to $\mathcal{D}_{\mathrm{train}} = \{(4.25 + 0.139k,\, 0.015)\},\ k = 0, 2, 4, 6$, the validating parameter instances correspond to $\mathcal{D}_{\mathrm{val}} = \{(4.25 + 0.139k,\, 0.015)\},\ k = 1, 5$ and the testing parameter instances correspond to $\mathcal{D}_{\mathrm{test}} = \{(4.67, 0.015), (5.22, 0.015)\}$. Note that the initial condition is identical for all parameter instances, i.e. $\mathbf{u}_0(\boldsymbol{\mu}) = \mathbf{1}$.

    Figure 3.

    Figure 3. Burgers’ equation. Visualizations of parameter instance sampling for scenario 1 (a) and scenario 2 (b). (Online version in colour.)

    We train the framework with NODE and with PNODE using the same set of hyperparameters and a maximum of 50 000 epochs. Again, the reduced dimension is set to $p = 5$. Figure 4a,b depicts snapshots of reference solutions and approximated solutions using NODE and PNODE. Both NODE and PNODE learn the boundary condition (i.e. 4.67 at $x = 0$) accurately. For NODE, however, this is only because the testing boundary condition lies approximately in the middle of the two validating boundary conditions (and also in the middle of the four training boundary conditions): minimizing the MSE leads NODE to learn a single trajectory whose boundary condition is exactly the middle of the two validating boundary conditions, 4.389 and 4.944. Moreover, as NODE learns a single trajectory that minimizes the MSE, it actually fails to learn the correct dynamics and results in poor approximate solutions as time proceeds. As opposed to NODE, PNODE accurately approximates solutions up to the final time. Table 2 (second column) shows the relative 2-errors (equation (5.1)) for both NODE and PNODE; PNODE is seen to yield significantly better predictions than NODE.

    Figure 4.

    Figure 4. (a) NODE, μtest1, (b) PNODE, μtest1, (c) NODE, μtest2, (d) PNODE, μtest2.

    Table 2. Burgers’ equation. Prediction scenario 1: the relative 2-errors ($\times 10^{-3}$).

    μtest1=μ(4) μtest2=μ(8)
    NODE 43.057 157.40
    PNODE 3.6547 5.6900

    Continuing from the previous experiment, we test the second testing parameter instance, Dtest={(5.22,0.015)}, which is located outside Dtrain (i.e. next to μ(7) in figure 3a). The results are shown in figure 4c,d: NODE only learns a single trajectory with the boundary condition, which lies in the middle of validating parameter instances, whereas PNODE accurately produces approximate solutions for the new testing parameter instances. Table 2 (third column) reports the relative errors, where it is again seen that PNODE yields significant improvements over NODE.

    Next, in the second scenario (scenario 2), we vary both parameters μ1 and μ2 as shown in figure 3b: the training, validating and testing parameter instances correspond to

    $$\begin{aligned} \mathcal{D}_{\mathrm{train}} &= \{(4.25 + 0.139k,\, 0.015 + 0.002l)\}, & \{(k,l)\} &= \{(0,0), (0,2), (2,0), (2,2)\},\\ \mathcal{D}_{\mathrm{val}} &= \{(4.25 + 0.139k,\, 0.015 + 0.002l)\}, & \{(k,l)\} &= \{(1,0), (0,1), (2,1), (1,2)\}\\ \text{and}\quad \mathcal{D}_{\mathrm{test}} &= \{(4.25 + 0.139k,\, 0.015 + 0.002l)\}, & \{(k,l)\} &= \{(1,1), (3,2), (2,3), (3,3)\}. \end{aligned}$$

    We have tested the set of testing parameter instances and table 3 reports the relative errors; the result shows that PNODE achieves less than 1% error in most cases. On the other hand, NODE achieves around 10% errors in most cases. The 1.7% error of NODE for μtest1 is achieved only because the testing parameter instance is located in the middle of the validating parameter instances (and the training parameter instances).

    Table 3. Burgers’ equation. Prediction scenario 2: the relative 2-errors ($\times 10^{-3}$) of the approximate solutions for testing parameter instances.

    μtest1=μ(5) μtest2=μ(10) μtest3=μ(11) μtest4=μ(12)
    NODE 17.422 107.13 89.229 123.77
    PNODE 3.2672 7.7303 8.5650 10.735

    Study on effective latent dimension. We have further tested the framework with NODE and PNODE for varying latent dimensions p={1,2,3,4,5}, with the same hyperparameters as described in table 1, the maximum number of epochs as 50 000 and the same training/validating/testing split as shown in figure 3b. Figure 5 reports the experimental results. For all testing parameter instances, the dimension of latent states marginally affects the performance of NODEs. We believe that this is because NODE learns a dynamics that minimizes the MSE over four validating parameter instances in Dval regardless of the latent dimensions. On the other hand, decreasing the latent dimension to be smaller than 3 (p<3) negatively affects the performances of the PNODEs for all testing parameter instances. Nevertheless, even with the latent dimension 1, p=1, PNODE still outperforms NODE in all testing parameter instances; with p=2, PNODE starts to produce almost an order of magnitude more accurate approximate solutions than NODE does. Moreover, we observe that, for the given training data/training strategy and the hyperparameters, increasing the latent dimension (larger than p=2) only marginally improves the accuracy of the solutions, which agrees with the observations made in [31] and, in some cases, a neural network with larger p (i.e. p>5) requires more epochs to reach the same training error achieved by a neural network with smaller p (i.e. p={3,4,5}).

    Figure 5.

    Figure 5. (a) μtest1=μ(5), (b) μtest2=μ(10), (c) μtest3=μ(11), (d) μtest4=μ(12).

    Study on varying training/validating datasets. We now assess the performance of the proposed framework for different settings of training and validating datasets. This experiment illustrates the dependence of the framework on the amount of training data as well as settings of training/validation/testing splits. To this end, we have trained and tested the framework with PNODE on three sets of input parameter instance samplings, as shown in figure 6, where the first set in figure 6a is equivalent to the set in figure 3b. While all three sets share the testing parameter instances, sets 2 and 3 (figure 6b,c) are built incrementally upon set 1 by having additional training and validating parameter instances. Compared with set 1, set 2 has two more training parameter instances {μ(13),μ(15)} and one more validating parameter instance μ(14), as shown in figure 6b, and set 3 has four more training parameter instances {μ(13),μ(15),μ(16),μ(18)} and two more validating parameter instances {μ(14),μ(17)}, as shown in figure 6c.

    Figure 6.

    Figure 6. Burgers’ equation. Visualizations of parameter instance sampling for a varying number of parameter instances. (a) Set 1, (b) set 2 and (c) set 3. (Online version in colour.)

    We again train the framework with PNODE on the training sets depicted in figure 6. We employ the same hyperparameters as described in table 1, and the maximum number of epochs is set at 50 000. Table 4 reports the accuracy of approximate solutions computed on the testing parameter instances. Increasing the number of training/validating parameter instances has virtually no effect on the accuracy of the approximation measured on the testing parameter instance μ(5). That is, for the given network architecture (table 1), increasing the amount of training/validating data does not significantly affect the accuracy of the approximation for the testing parameter instance μ(5). On the other hand, increasing the number of training/validating parameter instances in the way shown in figure 6 has significantly improved the accuracy of the approximations for the other testing parameter instances {μ(10),μ(11),μ(12)}. This set of experiments essentially illustrates that more accurate approximation can be achieved for testing parameter instances that lie in between training/validating parameter instances (i.e. testing parameter instances within the dotted boxes in figure 6) than for those that lie outside of training/validating parameter instances (i.e. testing parameter instances outside the dotted boxes in figure 6).

    Table 4. Burgers’ equation. Prediction scenario 2: the relative 2-errors ($\times 10^{-3}$) of the approximate solutions for testing parameter instances computed using the framework with PNODE. Each framework with PNODE is trained on a different training/validating set, as depicted in figure 6.

    μtest1=μ(5) μtest2=μ(10) μtest3=μ(11) μtest4=μ(12)
    PNODE—set 1 3.2672 7.7303 8.5650 10.735
    PNODE—set 2 3.3368 6.6086 3.7136 6.5199
    PNODE—set 3 3.3348 3.4409 3.1278 3.1217

    (d) Problem 2: two-dimensional chemically reacting flow

    The next example considers the reaction of a premixed H2–air flame at constant uniform pressure [32], which is described by the set of partial differential equations

    $$\frac{\partial \mathbf{w}(\mathbf{x},t;\boldsymbol{\mu})}{\partial t} = \nabla \cdot (\nu \nabla \mathbf{w}(\mathbf{x},t;\boldsymbol{\mu})) - \mathbf{v} \cdot \nabla \mathbf{w}(\mathbf{x},t;\boldsymbol{\mu}) + \mathbf{q}(\mathbf{w}(\mathbf{x},t;\boldsymbol{\mu});\boldsymbol{\mu}), \tag{5.3}$$
    where, on the right-hand side, the first term is the diffusion term with the spatial gradient operator $\nabla$ and the molecular diffusivity $\nu = 2\ \mathrm{cm^2\,s^{-1}}$, the second term is the convective term with the constant and divergence-free velocity field $\mathbf{v} = [50\ \mathrm{cm\,s^{-1}}, 0]^{\mathsf{T}}$, and the third term is the reactive term with the reaction source term $\mathbf{q}$. The solution $\mathbf{w}$ corresponds to the thermo-chemical composition vector defined as $\mathbf{w}(\mathbf{x},t;\boldsymbol{\mu}) = [w_{\mathrm{T}}, w_{\mathrm{H_2}}, w_{\mathrm{O_2}}, w_{\mathrm{H_2O}}]^{\mathsf{T}}(\mathbf{x},t;\boldsymbol{\mu}) \in \mathbb{R}^4$, where $w_{\mathrm{T}}$ denotes the temperature and $w_{\mathrm{H_2}}, w_{\mathrm{O_2}}$ and $w_{\mathrm{H_2O}}$ denote the mass fractions of the chemical species H2, O2 and H2O, respectively. The reaction source term is of Arrhenius type, defined as $\mathbf{q}(\mathbf{w};\boldsymbol{\mu}) = [q_{\mathrm{T}}(\mathbf{w};\boldsymbol{\mu}), q_{\mathrm{H_2}}(\mathbf{w};\boldsymbol{\mu}), q_{\mathrm{O_2}}(\mathbf{w};\boldsymbol{\mu}), q_{\mathrm{H_2O}}(\mathbf{w};\boldsymbol{\mu})]^{\mathsf{T}}$, where
    $$q_{\mathrm{T}}(\mathbf{w};\boldsymbol{\mu}) = Q\, q_{\mathrm{H_2O}}(\mathbf{w};\boldsymbol{\mu}) \quad\text{and}\quad q_i(\mathbf{w};\boldsymbol{\mu}) = -\nu_i \left(\frac{W_i}{\rho}\right) \left(\frac{\rho\, w_{\mathrm{H_2}}}{W_{\mathrm{H_2}}}\right)^{\nu_{\mathrm{H_2}}} \left(\frac{\rho\, w_{\mathrm{O_2}}}{W_{\mathrm{O_2}}}\right)^{\nu_{\mathrm{O_2}}} A\, \mathrm{e}^{-E/(R w_{\mathrm{T}})}$$
    for $i \in \{\mathrm{H_2}, \mathrm{O_2}, \mathrm{H_2O}\}$. Here, $(\nu_{\mathrm{H_2}}, \nu_{\mathrm{O_2}}, \nu_{\mathrm{H_2O}}) = (2, 1, -2)$ denote stoichiometric coefficients, $(W_{\mathrm{H_2}}, W_{\mathrm{O_2}}, W_{\mathrm{H_2O}}) = (2.016, 31.9, 18)$ denote molecular weights in units of $\mathrm{g\,mol^{-1}}$, $\rho = 1.39 \times 10^{-3}\ \mathrm{g\,cm^{-3}}$ denotes the density of the mixture, $R = 8.314\ \mathrm{J\,mol^{-1}\,K^{-1}}$ denotes the universal gas constant and $Q = 9800\ \mathrm{K}$ denotes the heat of the reaction. The problem has two input parameters (i.e. $n_\mu = 2$), which correspond to $\boldsymbol{\mu} = (\mu_1, \mu_2) = (A, E)$, where $A$ and $E$ denote the pre-exponential factor and the activation energy, respectively.

    Figure 7 depicts the geometry of the spatial domain and the boundary conditions are set as:

    Γ2: the inflow boundary with Dirichlet boundary conditions wT=950K and (wH2,wO2,wH2O)=(0.0282,0.2259,0),

    Γ1 and Γ3: the Dirichlet boundary conditions wT=300K and (wH2,wO2,wH2O)=(0,0,0),

    Γ4,Γ5 and Γ6: the homogeneous Neumann condition,

    Figure 7.

    Figure 7. The geometry of the spatial domain for chemically reacting flow.

    and the initial condition is set as $w_{\mathrm{T}} = 300\ \mathrm{K}$ and $(w_{\mathrm{H_2}}, w_{\mathrm{O_2}}, w_{\mathrm{H_2O}}) = (0, 0, 0)$ (i.e. empty of chemical species). To collect data, we employ a finite-difference method with $64 \times 32$ uniform grid points (i.e. $N = n_u \times n_1 \times n_2 = 4 \times 64 \times 32$), the second-order backward Euler method with a uniform time step $\Delta t = 10^{-4}$ and the final time $T = 0.06$ (i.e. $n_t = 600$). Figure 8 depicts snapshots of the reference solutions of each species for the training parameter instance $(\mu_1, \mu_2) = (2.3375 \times 10^{12}, 5.6255 \times 10^{3})$.

    Figure 8.

    Figure 8. Chemically reacting flow. Snapshots of reference solutions of temperature (first row), H2 (second row), O2 (third row) and H2O (fourth row) for the parameter instance $\boldsymbol{\mu} = (2.338 \times 10^{12}, 5.626 \times 10^{3})$ at $t = \{0, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06\}$ (from left to right). (Online version in colour.)

    (i) Data preprocessing and training

    Each species in this problem has a different numeric scale: the magnitude of $w_{\mathrm{T}}$ is about four orders of magnitude larger than those of the other species (figure 8). To bring the values of each species into the common range $[0, 1]$, we apply 0–1 scaling to each species separately. Moreover, because the values of $A$ and $E$ are several orders of magnitude larger than those of the species, we scale the input parameters as well to match the scales of the chemical species: we simply divide the first parameter by $10^{13}$ and the second parameter by $10^{4}$.²

    ²We have not investigated different scaling strategies for the parameter instances as it is not the focus of this study.
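    A sketch of this preprocessing (illustrative only; array shapes and names are assumptions) is:

```python
import numpy as np

def scale_species(snapshots):
    """0-1 scaling applied to each species/channel separately.

    snapshots: array of shape (n_samples, n_u, n_1, n_2); here n_u = 4 corresponds to
    (temperature, H2, O2, H2O)."""
    lo = snapshots.min(axis=(0, 2, 3), keepdims=True)
    hi = snapshots.max(axis=(0, 2, 3), keepdims=True)
    return (snapshots - lo) / (hi - lo), (lo, hi)

def scale_parameters(A, E):
    """Bring (A, E) to the same order of magnitude as the scaled species."""
    return np.array([A / 1e13, E / 1e4])

# Example shapes for the reacting-flow data: 600 time steps, 4 species on a 64 x 32 grid.
data = np.random.rand(600, 4, 64, 32) * np.array([950.0, 0.03, 0.23, 0.01]).reshape(1, 4, 1, 1)
scaled, (lo, hi) = scale_species(data)
mu_scaled = scale_parameters(2.3375e12, 5.6255e3)
```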

    (ii) Prediction: approximating an unseen trajectory for unseen parameter instances

    In this experiment, we vary two parameters: the pre-exponential factor μ1=A and the activation energy μ2=E. We consider parameter instances as depicted in figure 9: the sets of the training, validating and testing parameter instances correspond to

    $$\begin{aligned} \mathcal{D}_{\mathrm{train}} &= \{((2.338 + 0.595k) \times 10^{12},\, (5.626 + 0.482l) \times 10^{3})\}, & \{(k,l)\} &= \{(0,0), (0,2), (0,4), (2,0), (2,2), (2,4)\},\\ \mathcal{D}_{\mathrm{val}} &= \{((2.338 + 0.595k) \times 10^{12},\, (5.626 + 0.482l) \times 10^{3})\}, & \{(k,l)\} &= \{(0,1), (1,0), (1,4), (2,3)\}\\ \text{and}\quad \mathcal{D}_{\mathrm{test}} &= \{((2.338 + 0.595k) \times 10^{12},\, (5.626 + 0.482l) \times 10^{3})\}, & \{(k,l)\} &= \{(0,3), (1,1), (1,2), (1,3), (2,1), (3,0), (3,1), (4,0)\}. \end{aligned}$$
    Figure 9.

    Figure 9. Chemically reacting flow. Visualizations of parameter instance sampling. (Online version in colour.)

    Given the training and validating parameter instances, we train the framework with the hyper-parameters specified in table 5, where the input data consist of two-dimensional data with four channels and the reduced dimension is again set as p=5.

    Table 5. Chemically reacting flow. Network architecture: kernel filter length κ=κ1=κ2, number of kernel filters nκ and strides s=s1=s2 at each (transposed) convolutional layer.

    encoder:
        conv-layer (4 layers): κ = [16, 8, 4, 4]; nκ = [8, 16, 32, 64]; s = [2, 2, 2, 2]
        FC-layer (1 layer): d_in = 512, d_out = p
    decoder:
        FC-layer (1 layer): d_in = p, d_out = 512
        trans-conv-layer (4 layers): κ = [4, 4, 8, 16]; nκ = [32, 16, 8, 4]; s = [2, 2, 2, 2]

    Table 6 presents the relative 2-errors of approximate solutions computed using NODE and PNODE for testing parameter instances in the predictive scenario. The first three rows in table 6 correspond to the results of testing parameter instances at the middle three red circles in figure 9. As expected, both NODE and PNODE work well for these testing parameter instances: NODE is expected to work well for these testing parameter instances because the single trajectory that minimizes the MSE over validating parameter instances would be the trajectory associated with the testing parameter μ(8). As we consider testing parameter instances that are distant from μ(8), we observe PNODE to be (significantly) more accurate than NODE. From these observations, the NODE model can be considered as being overfitted to a trajectory that minimizes the MSE. This overfitting can be avoided to a certain extent by applying, for example, early stopping; however, this cannot fundamentally fix the problem of NODE (i.e. fitting a single trajectory for all input data distributions).

    Table 6. Chemically reacting flow. Prediction scenario: the relative 2-errors ($\times 10^{-3}$).

    μtest1=μ(7) μtest2=μ(8) μtest3=μ(9) μtest4=μ(4)
    NODE 9.2823 3.3450 4.1516 40.835
    PNODE 4.2993 4.6429 5.0617 5.6011
    μtest5=μ(12) μtest6=μ(16) μtest7=μ(17) μtest8=μ(18)
    NODE 34.767 59.410 54.553 74.881
    PNODE 4.4133 12.935 11.785 24.660

    (iii) Prediction: approximating a trajectory of a quantity of interest for unseen parameter instances

    In this experiment, we design the decoder to produce a scalar-valued QoI at each time instance, instead of reconstructing the entire states, i.e. $\mathbf{h}_{\mathrm{dec}}: \mathbb{R}^{p} \to \mathbb{R}$. We consider two QoIs: the temperature and the mass fraction of H2O in the middle of the spatial domain, $\mathbf{x}_{\mathrm{QoI}} = (9\ \mathrm{mm}, 4.5\ \mathrm{mm})$. We again employ the same setting of the parameter instances as depicted in figure 9. For the network architecture, we consider the same encoder architecture as shown in table 5, which takes an initial state as an input and produces an initial latent state, and, as a decoder, we consider a dense neural network with four fully connected layers with $\{64, 32, 16, 1\}$ neurons in each layer. The reduced dimension is set as $p = 5$. For each QoI, temperature and the mass fraction of H2O, we train two separate frameworks, whose outputs attempt to match each QoI.

    Figure 10 reports the reference trajectories (solid red lines) of two QoIs: temperature and the mass fraction of H2O evaluated at the middle of the spatial domain over time for four unseen parameter instances {μ(4),μ(8),μ(12),μ(18)} (see figure 9 for a graphical illustration of this choice). The approximated QoIs generated by PNODE and NODE are depicted by dashed blue lines and green dash-dotted lines, respectively. As observed in previous experiments, NODE only learns a single trajectory, which minimizes the MSE loss given the training and validating data; the learned trajectory almost matches with the trajectory associated with μ(8) (i.e. all four green lines match exactly). This is because μ(8) is located in the middle of the training and the validating parameter instances. On the other hand, PNODE produces accurate approximations to both QoIs for multiple test parameter instances. Table 7 reports the relative 2-errors of the approximated QoIs computed using NODE and PNODE and essentially provides the same observations as in figure 10.

    Figure 10.

    Figure 10. Chemically reacting flow. Prediction of two QoIs: trajectories of (a) temperature and (b) the mass fraction of H2O over time for four test parameter instances. (Online version in colour.)

    Table 7. Chemically reacting flow. Prediction of two QoIs: the relative 2-errors ($\times 10^{-3}$) measured for four test parameter instances.

    μ(4) μ(8) μ(12) μ(18)
    temperature NODE 38.930 2.4626 31.943 68.314
    PNODE 2.8002 1.9484 2.4375 22.152
    H2O NODE 74.765 4.4154 57.159 119.06
    PNODE 5.4625 3.3837 4.9756 41.195

    (e) Problem 3: quasi-one-dimensional Euler equation

    For the third benchmark problem, we consider the quasi-one-dimensional Euler equations for modelling inviscid compressible flow in a one-dimensional converging–diverging nozzle with a continuously varying cross-sectional area [33]. The system of the governing equations is

    $$\frac{\partial \mathbf{w}}{\partial t} + \frac{1}{A}\frac{\partial f(\mathbf{w})}{\partial x} = \mathbf{g}(\mathbf{w}),$$
    where
    $$\mathbf{w} = \begin{bmatrix} \rho \\ \rho u \\ e \end{bmatrix}, \qquad f(\mathbf{w}) = \begin{bmatrix} \rho u \\ \rho u^2 + p \\ (e + p)u \end{bmatrix} \qquad\text{and}\qquad \mathbf{g}(\mathbf{w}) = \begin{bmatrix} 0 \\ \dfrac{p}{A}\dfrac{\partial A}{\partial x} \\ 0 \end{bmatrix},$$
    with $p = (\gamma - 1)\rho \epsilon$, $\epsilon = e/\rho - u^2/2$ and $A = A(x)$. Here, $\rho$ denotes the density, $u$ denotes the velocity, $p$ denotes the pressure, $\epsilon$ denotes the energy per unit mass, $e$ denotes the total energy density, $\gamma$ denotes the specific heat ratio and $A(x)$ denotes the converging–diverging nozzle cross-sectional area. Following the settings used in [34], we consider a specific heat ratio of $\gamma = 1.3$ and a specific gas constant of $R = 355.4\ \mathrm{m^2\,s^{-2}\,K^{-1}}$. The cross-sectional area $A(x)$ is determined by a cubic spline interpolation over the points $(x, A(x)) = \{(0, 0.2), (0.25, 1.05\mu), (0.5, \mu), (0.75, 1.05\mu), (1, 0.2)\}$, where $\mu$ determines the width of the middle cross-sectional area. Figure 11 depicts the schematic of such a converging–diverging nozzle determined by $A(x)$, parameterized by the width of the middle cross-sectional area, $\mu$. A perfect gas, which obeys the ideal gas law (i.e. $p = \rho R T$), is assumed. For the computation of the initial condition, we refer readers to appendix A.
    Figure 11.

    Figure 11. The geometry of the spatial domain for the quasi-one-dimensional Euler equation of a converging–diverging nozzle.

    For spatial discretization, we employ a finite-volume scheme with 128 equally spaced control volumes and fully implicit boundary conditions, which leads to $N = n_u n_1 = 3 \times 128 = 384$. At each intercell face, the Roe flux-difference vector-splitting method [35] is used to compute the flux. For time discretization, we employ the backward Euler scheme with a uniform time step $\Delta t = 10^{-3}$ and the final time $0.6$ (i.e. $n_t = 600$). Figure 12 depicts the snapshots of reference solutions of the Mach number $M(x)$ for the middle cross-sectional area $\mu = 0.15$ at $t = \{0.1, 0.2, 0.3, 0.4, 0.5, 0.6\}$.

    Figure 12.

    Figure 12. (a) t=0.1, (b) t=0.2, (c) t=0.3, (d) t=0.4, (e) t=0.5, (f) t=0.6.

    The varying parameter of the problem is the width of the middle cross-sectional area, which determines the geometry of the spatial domain and, thus, determines the initial condition as well as the dynamics. Analogously to the previous two problems, we select four training parameter instances, three validating parameter instances and four testing parameter instances (figure 13),

    $$\begin{aligned} \mathcal{D}_{\mathrm{train}} &= \{0.13 + 0.005k\}, & \{k\} &= \{0, 3, 6, 9\},\\ \mathcal{D}_{\mathrm{val}} &= \{0.13 + 0.005k\}, & \{k\} &= \{1, 4, 7\}\\ \text{and}\quad \mathcal{D}_{\mathrm{test}} &= \{0.13 + 0.005k\}, & \{k\} &= \{2, 5, 8, 10\}. \end{aligned}$$
    Again, we set the reduced dimension as p=5.
    Figure 13.

    Figure 13. Quasi-one-dimensional Euler equation. Visualizations of parameter instance sampling. (Online version in colour.)

    With the neural network architecture specified in table 8, we train the framework either with NODE or with PNODE for learning latent dynamics and test the framework in the predictive scenario (i.e. for unseen testing parameter instances as shown in figure 13). Figure 14 depicts the solution snapshots at t={0.1,0.23,0.46,0.6}. We observe that PNODE yields moderate improvements over NODE, i.e. about a 20% decrease in the relative 2-norm of the error (5.1) for all four testing parameter instances (table 9). The improvements are not as dramatic as the ones shown in the previous two benchmark problems. We believe this is because, in this problem setting, varying the input parameter results in fairly distinct initial conditions, but does not significantly affect variations in dynamics; both the initial condition and the dynamics are parameterized by the same input parameter, i.e. the width of the middle cross-sectional area of the spatial domain.

    Table 8. Quasi-one-dimensional Euler equation. Network architecture: kernel filter length κ, number of kernel filters nκ and strides s at each layer of (transposed) convolutional layers.

    encoder:
        conv-layer (5 layers): κ = [16, 8, 4, 4, 4]; nκ = [16, 32, 64, 64, 128]; s = [2, 2, 2, 2, 2]
        FC-layer (1 layer): d_in = 512, d_out = p
    decoder:
        FC-layer (1 layer): d_in = p, d_out = 512
        trans-conv-layer (5 layers): κ = [4, 4, 4, 8, 16]; nκ = [64, 64, 32, 16, 3]; s = [2, 2, 2, 2, 2]
    Figure 14.

    Figure 14. (a) t=0.1, (b) t=0.23, (c) t=0.46, (d) t=0.6.

    Table 9. Quasi-one-dimensional Euler equation. Prediction scenario: the relative 2-errors ($\times 10^{-3}$).

    μ(3) μ(6) μ(9) μ(11)
    NODE 5.1190 5.2753 5.9479 7.6278
    PNODE 3.9600 4.1280 4.6102 6.1570

    (f) Problem 4: parameterized shallow water equations

    Lastly, we consider the parameterized shallow water equations with Coriolis forcing, given as

    $$\begin{aligned} \frac{\partial h}{\partial t} + \frac{\partial}{\partial x}(hu) + \frac{\partial}{\partial y}(hv) &= 0,\\ \frac{\partial (hu)}{\partial t} + \frac{\partial}{\partial x}\Big(hu^2 + \tfrac{1}{2}\mu_1 h^2\Big) + \frac{\partial}{\partial y}(huv) &= \mu_2 v\\ \text{and}\qquad \frac{\partial (hv)}{\partial t} + \frac{\partial}{\partial x}(huv) + \frac{\partial}{\partial y}\Big(hv^2 + \tfrac{1}{2}\mu_1 h^2\Big) &= -\mu_2 u, \end{aligned}$$
    on the domain $(x,y) \in \Omega = [-5, 5] \times [-5, 5]$, $t \in [0, T]$ with $T = 10$, and where $h, u, v: \Omega \times [0, T] \to \mathbb{R}$, with $h$ denoting the height of the water surface, $u$ denoting the $x$-velocity and $v$ denoting the $y$-velocity. The parameter $\mu_1$ is the ‘gravity’ parameter, and the parameter $\mu_2$ controls the magnitude of the Coriolis force. The initial condition is given as $h(x,y,0) = 1 + \tfrac{1}{8}\,\mathrm{e}^{-(x-1)^2 - (y-1)^2}$, $u(x,y,0) = v(x,y,0) = 0$, $(x,y) \in \Omega$, which corresponds to a Gaussian pulse centred at $(1,1)$. For spatial discretization, we discretize the spatial domain into $64 \times 64$ uniform elements and employ a second-order discontinuous Galerkin method with the Rusanov flux scheme. Temporal discretization is obtained via a third-order strong-stability-preserving explicit Runge–Kutta scheme with a uniform time step of $\Delta t = 5 \times 10^{-3}$.

    We again design the decoder to produce a scalar-valued QoI at each time instance, instead of reconstructing the entire states, i.e. $\mathbf{h}_{\mathrm{dec}}: \mathbb{R}^{p} \to \mathbb{R}$. The QoI is the height of the water surface in the middle of the spatial domain. Table 10 describes details of the network architecture: the encoder takes an initial state as an input and produces an initial latent state, and the decoder maps each computed latent state to QoIs at pre-specified time instances. The reduced dimension is set as $p = 20$. Figure 15 depicts the setting of the parameter instances:

    $$\begin{aligned} \mathcal{D}_{\mathrm{train}} &= \{(0.1 + 0.06k,\, 3 + 1.2l)\}, & \{(k,l)\} &= \{(0,0), (2,0), (4,0), (5,0), (0,2), (2,2), (4,2), (5,2), (0,4), (2,4), (4,4), (5,4)\},\\ \mathcal{D}_{\mathrm{val}} &= \{(0.1 + 0.06k,\, 3 + 1.2l)\}, & \{(k,l)\} &= \{(1,0), (3,0), (0,1), (4,1), (1,2), (0,3), (4,3), (1,4), (3,4)\}\\ \text{and}\quad \mathcal{D}_{\mathrm{test}} &= \{(0.1 + 0.06k,\, 3 + 1.2l)\}, & \{(k,l)\} &= \{(2,1), (3,1), (3,2), (1,3), (2,3), (0,5), (3,5), (5,5)\}. \end{aligned}$$

    Table 10. Shallow water equations. Network architecture: kernel filter length κ, number of kernel filters nκ and strides s at each layer of convolutional layers.

    encoder:
        conv-layer (5 layers): κ = [16, 8, 4, 4, 4]; nκ = [8, 16, 32, 64, 64]; s = [2, 2, 2, 2, 2]
        FC-layer (1 layer): d_in = 1024, d_out = p
    decoder:
        FC-layer (5 layers): FC 1: d_in = p, d_out = 128; FC 2: d_in = 128, d_out = 64; FC 3: d_in = 64, d_out = 32; FC 4: d_in = 32, d_out = 16; FC 5: d_in = 16, d_out = 1
    Figure 15.

    Figure 15. Shallow water equations. Visualizations of parameter instance sampling. (Online version in colour.)

    Figure 16 depicts the height profiles over time and table 11 reports the relative 2-errors for eight test parameter instances. Figure 16 and table 11 again confirm that the PNODE outperforms NODE by producing accurate parameter-dependent height profiles (i.e. more than an order of magnitude more accurate as measured by relative 2-errors), whereas NODE again learns a single dynamics and fails to capture varying dynamics of input parameter-dependent height profiles.

    Figure 16.

    Figure 16. (a) μtest1=μ(8), (b) μtest2=μ(9), (c) μtest3=μ(14), (d) μtest4=μ(18), (e) μtest5=μ(19), (f) μtest6=μ(27), (g) μtest7=μ(28), (h) μtest8=μ(29).

    Table 11. Shallow water equations. Prediction scenario: the relative 2-errors ($\times 10^{-4}$).

    μtest1=μ(8) μtest2=μ(9) μtest3=μ(14) μtest4=μ(18)
    NODE 58.244 67.960 68.005 77.938
    PNODE 0.97915 1.2295 1.3399 2.4462
    μtest5=μ(19) μtest6=μ(27) μtest7=μ(28) μtest8=μ(29)
    NODE 59.226 106.70 71.944 86.650
    PNODE 1.0573 3.0140 3.8343 2.2901

    (g) Discussion

    Our general observation is that the benefits of using PNODE are more pronounced when the dynamics is parameterized (as we have observed in the results of the numerical experiments). From this, we expect to see more improvements in the approximation accuracy over NODE when the dynamics varies significantly for different input parameters; for instance, compartmental modelling (e.g. susceptible–infectious–recovered (SIR) and susceptible–exposed–infectious–removed (SEIR) models) of infectious diseases such as the novel coronavirus disease 2019 (COVID-19) [36,37], where the dynamics of transmission is greatly affected by parameters of the model, which are determined by, for example, quarantine policy and social distancing. For the effect of the initial conditions on the solutions of SIR models, we refer readers to [38,39], where the analytical solutions of SIR models are derived. Other potential applications of PNODEs include modelling the errors of surrogate models of dynamical systems [40].

    Our approach shares the same limitation with other data-driven surrogate modelling approaches: it does not guarantee preservation of any important physical properties such as conservation. This is a particularly challenging issue, but there have been recent advances in deep learning approaches for enforcing conservation laws (e.g. enforcing conservation laws in subdomains [19], hyperbolic conservation laws [41], Hamiltonian mechanics [42,43], symplectic structures [44,45], Lagrangian mechanics [46] and metriplectic structures [47]) and we believe that adapting/extending ideas of these approaches potentially mitigates the limitation of data-driven surrogate modelling approaches.

    6. Related work

    (a) Classical data-driven surrogate modelling

    Data-driven surrogate modelling approaches have been extensively used and widely studied in computational science and engineering contexts. One of the most promising approaches is reduced-order modelling (ROM) techniques, which rely heavily on linear methods such as the proper orthogonal decomposition (POD) [48], which is analogous to principal component analysis [49], for constructing the mappings between a high-dimensional space and a low-dimensional space. These ROMs then identify the latent-dynamics model by executing a (linear) projection process on the high-dimensional equations, e.g. Galerkin projection [48] or least-squares Petrov–Galerkin projection [50,51]. We refer readers to [52–54] for a complete survey on classical methods.

    (b) Physics-aware deep-learning-based surrogate modelling

    Recent work has extended classical ROMs by replacing POD with nonlinear dimension reduction techniques emerging from deep learning [6,19,31,55–58]. These approaches operate by identifying a nonlinear mapping (via, for example, convolutional autoencoders) and subsequently identifying the latent dynamics as certain residual minimization problems [6,19,31,57,58], which are defined on the latent space and are derived from the governing equations. In [55,56], the latent dynamics is identified by simply projecting the governing equation using the encoder, which may lead to kinematically inconsistent dynamics.

    Another class of physics-aware surrogate modelling methods includes explicitly modelling time-integration schemes [59–62], adaptive basis selection [63] and adding stability/structure-preserving constraints in the latent dynamics [64,65]. We emphasize that our approach is closely related to [60], where neural networks are trained to approximate the action of a first-order time-integration scheme applied to latent dynamics and, at each time step, a neural network takes a set of problem-specific parameters as well as the reduced state as an input. Thus, our approach can be seen as a time-continuous generalization of the approach in [60].

    (c) Purely data-driven deep-learning-based surrogate modelling

    Another approach for developing deep-learning-based surrogate models is to learn both the nonlinear mappings and the latent dynamics in purely data-driven ways. Latent dynamics has been modelled using recurrent neural networks with long short-term memory (LSTM) units along with linear POD mappings [14,66,67] or nonlinear mappings constructed via (convolutional) autoencoders [22,68–70]. In [13,14], latent dynamics and nonlinear mappings are modelled as NODEs and autoencoders, respectively; in [4,71–73], autoencoders are used to learn approximate invariant subspaces of the Koopman operator. Relatedly, there have been studies on learning direct mappings, via, for example, a neural network, from parameters (including time parameters) to either latent states or approximate solution states [74–78], where the latent states are computed by using autoencoders or linear POD.

    (d) Enhancing NODE

    Augmented NODEs [16] extend NODEs by appending additional state variables to the hidden state variables, which allows NODEs to learn dynamics using the additional dimensions and, consequently, to have increased expressivity. ANODE [12] discretizes the integration range into a fixed number of steps (i.e. checkpoints) to mitigate numerical instability in the backward pass of NODEs; ACA [79] further extends this approach by adopting an adaptive step-size solver in the backward pass. ANODEV2 [80] proposes a coupled system of NODEs, in which both the hidden state variables and the network weights are allowed to evolve over time and their dynamics is approximated by neural networks. Neural optimal control [17] formulates NODEs as controlled dynamical systems and infers the optimal control via an encoder network; this formulation results in NODEs that adjust the dynamics for different input data. Moreover, improved training strategies for NODEs have been studied in [18], and an extension using spectral elements in the discretization of NODEs has been proposed in [81].

    7. Conclusion

    In this study, we proposed a parameterized extension of NODEs and a novel framework for data-driven surrogate modelling of complex numerical simulations of computational physics problems. Our simple extension allows NODE models to learn multiple complex trajectories. This extension overcomes the main drawback of neural ODEs, namely that only a single set of dynamics is learned for the entire data distribution. We have demonstrated the effectiveness of parameterized NODEs on several benchmark problems from computational fluid dynamics, and have shown that the proposed method outperforms NODEs.

    Data accessibility

    The code has been provided in the electronic supplementary material [82]. Some large files will be available upon request.

    Authors' contributions

    K.L. participated in the design of the idea, performed the numerical experiments and drafted the manuscript. E.J.P. participated in the design of the idea and critically revised the manuscript. Both authors reviewed and gave final approval to the manuscript.

    Competing interests

    We declare we have no competing interests.

    Funding

    E.J.P. was funded under Sandia’s Advanced Simulation and Computing (ASC) Verification and Validation (V&V) Project/task no. 103723/05.30.02.

    Disclaimer

    This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the United States Government. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract no. DE-NA0003525.

    Appendix A. Quasi-one-dimensional Euler equation: the initial condition

    For the initial condition, the initial flow field is computed as follows. A zero pressure-gradient flow field is constructed via the isentropic relations

    $$M(x) = \frac{M_m A_m}{A(x)}\left(\frac{1 + \frac{\gamma-1}{2}M(x)^2}{1 + \frac{\gamma-1}{2}M_m^2}\right)^{(\gamma+1)/(2(\gamma-1))}, \qquad p(x) = p_{\mathrm{total}}\left(1 + \frac{\gamma-1}{2}M(x)^2\right)^{-\gamma/(\gamma-1)},$$
    $$T(x) = T_{\mathrm{total}}\left(1 + \frac{\gamma-1}{2}M(x)^2\right)^{-1}, \qquad \rho(x) = \frac{p(x)}{R\,T(x)}, \qquad c(x) = \sqrt{\frac{\gamma\, p(x)}{\rho(x)}}, \qquad u(x) = M(x)\,c(x),$$
    where $M$ denotes the Mach number, $c$ denotes the speed of sound, a subscript $m$ indicates the flow quantity at $x = 0.5\ \mathrm{m}$, $T_{\mathrm{total}} = 300\ \mathrm{K}$ denotes the total temperature and $p_{\mathrm{total}} = 10^{6}\ \mathrm{N\,m^{-2}}$ denotes the total pressure. The shock is located at $x = 0.85\ \mathrm{m}$ and the velocity across the shock ($u_2$) is computed by using the jump relations for a stationary shock and the perfect gas equation of state. The velocity across the shock satisfies the quadratic equation
    $$\left(\frac{1}{2} - \frac{\gamma}{\gamma - 1}\right)u_2^2 + \frac{\gamma}{\gamma - 1}\frac{n}{m}u_2 - h = 0,$$
    where $m = \rho_2 u_2 = \rho_1 u_1$, $n = \rho_2 u_2^2 + p_2 = \rho_1 u_1^2 + p_1$ and $h = (e_2 + p_2)/\rho_2 = (e_1 + p_1)/\rho_1$. The subscripts 1 and 2 indicate quantities to the left and to the right of the shock, respectively. We consider a specific Mach number of $M_m = 2.0$.

    Footnotes

    ¹For setting $p$, we follow the results of the study on the effective latent dimension shown in [19].

    Electronic supplementary material is available online at https://doi.org/10.6084/m9.figshare.c.5599853.

    Published by the Royal Society. All rights reserved.

    References

    • 1.
      Quarteroni A, Manzoni A, Negri F. 2015 Reduced basis methods for partial differential equations: an introduction, vol. 92. Berlin, Germany: Springer.
    • 2.
      Karl M, Soelch M, Bayer J, van der Smagt P. 2017 Deep variational Bayes filters: unsupervised learning of state space models from raw data. In Proc. 5th Int. Conf. on Learning Representations, ICLR 2017, Toulon, France, 24–26 April 2017. La Jolla, CA: International Conference on Representation Learning.
    • 3.
      Chen TQ, Rubanova Y, Bettencourt J, Duvenaud D. 2018 Neural ordinary differential equations. In Proc. Advances in Neural Information Processing Systems 31: Annu. Conf. on Neural Information Processing Systems 2018, NeurIPS 2018, Montréal, Canada, 3–8 December 2018 (eds S Bengio, HM Wallach, H Larochelle, K Grauman, N Cesa-Bianchi, R Garnett), pp. 6572–6583. Neural Information Processing Systems Foundation.
    • 4.
      Morton J, Jameson A, Kochenderfer MJ, Witherden FD. 2018 Deep dynamical modeling and control of unsteady fluid flows. In Proc. Advances in Neural Information Processing Systems 31: Annu. Conf. on Neural Information Processing Systems 2018, NeurIPS 2018, Montréal, Canada, 3–8 December 2018 (eds S Bengio, HM Wallach, H Larochelle, K Grauman, N Cesa-Bianchi, R Garnett), pp. 9278–9288. Neural Information Processing Systems Foundation.
    • 5.
      Rubanova Y, Chen TQ, Duvenaud D. 2019 Latent ordinary differential equations for irregularly-sampled time series. In Proc. Advances in Neural Information Processing Systems 32: Annu. Conf. on Neural Information Processing Systems 2019, NeurIPS 2019, Vancouver, British Columbia, Canada, 8–14 December 2019 (eds HM Wallach, H Larochelle, A Beygelzimer, F d’Alché-Buc, EB Fox, R Garnett), pp. 5321–5331. Neural Information Processing Systems Foundation.
    • 6.
      Fulton L, Modi V, Duvenaud D, Levin DIW, Jacobson A. 2019 Latent-space dynamics for reduced deformable simulation. Comput. Graph. Forum 38, 379–391. (doi:10.1111/cgf.13645)
    • 7.
      Weinan E. 2017 A proposal on machine learning via dynamical systems. Commun. Math. Stat. 5, 1–11. (doi:10.1007/s40304-017-0103-z)
    • 8.
      Haber E, Ruthotto L. 2017 Stable architectures for deep neural networks. Inverse Prob. 34, 014004. (doi:10.1088/1361-6420/aa9a90)
    • 9.
      Ruthotto L, Haber E. 2020 Deep neural networks motivated by partial differential equations. J. Math. Imaging Vis. 62, 352–364. (doi:10.1007/s10851-019-00903-1)
    • 10.
      Lu Y, Zhong A, Li Q, Dong B. 2018 Beyond finite layer neural networks: bridging deep architectures and numerical differential equations. In Proc. of the 35th Int. Conf. on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, 10–15 July 2018, Proceedings of Machine Learning Research, vol. 80 (eds JG Dy, A Krause), pp. 3282–3291. PMLR.
    • 11.
      Ciccone M, Gallieri M, Masci J, Osendorfer C, Gomez FJ. 2018 NAIS-net: stable deep networks from non-autonomous differential equations. In Proc. Advances in Neural Information Processing Systems 31: Annu. Conf. on Neural Information Processing Systems 2018, NeurIPS 2018, Montréal, Canada, 3–8 December 2018 (eds S Bengio, HM Wallach, H Larochelle, K Grauman, N Cesa-Bianchi, R Garnett), pp. 3029–3039. Neural Information Processing Systems Foundation.
    • 12.
      Gholaminejad A, Keutzer K, Biros G. 2019 ANODE: unconditionally accurate memory-efficient gradients for neural ODEs. In Proc. of the 28th Int. Joint Conf. on Artificial Intelligence, IJCAI 2019, Macao, China, 10–16 August 2019 (ed. S Kraus), pp. 730–736. International Joint Conferences on Artificial Intelligence.
    • 13.
      Portwood GD et al. 2019 Turbulence forecasting via neural ODE. (http://arxiv.org/abs/1911.05180)
    • 14.
      Maulik R, Mohan A, Lusch B, Madireddy S, Balaprakash P, Livescu D. 2020 Time-series learning of latent-space dynamics for reduced-order model closure. Physica D 405, 132368. (doi:10.1016/j.physd.2020.132368)
    • 15.
      Ayed I, de Bézenac E, Pajot A, Brajard J, Gallinari P. 2019 Learning dynamical systems from partial observations. (http://arxiv.org/abs/1902.11136)
    • 16.
      Dupont E, Doucet A, Teh YW. 2019 Augmented neural ODEs. In Proc. Advances in Neural Information Processing Systems 32: Annu. Conf. on Neural Information Processing Systems 2019, NeurIPS 2019, Vancouver, British Columbia, Canada, 8–14 December 2019 (eds HM Wallach, H Larochelle, A Beygelzimer, F d’Alché-Buc, EB Fox, R Garnett), pp. 3134–3144. Neural Information Processing Systems Foundation.
    • 17.
      Chalvidal M, Ricci M, VanRullen R, Serre T. 2020 Neural optimal control for representation learning. (http://arxiv.org/abs/2006.09545)
    • 18.
      Finlay C, Jacobsen J, Nurbekyan L, Oberman AM. 2020 How to train your neural ODE: the world of Jacobian and kinetic regularization. In Proc. of the 37th Int. Conf. on Machine Learning, ICML 2020, 13–18 July 2020, Virtual Event, Proceedings of Machine Learning Research, vol. 119, pp. 3154–3164. PMLR.
    • 19.
      Lee K, Carlberg KT. 2021 Deep conservation: a latent-dynamics model for exact satisfaction of physical conservation laws. In Proc. of the 35th AAAI Conf. on Artificial Intelligence, AAAI 2021, 33rd Conf. on Innovative Applications of Artificial Intelligence, IAAI 2021, 11th Symp. on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, 2–9 February 2021, pp. 277–285. AAAI Press.
    • 20.
      Pontryagin LS. 2018 Mathematical theory of optimal processes. London, UK: Routledge.
    • 21.
      Yildiz C, Heinonen M, Lähdesmäki H. 2019 ODE2VAE: deep generative second order ODEs with Bayesian neural networks. In Proc. Advances in Neural Information Processing Systems 32: Annu. Conf. on Neural Information Processing Systems 2019, NeurIPS 2019, Vancouver, British Columbia, Canada, 8–14 December 2019 (eds HM Wallach, H Larochelle, A Beygelzimer, F d’Alché-Buc, EB Fox, R Garnett), pp. 13 412–13 421. Neural Information Processing Systems Foundation.
    • 22.
      Maulik R, Lusch B, Balaprakash P. 2021 Reduced-order modeling of advection-dominated systems with recurrent neural networks and convolutional autoencoders. Phys. Fluids 33, 037106. (doi:10.1063/5.0039986)
    • 23.
      Kolda TG, Bader BW. 2009 Tensor decompositions and applications. SIAM Rev. 51, 455–500. (doi:10.1137/07070111X)
    • 24.
      LeCun Y, Bengio Y, Hinton G. 2015 Deep learning. Nature 521, 436–444. (doi:10.1038/nature14539)
    • 25.
      Goodfellow I, Bengio Y, Courville A. 2016 Deep learning. Cambridge, MA: MIT Press.
    • 26.
      Clevert D, Unterthiner T, Hochreiter S. 2016 Fast and accurate deep network learning by exponential linear units (ELUs). In Proc. 4th Int. Conf. on Learning Representations, ICLR 2016, San Juan, Puerto Rico, 2–4 May 2016, Conference Track Proceedings (eds Y Bengio, Y LeCun). La Jolla, CA: International Conference on Representation Learning.
    • 27.
      Dormand JR, Prince PJ. 1980 A family of embedded Runge–Kutta formulae. J. Comput. Appl. Math. 6, 19–26. (doi:10.1016/0771-050X(80)90013-3)
    • 28.
      Paszke A et al. 2019 PyTorch: an imperative style, high-performance deep learning library. In Proc. Advances in Neural Information Processing Systems 32: Annu. Conf. on Neural Information Processing Systems 2019, NeurIPS 2019, Vancouver, British Columbia, Canada, 8–14 December 2019 (eds HM Wallach, H Larochelle, A Beygelzimer, F d’Alché-Buc, EB Fox, R Garnett), pp. 8024–8035. Neural Information Processing Systems Foundation.
    • 29.
      Kingma DP, Ba J. 2015 Adam: a method for stochastic optimization. In Proc. 3rd Int. Conf. on Learning Representations, ICLR 2015, San Diego, CA, USA, 7–9 May 2015, Conference Track Proceedings (eds Y Bengio, Y LeCun). La Jolla, CA: International Conference on Representation Learning.
    • 30.
      Rewienski M. 2003 A trajectory piecewise-linear approach to model order reduction of nonlinear dynamical systems. PhD thesis, Massachusetts Institute of Technology, Cambridge, MA, USA.
    • 31.
      Lee K, Carlberg KT. 2020 Model reduction of dynamical systems on nonlinear manifolds using deep convolutional autoencoders. J. Comput. Phys. 404, 108973. (doi:10.1016/j.jcp.2019.108973)
    • 32.
      Buffoni M, Willcox K. 2010 Projection-based model reduction for reacting flows. In Proc. 40th Fluid Dynamics Conf. and Exhibit, Chicago, IL, 28 June 2010–1 July 2010, p. 5008. Reston, VA: American Institute of Aeronautics and Astronautics.
    • 33.
      MacCormack RW. 2014 Numerical computation of compressible and viscous flow. Reston, VA: American Institute of Aeronautics and Astronautics, Inc.
    • 34.
      Choi Y, Carlberg K. 2019 Space–time least-squares Petrov–Galerkin projection for nonlinear model reduction. SIAM J. Sci. Comput. 41, A26–A58. (doi:10.1137/17M1120531)
    • 35.
      Roe PL. 1981 Approximate Riemann solvers, parameter vectors, and difference schemes. J. Comput. Phys. 43, 357–372. (doi:10.1016/0021-9991(81)90128-5)
    • 36.
      Acemoglu D, Chernozhukov V, Werning I, Whinston MD. 2020 A multi-risk SIR model with optimally targeted lockdown. Technical report. National Bureau of Economic Research, Cambridge, MA, USA.
    • 37.
      Wang H et al. 2020 Phase-adjusted estimation of the number of coronavirus disease 2019 cases in Wuhan, China. Cell Discov. 6, 1–8. (doi:10.1038/s41421-020-0148-0)
    • 38.
      Kröger M, Schlickeiser R. 2020 Analytical solution of the SIR-model for the temporal evolution of epidemics. Part A: time-independent reproduction factor. J. Phys. A: Math. Theor. 53, 505601. (doi:10.1088/1751-8121/abc65d)
    • 39.
      Schlickeiser R, Kröger M. 2021 Analytical solution of the SIR-model for the temporal evolution of epidemics: part B. Semi-time case. J. Phys. A: Math. Theor. 54, 175601. (doi:10.1088/1751-8121/abed66)
    • 40.
      Parish EJ, Carlberg KT. 2020 Time-series machine-learning error models for approximate solutions to parameterized dynamical systems. Comput. Methods Appl. Mech. Eng. 365, 112990. (doi:10.1016/j.cma.2020.112990)
    • 41.
      Raissi M, Perdikaris P, Karniadakis GE. 2019 Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378, 686–707. (doi:10.1016/j.jcp.2018.10.045)
    • 42.
      Greydanus S, Dzamba M, Yosinski J. 2019 Hamiltonian neural networks. In Proc. Advances in Neural Information Processing Systems 32: Annu. Conf. on Neural Information Processing Systems 2019, NeurIPS 2019, Vancouver, British Columbia, Canada, 8–14 December 2019 (eds HM Wallach, H Larochelle, A Beygelzimer, F d’Alché-Buc, EB Fox, R Garnett), pp. 15 353–15 363. Neural Information Processing Systems Foundation.
    • 43.
      Toth P, Rezende DJ, Jaegle A, Racanière S, Botev A, Higgins I. 2020 Hamiltonian generative networks. In Proc. 8th Int. Conf. on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, 26–30 April 2020. La Jolla, CA: International Conference on Representation Learning.
    • 44.
      Chen Z, Zhang J, Arjovsky M, Bottou L. 2020 Symplectic recurrent neural networks. In Proc. 8th Int. Conf. on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, 26–30 April 2020. La Jolla, CA: International Conference on Representation Learning.
    • 45.
      Jin P, Zhang Z, Zhu A, Tang Y, Karniadakis GE. 2020 SympNets: intrinsic structure-preserving symplectic networks for identifying Hamiltonian systems. Neural Netw. 132, 166–179. (doi:10.1016/j.neunet.2020.08.017)
    • 46.
      Cranmer M, Greydanus S, Hoyer S, Battaglia P, Spergel D, Ho S. 2020 Lagrangian neural networks. (http://arxiv.org/abs/2003.04630)
    • 47.
      Hernandez Q, Badias A, González D, Chinesta F, Cueto E. 2021 Structure-preserving neural networks. J. Comput. Phys. 426, 109950. (doi:10.1016/j.jcp.2020.109950)
    • 48.
      Holmes P, Lumley JL, Berkooz G, Rowley CW. 2012 Turbulence, coherent structures, dynamical systems and symmetry. Cambridge, UK: Cambridge University Press.
    • 49.
      Hotelling H. 1933 Analysis of a complex of statistical variables into principal components. J. Educ. Psychol. 24, 417–441. (doi:10.1037/h0071325)
    • 50.
      Carlberg K, Farhat C, Cortial J, Amsallem D. 2013 The GNAT method for nonlinear model reduction: effective implementation and application to computational fluid dynamics and turbulent flows. J. Comput. Phys. 242, 623–647. (doi:10.1016/j.jcp.2013.02.028)
    • 51.
      Carlberg K, Barone M, Antil H. 2017 Galerkin v. least-squares Petrov–Galerkin projection in nonlinear model reduction. J. Comput. Phys. 330, 693–734. (doi:10.1016/j.jcp.2016.10.033)
    • 52.
      Benner P, Gugercin S, Willcox K. 2015 A survey of projection-based model reduction methods for parametric dynamical systems. SIAM Rev. 57, 483–531. (doi:10.1137/130932715)
    • 53.
      Benner P, Ohlberger M, Cohen A, Willcox K. 2017 Model reduction and approximation: theory and algorithms. Philadelphia, PA: SIAM.
    • 54.
      Ohlberger M, Rave S. 2016 Reduced basis methods: success, limitations and future challenges. In Proc. of ALGORITMY, Vysoké Tatry - Podbanské, Slovakia, 14–18 March 2016 (eds A Handlovičová, D Ševčovič), pp. 1–12. Bratislava, Slovakia: Publishing House of Slovak University of Technology.
    • 55.
      Kashima K. 2016 Nonlinear model reduction by deep autoencoder of noise response data. In Proc. 55th IEEE Conf. on Decision and Control, CDC 2016, Las Vegas, NV, USA, 12–14 December 2016, pp. 5750–5755. New York, NY: IEEE.
    • 56.
      Hartman D, Mestha LK. 2017 A deep learning framework for model reduction of dynamical systems. In Proc. IEEE Conf. on Control Technology and Applications, CCTA 2017, Mauna Lani Resort, HI, USA, 27–30 August 2017, pp. 1917–1922. New York, NY: IEEE.
    • 57.
      Kim Y, Choi Y, Widemann D, Zohdi T. 2020 A fast and accurate physics-informed neural network reduced order model with shallow masked autoencoder. (http://arxiv.org/abs/2009.11990)
    • 58.
      Mojgani R. 2020 Reduced order modeling of convection-dominated flows, dimensionality reduction and stabilization. PhD thesis, University of Illinois at Urbana-Champaign, Champaign, IL, USA.
    • 59.
      Pawar S, Rahman S, Vaddireddy H, San O, Rasheed A, Vedula P. 2019 A deep learning enabler for nonintrusive reduced order modeling of fluid flows. Phys. Fluids 31, 085101. (doi:10.1063/1.5113494)
    • 60.
      San O, Maulik R, Ahmed M. 2019 An artificial neural network framework for reduced order modeling of transient flows. Commun. Nonlin. Sci. Numer. Simul. 77, 271–287. (doi:10.1016/j.cnsns.2019.04.025)
    • 61.
      Xie X, Zhang G, Webster CG. 2019 Non-intrusive inference reduced order model for fluids using deep multistep neural network. Mathematics 7, 757. (doi:10.3390/math7080757)
    • 62.
      Geneva N, Zabaras N. 2020 Modeling the dynamics of PDE systems with physics-constrained deep auto-regressive networks. J. Comput. Phys. 403, 109056. (doi:10.1016/j.jcp.2019.109056)
    • 63.
      Rim D, Venturi L, Bruna J, Peherstorfer B. 2020 Depth separation for reduced deep networks in nonlinear model reduction: distilling shock waves in nonlinear hyperbolic problems. (http://arxiv.org/abs/2007.13977)
    • 64.
      Erichson NB, Muehlebach M, Mahoney MW. 2019 Physics-informed autoencoders for Lyapunov-stable fluid flow prediction. (http://arxiv.org/abs/1905.10866)
    • 65.
      Hernandez Q, Badias A, Gonzalez D, Chinesta F, Cueto E. 2020 Deep learning of thermodynamics-aware reduced-order models from data. (http://arxiv.org/abs/2007.03758)
    • 66.
      Wang Z, Xiao D, Fang F, Govindan R, Pain CC, Guo Y. 2018 Model identification of reduced order fluid dynamics systems using deep learning. Int. J. Numer. Methods Fluids 86, 255–268. (doi:10.1002/fld.4416)
    • 67.
      Rahman SM, Pawar S, San O, Rasheed A, Iliescu T. 2019 Nonintrusive reduced order modeling framework for quasigeostrophic turbulence. Phys. Rev. E 100, 053306. (doi:10.1103/PhysRevE.100.053306)
    • 68.
      Gonzalez FJ, Balajewicz M. 2018 Deep convolutional recurrent autoencoders for learning low-dimensional feature dynamics of fluid systems. (http://arxiv.org/abs/1808.01346)
    • 69.
      Wiewel S, Becher M, Thürey N. 2019 Latent space physics: towards learning the temporal evolution of fluid flow. Comput. Graph. Forum 38, 71–82. (doi:10.1111/cgf.13620)
    • 70.
      Tencer J, Potter K. 2020 Enabling nonlinear manifold projection reduced-order models by extending convolutional neural networks to unstructured data. (http://arxiv.org/abs/2006.06154)
    • 71.
      Otto SE, Rowley CW. 2019 Linearly recurrent autoencoder networks for learning dynamics. SIAM J. Appl. Dyn. Syst. 18, 558–593. (doi:10.1137/18M1177846)
    • 72.
      Lusch B, Kutz JN, Brunton SL. 2017 Deep learning for universal linear embeddings of nonlinear dynamics. (http://arxiv.org/abs/1712.09707)
    • 73.
      Takeishi N, Kawahara Y, Yairi T. 2017 Learning Koopman invariant subspaces for dynamic mode decomposition. In Proc. Advances in Neural Information Processing Systems 30: Annu. Conf. on Neural Information Processing Systems 2017, Long Beach, CA, USA, 4–9 December 2017 (eds I Guyon, U von Luxburg, S Bengio, HM Wallach, R Fergus, SVN Vishwanathan, R Garnett), pp. 1130–1140. Neural Information Processing Systems Foundation.
    • 74.
      Swischuk R, Mainini L, Peherstorfer B, Willcox K. 2019 Projection-based model reduction: formulations for physics-based machine learning. Comput. Fluids 179, 704–717. (doi:10.1016/j.compfluid.2018.07.021)
    • 75.
      Fresca S, Dedè L, Manzoni A. 2021 A comprehensive deep learning-based approach to reduced order modeling of nonlinear time-dependent parametrized PDEs. J. Sci. Comput. 87, 61. (doi:10.1007/s10915-021-01462-7)
    • 76.
      Renganathan SA, Maulik R, Rao V. 2020 Machine learning for nonintrusive model order reduction of the parametric inviscid transonic flow past an airfoil. Phys. Fluids 32, 047110. (doi:10.1063/1.5144661)
    • 77.
      Kast M, Guo M, Hesthaven JS. 2020 A non-intrusive multifidelity method for the reduced order modeling of nonlinear problems. Comput. Methods Appl. Mech. Eng. 364, 112947. (doi:10.1016/j.cma.2020.112947)
    • 78.
      Wang Q, Hesthaven JS, Ray D. 2019 Non-intrusive reduced order modeling of unsteady flows using artificial neural networks with application to a combustion problem. J. Comput. Phys. 384, 289–307. (doi:10.1016/j.jcp.2019.01.031)
    • 79.
      Zhuang J, Dvornek NC, Li X, Tatikonda S, Papademetris X, Duncan JS. 2020 Adaptive checkpoint adjoint method for gradient estimation in neural ODE. In Proc. of the 37th Int. Conf. on Machine Learning, ICML 2020, 13–18 July 2020, Virtual Event, Proceedings of Machine Learning Research, vol. 119, pp. 11 639–11 649. PMLR.
    • 80.
      Zhang T, Yao Z, Gholami A, Gonzalez JE, Keutzer K, Mahoney MW, Biros G. 2019 ANODEV2: a coupled neural ODE framework. In Proc. Advances in Neural Information Processing Systems 32: Annu. Conf. on Neural Information Processing Systems 2019, NeurIPS 2019, Vancouver, BC, Canada, 8–14 December 2019 (eds HM Wallach, H Larochelle, A Beygelzimer, F d’Alché-Buc, EB Fox, R Garnett), pp. 5152–5162. Neural Information Processing Systems Foundation.
    • 81.
      Quaglino A, Gallieri M, Masci J, Koutník J. 2020 SNODE: spectral discretization of neural ODEs for system identification. In Proc. 8th Int. Conf. on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, 26–30 April 2020. La Jolla, CA: International Conference on Representation Learning.
    • 82.
      Lee K, Parish EJ. 2021 Code from: Parameterized neural ordinary differential equations: applications to computational physics problems. The Royal Society. Dataset. (https://doi.org/10.6084/m9.figshare.16557722.v1)