Reduced-order modelling of parameter-dependent, linear and nonlinear dynamic partial differential equation models

In this paper, we develop reduced-order models for dynamic, parameter-dependent, linear and nonlinear partial differential equations using proper orthogonal decomposition (POD). The main challenges are to accurately and efficiently approximate the POD bases for new parameter values and, in the case of nonlinear problems, to efficiently handle the nonlinear terms. We use a Bayesian nonlinear regression approach to learn the snapshots of the solutions and the nonlinearities for new parameter values. Computational efficiency is ensured by using manifold learning to perform the emulation in a low-dimensional space. The accuracy of the method is demonstrated on a linear and a nonlinear example, with comparisons with a global basis approach.


Introduction
Computational modelling is an indispensable tool for analysis, design, optimization and control. For applications that require a high number of model evaluations at different inputs (e.g. uncertainty analysis and inverse parameter estimation) the computational expense of a computer model is often prohibitive. In such cases, the original computer model can be replaced with a surrogate model (or emulator) [1]. The simplest approach to surrogate modelling consists of simplifying the mathematical model or numerical formulation, e.g. by assuming spatial homogeneity or using coarse grids.
The other two main approaches are based on: (i) supervised machine learning methods to learn the model input-output relationship (so-called data-driven of state space. Moreover, the computational cost grows exponentially with the order of the approximating expansion. Recently, a number of hyper-reduction methods have been developed to overcome the limitations of linearization approaches (see also the tensorial POD approach recently developed in [23]). An early method was developed by Astrid et al. [24], based on selecting a subset of the FOM equations corresponding to heuristically chosen spatial grid points, followed by a Galerkin projection of the resulting reduced system onto the POD basis.
The empirical interpolation method interpolates the nonlinear function at selected spatial locations using an empirically derived basis, and is applied directly to the governing PDE [7], while the discrete empirical interpolation method (DEIM) is applicable to general ODE or algebraic systems arising from a spatial discretization [25]. Both methods construct a subspace for the approximation of the nonlinear term and use a greedy algorithm to select interpolation points. An extension of the DEIM [26] generates several local subspaces via clustering and uses classification in the online phase to select one of the subspaces. These approaches can also be used for approximating (vectorized) non-affine system matrices [2]. The Gauss-Newton with approximated tensors method operates at the fully discrete level in space and time, and is based on satisfying consistency and discrete-optimality conditions by solving a residual-minimization problem [27]. This leads to a Petrov-Galerkin (rather than Galerkin) problem with a test basis that depends on the residual derivatives w.r.t. the state variable.
In this paper, we introduce an extension of POD for dynamic, parametrized, linear and nonlinear PDEs. The method we develop involves a computationally efficient approximation of the POD basis and the nonlinearity for new parameter values. It can be used in conjunction with many of the methods described above, e.g. greedy sampling and methods for approximating nonaffine system matrices. In order to avoid inconsistencies and to reduce the loss of information in the construction of new bases, we take the approach of approximating the snapshots rather than the bases or system matrices directly. The snapshots, however, lie in high-dimensional spaces so that direct approximations are computationally unfeasible. We overcome this issue by using manifold learning techniques [28] to map the snapshots to a low-dimensional feature space. We then use Gaussian process emulation (GPE) to infer values of the mapped snapshots for new parameter values, followed by an inverse map to obtain the snapshots in physical space. For nonlinear problems, we extend DEIM by using the same emulation approach to approximate snapshots of the nonlinearity at desired locations in parameter space.
In the next section, we outline the procedures for generating ROMs and POD bases. We provide brief details of the DEIM and explain the issues associated with parametrized and/or nonlinear problems. In §3, we present the snapshot emulation strategy and summarize our approach to linear and nonlinear parametrized problems. In §4, we present one linear and one nonlinear example, comparing the results with a global basis approach.
2. Reduced-order models for parametrized dynamic partial differential equations using proper orthogonal decomposition (a) Problem definition and Galerkin projection Let x = (x 1 , . . . , x L ) denote a point in a bounded, regular domain D ⊂ R L (L = 1, 2, 3), let t ∈ [0, T] denote time and let ξ ∈ X ⊂ R l denote a vector of parameters. For the purposes of illustration, consider a parametrized, parabolic PDE for a dependent variable u(x, t; ξ ): Let H be a separable Hilbert space with inner product (·, ·) H and induced norm ·]| H , e.g. L 2 (D), the space of square integrable equivalence classes of functions with inner product Henceforth, we drop the subscript in the notation for the inner product and norm in L 2 (D). It is assumed that, for each ξ , u ∈ L 2 (0, T; H), i.e. t → u(·, t; ξ ) is a Lebesgue measurable map from (0, T) to H with finite norm u L 2 (0,T;H) := T 0 u(·, t; ξ ) H dt. Then u(·, t; ξ ) ∈ H for each t ∈ (0, T). A spatial discretization of (2.1) leads to a system of ODEs: for a discrete state variable u(t; ξ ) = (u 1 (t; ξ ), . . . , u d (t; ξ )) T , which we call the solution vector. Here d is the number of degrees of freedom, e.g. the number of grid points in a finite-difference (FD) approximation, the number of cells in a cell-centred finite-volume (FV) approximation or the number of nodes (basis functions) in a finite-element (FE) approximation. The matrix A(ξ ) ∈ R d×d arises from the linear term L(ξ )u and f (u(t; ξ ); ξ ) ∈ R d arises from a combination of N (ξ )u, g(x; ξ ) and possibly the boundary conditions. The latter is nonlinear for N (ξ )u ≡ 0. The precise relationship between u(t; ξ ) and u(x, t; ξ ), the forms of A(ξ ) and f (u; ξ ), and the incorporation of boundary conditions depend on the method used. For an FD approximation, problem (2.1) is solved directly and the boundary conditions are incorporated in f (u; ξ ). In an FE approximation a weak form is solved with test functions in H or a dense subspace V of H, with boundary conditions incorporated in f and/or the definition of H. The form of A(ξ ) is determined by the dependence of L(ξ ) on ξ . The simplest case is an affine form: A(ξ ) = i c i (ξ )A i , where the functions c i (ξ ) are known and the matrices A i are constant.
For FD, FV and nodal-basis FE discretizations, the coefficients u i (t; ξ ) of u(t; ξ ) correspond to the values of u(x, t; ξ ) at locations . We will assume this to be the case throughout. A numerical solution of (2.2) yields the solution vector Each of the discrete solutions u i (ξ ) ∈ R d is referred to as a snapshot. For a fixed input ξ ∈ X , a Galerkin projection approximates the problem (2.2) in a proper (lowdimensional) subspace S(ξ ) of R d . Let v j (ξ ) ∈ R d , j = 1, . . . , r, be an orthonormal basis for S(ξ ) (dim(S(ξ )) = r d), where the notation makes explicit the dependence on the input. We seek an approximation u(t; ξ ) ∈ S of u in the space span(v 1 (ξ ), . . . , v r (ξ )) where a = (a 1 (t; ξ ), . . . , a r (t; ξ )) T and V r (ξ ) = [v 1 (ξ ) . . . v r (ξ )]. The Galerkin projection of equations (2.2) onto the basis vectors v i (ξ ), i = 1, . . . , r, yields (replacing u with u) where A r (ξ ) := V r (ξ ) T A(ξ )V r (ξ ) and f r (a(t; ξ ); ξ ) := V r (ξ ) T f (V r (ξ )a(t; ξ ); ξ ). Equations (2.4) represent a system of r ODEs in time for the coefficients a i (t; ξ ). The basic goal of POD (outlined below) is the construction of a basis {v j (ξ )} r j=1 using the snapshots {u i (ξ )} m i=1 .

(b) Proper orthogonal decomposition
POD is presented in a number of ways (e.g. error minimization, 'variance' maximization) in the literature and often under different names. In this section, we provide a brief description of the motivation and practical (discrete) implementation. A complete summary of the underlying theory, alternative approaches, the links between the various interpretations and the optimality of the chosen basis can be found in appendix A. For a fixed ξ ∈ X , POD extracts an 'optimal' basis for a field u(x, t; ξ ), (x, t) ∈ D × [0, T], given an ensemble of 'snapshots' {u(x; t j , ξ )} m j=1 , x ∈ D. These are continuous equivalents of the discrete snapshots u j (ξ ). u(x, t; ξ ) can be regarded as a realization of a stationary (w.r.t. t) random field indexed by (x, t) [3,4,29]. Applying Karhunen-Loèéve (KL) theory [30] for a fixed t yields eigenfunctions (equations (A 1) in appendix A) of an integral operator C with the kernel given by the spatial autocovariance function C(x, x ; ξ ), x, x ∈ D.
In practice, we must work within a finite-dimensional setting. Defining U(ξ ) := [u 1 (ξ ) . . . u m (ξ )], the spatial variance-covariance matrix is given by C(ξ ) = U(ξ )U(ξ ) T ≈ E[u(t; ξ ) u(t; ξ ) T ]. The continuous eigenvalue problem for C can be approximated numerically (nonuniquely) by a principal component analysis (PCA): C(ξ )v i (ξ ) = λ i (ξ )v i (ξ ) for eigenvectors v i (ξ ) ∈ R d and eigenvalues λ i (ξ ) > 0, i = 1, . . . , d, arranged in decreasing order. The first r of these vectors define the space S(ξ ) of §2a. In certain cases, it may be computationally convenient to use variants of POD/PCA to determine the v i (ξ ). In appendix A, we provide details of the method of snapshots and singular value decomposition (SVD), the latter of which we use in practice.

Basis emulation and discrete empirical interpolation method extension
For each input/parameter ξ , the snapshot matrix U(ξ ) is obtained from the FOM and the basis V r (ξ ) is constructed according to §2b. To perform an analysis w.r.t. the inputs, this procedure is computationally prohibitive. A global basis across the parameter space of interest [10] can be constructed by computing a set of snapshot matrices U(ξ j ) for ξ j ∈ X , j = 1, . . . , n. The v i (ξ ) are extracted from a global snapshot matrix [U(ξ 1 ), . . . , U(ξ n )] ∈ R d×nm (usually after an SVD to avoid rank deficiency).
The global basis method uses information only from the 'truth approximation', i.e. the FOM. The optimality of the POD method, on the other hand, is violated since the snapshots used to derive the basis do not pertain to the parameter value of interest (the particular dynamical system under consideration) during the online phase. Furthermore, the range of validity of the global basis could be limited for complex mappings between the parameters and the outputs [13]. Interpolation methods (and the method we propose) violate the truth approximation in the sense that the snapshots or quantities derived therein are not obtained from the original model. In contrast to the global basis, however, these methods attempt to construct more accurate ROMs during the online phase. The main limitation is the accuracy of the interpolation or emulation, which depends on the data available and on the method itself. Moreover, it may not be possible to obtain sharp error bounds using such methods (in cases where the underlying PDE problem is amenable to a rigorous analysis).
Another problem associated with the standard POD-Galerkin approach is that the computational efficiency is compromised when f (·; ξ ) ∈ R d is a strong nonlinearity, since the evaluation of f r in equation (2.4) has a computational complexity that depends on d [31]. The DEIM [25] seeks a set of vectors w i (ξ ) ∈ R d , i = 1, . . . , d, such that the subspace span(w 1 , arranged such that the corresponding eigenvalues decay with i. , the DEIM selects s of the d equations to obtain an 'optimal' solution. Let us introduce the matrix P = [e p 1 . . . e p s ] ∈ R d×s , where e p i is the standard Euclidean basis vector in R d with non-zero entry located at the p i th coordinate. Assuming P T W(ξ ) is non-singular, we obtain assuming that the function f (·; ξ ) acts pointwise. The indices p i ∈ {1, 2, . . . , d}, i = 1, . . . , s are specified by a greedy algorithm [25] that satisfies the following error bound (for a given s): where · is the standard Euclidean norm andf := W(ξ )(P T W(ξ )) −1 P T f is the DEIM approximation of f . This estimate is valid for a given t (considering f as a function of t) by virtue of the second factor on the r.h.s., which is the error in the best 2-norm approximation of f in range(W(ξ )).
In this paper, we introduce a systematic and rigorous method to approximate the local basis and the nonlinearity by first approximating the snapshots for an arbitrary input ξ using Bayesian nonlinear regression. These snapshots lie in very high-dimensional spaces and thus we use a recently developed method that exploits manifold learning to yield a computationally feasible Gaussian process (GP) model. Below we describe the components of this emulation method and subsequently explain how it can be used for a POD analysis of parameterized, dynamic problems.

(a) Formulation and solution of the learning problem
For an arbitrary input ξ , consider the mapping η : X → O ⊂ R md defined below: i.e. a vectorial rearrangement of snapshots {u i (ξ )} m i=1 for the given value of ξ . We can define a similar map The emulation procedure mirrors that described below for the snapshots {u i (ξ )} m i=1 . We aim to approximate the mapping η(·) given training points y j = η(ξ j ) ∈ O (in a highdimensional space) for design points ξ j ∈ X , j = 1, . . . , n. One of the main methods for dealing with such high-dimensional outputs is to define approximate outputs in a q-dimensional subset O q ⊂ O (q md) using PCA and independently emulate the q coefficients of the points in O q for new values of ξ [32]. Shah and co-workers [33,34] extended the latter method by replacing PCA with manifold learning methods, making it applicable to a broader class of output spaces O. In this paper, we employ the method of [33,34] with kernel PCA (kPCA), which is outlined in appendix B, together with an approximation of the inverse map. kPCA [35] defines a map φ q : O → F q , where F q is a q-dimensional feature space. The coordinates z i (y) of points φ q (y) in F q define composite maps from the input space X to R, i.e. z i (ξ ) := z i (η(ξ )), i = 1, . . . , q. We place independent GP priors over these maps, justified by the properties of kPCA.
The approximation of η : X → O given the training points {y j } n j=1 is then substituted for independent approximations of the coefficients z i (ξ ), i = 1, . . . , q, given training data {z i (ξ j ) = z i (η(ξ j ))} n j=1 , which is obtained from equation (B 1) in appendix B. The value of z i (ξ ) for a new input ξ is inferred from scalar GPE (outlined in appendix C) as the mean of a posterior distribution. Given . GPE is exact at the training points if there are no (spurious) errors in the training data. In the present case, an error is introduced in the pre-image map so that the training snapshots will not be recovered exactly. This error, however, is negligible ( §4). We note that the size of md is not a limitation for the manifold learning methods employed in this paper, in which the eigenvalue problems are primarily dependent on the number of training points n.
for nonlinear problems) are obtained using the procedure outlined in §2b for a new input ξ , POD can be performed in the usual manner (with the extended DEIM for nonlinear problems). The entire procedure is outlined in algorithm 1. We mention that kPCA can be replaced with other manifold learning methods, e.g. diffusion maps or Isomap [33,34]. We introduce the terminology 'kGPE-POD' to denote the method of algorithm 1 without the extended DEIM (i.e. steps 1a-7a alone). Similarly, we use the terminology 'kGPE-POD-DEIM' to denote the method of algorithm 1 with the extended DEIM (steps 1a-7a and steps 1b-7b together). 1a: Snapshots from FOM: 1b: Collect nonlinearity snapshots:

Results and discussion (a) Two-dimensional contaminant transport
We consider the transport of a contaminant governed by a convection-diffusion equation. This model can be used, for example, for real-time prediction or for quantifying uncertainty in the concentration to support decision-making [11]. The problem is specified as follows: where u(x, t; ξ ) denotes the contaminant concentration (mol m −3 ), q is the fluid velocity (m s −1 ) and μ is the contaminant diffusion coefficient (m 2 s −1 ). The input ξ is defined below. The initial concentration is given by The magnitude of the velocity field is inversely proportional to the distance from x = (x 1 ,x 2 ) T , where e 1 and e 2 are unit vectors in the x 1 -and x 2 -directions, respectively, and a i ∈ R. To avoid the singularity at x = (x 1 ,x 2 ) T , the norm of velocity is set to zero at this location. We also set a 1 = a 2 = 1 and μ = 1, and consider variations in the input ξ = (x 1 , The problem was discretized in space using a cell-centred FV method with d = 2500 square cells (control volumes). Central differencing was used for the diffusive term and a first-order upwind scheme for the convective term, defining the velocity values on a staggered grid. A fully implicit Euler method was used to solve the resulting semi-discrete linear problem with 100 equal time steps in t ∈ [0, T], T = 0.2 s. A total of 500 inputs ξ j ∈ X , j = 1, . . . , 500, were generated using a Sobol sequence [36]. For each input, the FOM was solved to yield solution vectors (snapshots) u i (ξ j ) ∈ R d , i = 1, . . . , 100, j = 1, . . . , 500. The data points (vectorized snapshots) y j = η(ξ j ), j = 1, . . . , 500, were obtained using equation (3.3). Referring to appendix A, we set H = L 2 (D) to define the POD basis and optimality. Of the 500 data points, n t = 300 were reserved for testing. Training points were selected from the remaining 200 data points (n ≤ 200).  A Gaussian kernel was used for kPCA. The free parameter s 2 was taken to be the average square distance between observations in the original space [37]: Polynomial, multi-quadratic and sigmoid kernels were also tested. The best performance was achieved with the sigmoid and Gaussian kernels. For the inverse mapping, N n = n was used (i.e. all training points). For the GPE, we used a squared exponential covariance function and a zero mean function (after centring). The hyperparameters were found using a maximum-likelihood estimate (MLE) (gradient descent). Errors in the predictions of the vectorized snapshots y j were measured using a normalized error: = y p j − y j / y j , where y p j denotes the prediction of the test point y j = η(ξ j ), j = 1, . . . , n t , using steps 1a-6a of algorithm 1. Errors in the predictions using kGPE-POD/kGPE-POD-DEIM at ξ j were measured using a relative error r , where u p i (ξ j ) is the prediction (steps 1a-7a in algorithm 1) of the test point (snapshot) u i (ξ j ). We first examine the normalized errors in the predictions of the test data points y j = η(ξ j ), j = 1, . . . , n t . Using m = 10 of the snapshots (selecting every 10), figure 1 shows Tukey box plots of for the n t = 300 test cases as the manifold dimension q is increased, using n = 80 training points. Outliers are plotted individually using a '+' symbol. We note that when predicting the training set in this case using q = 10 the maximum value of was around 10 −11 , while the median was around 10 −12 . As a comparison we also include the result for Isomap (replacing kPCA in algorithm 1). The best results were obtained with kPCA, for which the errors converge after q = 6 dimensions (negligible further decrease). Diffusion maps were also tested and gave results similar to kPCA. The same pattern was observed at n = 40, 120 and 200 training points and also for all values of m up to 100. Based on the results, the approximating manifold dimension was set to q = 10 for all values of n and m (using kPCA). Figure 2 compares kGPE-POD with a global basis method for increasing POD dimension r. In the global basis method, the snapshot matrices constituting the global snapshot matrix corresponded to the n = 80 training points used for kGPE-POD. An SVD was performed on the global matrix before extracting the POD basis. For n = 40, the results were similar to the results depicted in figure 2, with a slight decrease in accuracy for both methods. Using m = 10 snapshots, the decrease in the relative errors r in kGPE-POD is negligible for r > 15, while the global basis method continues to improve beyond r = 50. In principle, kGPE-POD uses the correct bases for the test parameter values. It is possible, therefore, that kGPE-POD would approach the true result for a smaller value of r than the global basis approach, which uses a single basis extracted from snapshots that do not pertain to the test parameter values.   For m = 10, kGPE-POD exhibits a minimum r that is lower by more than an order of magnitude, while the maximum r for both methods is roughly the same (0.04 for r ≥ 15). At r = 15 in figure 2a,b, the value of r using kGPE-POD is lower than the minimum r in the global basis method in 109 of the 300 test cases. For the global basis at r = 15, there are 131 cases with an error below the median (3.9 × 10 −3 ), while for kGPE-POD 217 cases have errors below this value. kGPE-POD clearly exhibits a broader range of r values, with a higher median for r > 25. Figure 3 shows histograms of r for the two methods in the case of r = 15, m = 10. The broader range of r is due to the much lower minimum and to the presence of a greater number of cases   In order to assess the generalization accuracy more fully, we considered an uncertainty quantification problem for the accumulated contaminant concentrationū(x; ξ ) := T 0 u(x, t; ξ )dt at the location x c = (0.5, 0.5) T , by considering ξ to be a random vector distributed according to p(ξ ) = N (μ, σ 2 I), where μ = (0.5, 0.5) T and σ 2 = 0.1. The distribution ofū(x c ; ξ ) was estimated using Monte Carlo sampling with N M samples ξ i (this notation is to avoid confusion with the design points) drawn from p(ξ ). We set q = 6, n = 80, N M = 3000 and approximatedū(x c ; ξ ) with a trapezoidal rule. Figure 6 compares the histograms obtained from kGPE-POD, the global basis method and the FOM, using m = 10 snapshots. The FOM took 55.18 h to complete and yielded μ ac = 0.011087 and σ ac = 0.001218, obtained from For r = 10, kGPE-POD exhibited reasonable accuracy with regards to μ ac (within 0.2%) and σ ac (within 8.7%), while the global basis method was inaccurate (50% error in σ ac ). For m = 10, r = 50, both methods were accurate, with kGPE-POD still providing better estimates of μ ac and σ ac . For m = 100, the results are shown in figure 7. kGPE-POD was again more accurate for r = 10, while for r = 30 the two methods exhibited a similar level of accuracy.

(b) Burgers equation
We consider a one-dimensional Burgers equation, with inputs ξ to be defined later: and where u(x, t; ξ ) is the flow velocity, c 1 , c 2 ∈ R, k ∈ N, Re is Reynold's number and g(x) is a source term. We seek a weak solution u(x, t; ξ ) ∈ V := H 1 0 (D) satisfying leads to the weak formulation: find u = u h (x, t; ξ ) ∈ V h such that (4.5) holds ∀v = v h (x) ∈ V h . We also make use of the group (product) approximation [38]: u(x, t; ξ ) 2 ≈ d j=1 u j (t; ξ ) 2 ψ j (x) ∈ V h . Setting u = u h and v h = ψ j in (4.5), we obtain the semi-discrete problem Defining the solution vector u(t; ξ ) = (u 1 (t; ξ ), . . . , u d (t; ξ )) T , equation (4.6) and the initial condition lead to where the (i, j)th elements of the mass and stiffness matrices M and S are given by (ψ i , ψ j ) and (ψ i , ψ j ), respectively, and the jth components of u 0 and g are (u 0 , ψ j ) and (g, ψ j ), respectively. The nonlinear vector function b(u(t; ξ )) arises from the second term in (4.6). We used a Runge-Kutta method with a variable time step to solve the semi-discrete problems in this example. The coefficients u i,j (ξ ), j = 1, . . . , d, of the snapshots u i (ξ ) = u(t i ; ξ ), i = 1, . . . , m, for an arbitrary value of ξ are the nodal coefficients in the FE method solution, and thus correspond to functions For the definition of the POD basis, we chose the L 2 (D) norm for optimality; that is, H = L 2 (D) as defined in appendix A. An FE approximation of the POD basis Another choice for optimality is H = H 0 1 (D) with a(·, ·) as the inner product and associated seminorm |ϕ| 1 = a(ϕ, ϕ) 1/2 . The POD eigenvalue problem T 0 a(u, v)u dt = λv (see appendix A) leads to the eigenvalue problem C(ξ ) T Sv j (ξ ) = λv j (ξ ). The POD basis vectors are then given by v j (ξ ) = S −1/2v j (ξ ), wherev j (ξ ) is an eigenvector of S 1/2 C(ξ )S 1/2 , and are mutually orthogonal w.r.t. ·, · S . In the present example, this approach gave almost identical results.
Analysis of the normalized errors for the n t test cases with n = 160 training points showed convergence after q = 8 dimensions. Isomap gave similar results while Diffusion maps was inferior. We used q = 9 (kPCA) in the results presented below. Figure 8a shows the results of kGPE-POD-DEIM for an increasing r (with s = r). The relative errors converge after r = 30, i.e. further decreases are negligible. Figure 8b compares the predicted velocity profiles at t = 0, 0.5, 1, 1.5, 2, 2.5, 5, 7.5, 10 s from kGPE-POD-DEIM and the FOM for a point ( r ≈ 0.041) above With no approximation of the nonlinearity, a comparison between kGPE-POD and the global basis method exhibited trends similar to those seen in the previous example. For m < 30 and n ≤ 200, kGPE-POD required fewer POD vectors to achieve a given level of accuracy; the lower bound for r at r = 10 was one order of magnitude smaller for kGPE-POD. Both methods improved with increasing m, with the global basis method showing a greater improvement, especially in the lower bound for r . For m = 30 and n = 180 the results are illustrated in figure 9, which shows that around r = 28 both methods exhibit similar levels of accuracy in terms of the maximum, minimum and median r .
Case 2. In a second case, we set g(x) = 0.02e x , k = 3 and c 2 = 0.2, with inputs ξ = (c 1 , Re) T ∈ X = [2, 5] × [10, 1000]. As before we selected 500 inputs using a Sobol sequence and ran the FOM to generate data points, with n t = 300 reserved for testing. In this case, we use d = 128 nodes and  after inspection of the normalized errors we set q = 9. In contrast to the previous case, a large m (m > 120) was required for accurate results. Figure 10 shows the trends in the kGPE-POD-DEIM relative error r on the n t = 300 test points with increasing s for two values of r, using n = 180 and m = 200. For a fixed r, the errors decrease with an increasing s. For a fixed s, the errors were seen to decrease as r was increased up to a certain value. For higher values of r the solutions became less stable, with a corresponding increase in the error. This was more pronounced for small values of s. The optimal distribution of errors (in terms of the median, quartiles and extrema) was achieved for values of s between 5 and 10 higher than the value of r. Similar results for Burgers equation can be found in [39,40].
For r = 30 and s = 40, figure 11a,b compares the FOM and kGPE-POD-DEIM profiles at t = 0, 0.5, 1, 1.5, 2, 2.5, 5, 7.5, 10 s. The first of these corresponds to a point near the median of the relevant box plot in figure 10a, while the second corresponds to a point near the upper whisker. Figure 11c shows the point with the highest error using the same values of r and s. In this case, instability develops as the front forms but eventually settles. Using r = 50 and s = 55, the case with the highest error is shown in figure 11d. In figure 11d, we see that the solutions at early times are more stable. The observed instability is a common feature of POD models [27,41,42]. Stabilization schemes, e.g. alternative inner products, post-processing steps and modification of the underlying model [42][43][44], can be incorporated within the framework we have developed in order to eliminate or minimize such problems.

Conclusion
In this paper, we introduced a new POD-ROM method for dynamic, parameter-dependent linear and nonlinear PDEs. The method uses a Bayesian nonlinear regression to infer the basis for new parameter values and is thus potentially applicable to a broader window of parameter space than existing methods. Compared with a global POD method, our method requires extra computational effort on each run to diagonalize the snapshot matrix. The actual influence of this is small (as the uncertainty quantification in the first example demonstrates) since most of the computational time is spent on solving the ROM. In the examples considered (and others not presented) the global basis method requires a high value of r to reach the same level of performance (in terms of the minimum and maximum relative errors) as our method, particularly for low values of m. At these high values of r, much of the benefits of the global basis method as a surrogate model would be eliminated.
Since the method introduced here is a general framework, a number of modifications could easily be made, e.g. changing the manifold learning or machine learning method, and incorporating stabilization schemes, according to different types of problems. The manifoldlearning-based GP emulator could be treated as a general data-driven technique to interpolate properties other than the snapshots, as has been accomplished with the method of Amsallem & Farhat [14]. For instance, we could employ it to directly learn the POD basis V r (ξ ) in equation ( with corresponding real, positive eigenvalues λ i (ξ ) > λ i+1 (ξ ) ∀i ∈ N. The coefficients satisfy E[a i (t; ξ )] = 0 and E[a i (t; ξ )a j (t; ξ )] = λ i (ξ )δ ij and, since t is arbitrary, they can be interpreted as uncorrelated random processes. The expectation operator E[X] = Ω X(ω)P(dω) is approximated by a time average (obtained from the snapshots), assuming ergodicity. The 'optimality' of the basis {v i (x; ξ )} i∈N can be interpreted in two equivalent ways. For an arbitrary orthonormal basis , ∀r ∈ N, i.e. a generalized 'variance' maximization. Equivalently, we minimize the 'error' L 2 (0,T;L 2 (D)) over orthonormal bases {ϕ i } ∞ i=1 . These results carry over to the finite-dimensional setting described below, in which case orthonormality is defined w.r.t. an inner product in R d . More generally, we seek min {ϕ i } u − r i=1 a i ϕ i i (ξ ) in the desired order. The method of snapshots is used when m d. The temporal autocovariance function K(t, t ; ξ ) = D u(x, t; ξ )u(x, t ; ξ ) dx defines an operator Ka i (t; ξ ) := T 0 K(t, t ; ξ )a i (t ; ξ ) dt . The eigenfunctions a i (t; ξ ) of K are equal to the POD coefficients, and the eigenvalues are identical to those of C. Using E[a i (t; ξ )a j (t; ξ )] = λ i (ξ )δ ij , the POD modes are given by v i (x; ξ ) = (1/λ i (ξ )) T 0 u(x, t; ξ )a i (t; ξ ) dt. The discrete form (in space and time) of the eigenvalue problem is U(ξ ) T U(ξ )a i (ξ ) = λ i a i (ξ ), where K(ξ ) := U(ξ ) T U(ξ ) is a kernel matrix with entries K ij = u i (ξ ) T u j (ξ ), i.e. the space-discrete form of K(t i , t j ; ξ ). The eigendecomposition is K(ξ ) = A(ξ )Λ(ξ )A(ξ ) T , where Λ(ξ ) = diag(λ 1 (ξ ), . . . , λ m (ξ )) and the columns of A(ξ ) are given by the a i (ξ ). The jth component a i,j (ξ ) of a i (ξ ) approximates a i (t j ; ξ ), yielding the discrete-time approximation v i (x; ξ ) = (1/λ i (ξ )) m j=1 u(x, t j ; ξ )a i,j (ξ ), i.e. a linear combination of the snapshots. In the fully discrete case, using the normalization a i (ξ ) → a i (ξ )/ λ i (ξ ), we obtain v i (ξ ) = U(ξ )a i (ξ )/ λ i (ξ ). These relationships are also evident from the SVD of U(ξ ), that is, U(ξ ) = A (ξ )Λ(ξ ) 1/2 V(ξ ) T , where the columns of V(ξ ) are given by the v i (ξ ) and the columns of A (ξ ) are given by the a i (ξ ). In this context, the columns of A (ξ ) and V(ξ ), given, respectively, by the eigenvectors of K(ξ ) and C(ξ ), are referred to as the left and right singular vectors. It is straightforward to show that v i (ξ ) = kU(ξ )a i (ξ ) for k ∈ R. Thus, we recover the earlier relationship by taking k = 1/ λ i (ξ ) to normalize the v i (ξ ).