Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences
Open Access Research Article

Oblique projection for scalable rank-adaptive reduced-order modelling of nonlinear stochastic partial differential equations with time-dependent bases

M. Donello (Department of Mechanical Engineering, University of Pittsburgh, Pittsburgh, Pennsylvania, USA)
Contribution: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Validation, Visualization, Writing – original draft, Writing – review & editing

G. Palkar (Department of Mechanical Engineering, University of Pittsburgh, Pittsburgh, Pennsylvania, USA)
Contribution: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

M. H. Naderi (Department of Mechanical Engineering, University of Pittsburgh, Pittsburgh, Pennsylvania, USA)
Contribution: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

D. C. Del Rey Fernández (Department of Applied Mathematics, University of Waterloo, Waterloo, Canada)
Contribution: Conceptualization, Formal analysis, Investigation, Writing – review & editing

H. Babaee (Department of Mechanical Engineering, University of Pittsburgh, Pittsburgh, Pennsylvania, USA; [email protected])
Contribution: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing
    Abstract

    Time-dependent basis reduced-order models (TDB ROMs) have successfully been used for approximating the solution to nonlinear stochastic partial differential equations (PDEs). For many practical problems of interest, discretizing these PDEs results in massive matrix differential equations (MDEs) that are too expensive to solve using conventional methods. While TDB ROMs have the potential to significantly reduce this computational burden, they still suffer from the following challenges: (i) inefficient for general nonlinearities, (ii) intrusive implementation, (iii) ill-conditioned in the presence of small singular values and (iv) error accumulation due to fixed rank. To this end, we present a scalable method for solving TDB ROMs that is computationally efficient, minimally intrusive, robust in the presence of small singular values, rank-adaptive and highly parallelizable. These favourable properties are achieved via oblique projections that require evaluating the MDE at a small number of rows and columns. The columns and rows are selected using the discrete empirical interpolation method (DEIM), which yields near-optimal matrix low-rank approximations. We show that the proposed algorithm is equivalent to a CUR matrix decomposition. Numerical results demonstrate the accuracy, efficiency and robustness of the new method for a diverse set of problems.

    1. Introduction

Discretizations of many time-dependent partial differential equations (PDEs) result in matrix differential equations (MDEs) of the form $\mathrm{d}V/\mathrm{d}t = F(V)$, where $V \in \mathbb{R}^{n\times s}$ is the solution matrix and $F(V) \in \mathbb{R}^{n\times s}$ is obtained by discretizing the PDE in all dimensions except time. One such example is the propagation of random-parameter uncertainty through a PDE, which requires solving the PDE for a large number of random realizations [1,2]. Discretization of this problem can be formulated as an MDE, where the rows of the matrix are obtained by discretizing the PDE in the physical domain and the columns of the matrix are samples of the discretized equation for particular choices of the random parameters. For high-dimensional PDEs subject to high-dimensional random parameters, the resulting MDEs can be massive. For example, uncertainty quantification of a three-dimensional time-dependent fluid flow typically requires solving an MDE with $n \sim O(10^6)$–$O(10^9)$ grid points (rows) and $s \sim O(10^4)$–$O(10^7)$ random samples (columns). Therefore, the solution of these massive MDEs is cost prohibitive due to the floating-point operations (flops), memory and storage requirements. The discretization of many other PDEs can also be cast as MDEs, for example, kinetics equations [3–5], linear sensitivity analyses [6] and species transport equations in turbulent combustion [7].

    For many practical applications, V(t) is instantaneously low-rank. Therefore, low-rank approximations using time-dependent bases (TDBs) have the potential to significantly reduce the computational cost of solving massive MDEs. For these systems, a TDB-based low-rank approximation extracts low-rank structures via TDBs for the column and row spaces of V. A reduced-order model (ROM) is then constructed by projecting the full-order model (FOM) onto the column and row TDBs. Low-rank approximation based on TDB was first introduced in the quantum chemistry field to solve the Schrödinger equation [8], and it is commonly known as the multiconfiguration time-dependent Hartree (MCTDH) method. The MCTDH methodology was later presented for generic MDEs in [2] and is referred to as dynamical low-rank approximation (DLRA).

Various TDB ROM schemes have also been developed to solve stochastic partial differential equations (SPDEs). Dynamically orthogonal (DO) decomposition [1], bi-orthogonal (BO) decomposition [9], dual dynamically orthogonal (DDO) decomposition [10] and dynamically bi-orthogonal (DBO) decomposition [11] are all TDB-based low-rank approximation techniques for solving SPDEs. In all of these decompositions (DO, BO, DDO and DBO), an evolution equation for the mean field is developed, along with evolution equations for the TDB ROM of the mean-subtracted stochastic fields. Although these decompositions have different forms and constraints, they are all equivalent, i.e. they produce identical low-rank matrices [11,12], and their differences lie only in their numerical performance. TDB ROMs have also been used in other fields and applications, including dynamical systems [13,14], combustion [7,15], linear sensitivity analysis [6], dynamical instabilities [16–18], deep learning [19] and singular value decomposition (SVD) estimation for matrices that vary smoothly with a parameter [20].

    Despite the potential of using TDB ROMs to significantly reduce the computational cost of solving massive MDEs, there are still a number of outstanding challenges for most practical problems of interest. We summarize three key challenges below:

    (i)

Computational efficiency: For specific classes of equations (e.g. homogeneous linear and quadratic nonlinear), rank-$r$ TDB ROMs can be solved efficiently with operations that scale with $O(nr)$ and $O(sr)$ for linear MDEs or with $O(nr^2)$ and $O(sr^2)$ for quadratic MDEs. However, this computational efficiency is lost for general nonlinearities, which require operations that scale with the size of the FOM, i.e. $O(ns)$.

    (ii)

    Intrusiveness: Even in the special cases of homogeneous linear and quadratic nonlinear equations, efficient implementation of TDB ROM evolution equations is an intrusive process [21, Appendix B]. This involves replacing the low-rank approximation in the FOM, projecting the resulting equation onto the tangent manifold and obtaining low-rank matrices for each term on the right-hand side. The process requires significant effort to derive, implement and debug the code. This poses a major obstacle for most practitioners, creating a significant barrier to adopting the methodology.

    (iii)

    Ill-conditioning: The TDB ROM evolution equations become numerically unstable when the singular values of the low-rank approximation become very small. This is particularly problematic because it is often necessary to retain very small singular values in order to have an accurate approximation. Small singular values lead to ill-conditioned matrices that require inversion in all variations of TDB ROM evolution equations [1,2,911], resulting in restrictive time step limitations for numerical integration and error amplification.

Although some of these challenges have been tackled, there is currently no methodology that addresses all of them. In particular, the problems of ill-conditioning and computational expense must be resolved for practitioners to adopt TDB-based low-rank approximations for MDEs. To address the issue of ill-conditioning, a projector-splitting (PS) time integration scheme was proposed [22], in which arbitrarily small singular values can be retained. However, this scheme includes a backward time integration substep, which is unstable for dissipative problems. To address this issue, an unconventional robust integrator was recently proposed [23], which retains robustness with respect to small singular values while avoiding the unstable backward step. The authors also presented an elegant rank-adaptive strategy, in which the rank of the approximation changes over time to maintain a desired level of accuracy. Despite these advantages, this scheme is first-order in time [23, theorem 4]. In [24], a pseudo-inverse methodology was presented as a remedy to maintain a well-conditioned system. However, in this approach, it is difficult to determine what singular value threshold should be used. Another projection method was presented in [25] that retains robustness with respect to small singular values and can be extended to high-order explicit time discretizations.

The three time-integration schemes presented in [22,23,25] and the pseudo-inverse methodology presented in [24] can retain $O(n+s)$ cost for linear and quadratic MDEs. But achieving this speedup comes at the expense of a highly intrusive implementation. However, for generic nonlinear MDEs, an intrusive implementation is not possible, and the computational cost of solving the TDB ROMs using the methods presented in [22–25] scales with $O(ns)$, which is the same as the cost of solving the FOM. Recently, a sparse interpolation algorithm was presented for solving the TDB ROM evolution equations with a computational complexity that scales with $O(n+s)$ for generic nonlinear SPDEs [21]. However, this methodology still lacks robustness when the singular values become small, as it requires the inversion of the matrix of singular values.

In this work, we present a methodology inspired by interpolation and hyper-reduction techniques developed to accelerate nonlinear ROMs and finite-element models for vector differential equations [26–31]. In particular, we present CUR factorizations of low-rank matrices that address the above challenges: (i) the computational cost of the methodology scales with $O(n+s)$ for generic nonlinear SPDEs, both in terms of flops and memory, (ii) it lends itself to simple implementation in existing codes and (iii) the time integration is robust in the presence of small singular values, and high-order explicit time integration can be used. To this end, the main elements of the presented methodology are (i) a time-discrete variational principle for minimization of the residual due to the low-rank approximation error and (ii) a CUR factorization based on strategic row and column sampling of the time-discrete MDE.

The remainder of the paper is organized as follows. In §2, we first review the time-continuous variational principle and its associated challenges. We then present the time-discrete variational principle along with the rank-adaptive sparse sampling strategy for solving TDB ROMs. Finally, we show that the resulting low-rank approximation is equivalent to a CUR factorization, and we provide an upper bound on the approximation error. In §3, we demonstrate the method on a toy problem as well as the stochastic Burgers equation and a stochastic nonlinear advection–diffusion–reaction (ADR) equation. In §4, we summarize the present work and discuss its implications.

    2. Methodology

    (a) Setup

    Consider the nonlinear stochastic PDE given by

\[ \frac{\partial v}{\partial t} = f(v; x, t, \xi), \tag{2.1} \]
augmented with appropriate initial and boundary conditions. In the above equation, $v = v(x,t;\xi)$, $x$ is the spatial coordinate, $\xi \in \mathbb{R}^d$ is the set of random parameters, $t$ is the time, and $f(v;x,t,\xi)$ includes the nonlinear spatial differential operators. We assume generic nonlinear PDEs, where the nonlinearity of $f$ with respect to $v$ may be non-polynomial, e.g. exponential, fractional, etc. For the sake of simplicity in the exposition, we consider a collocation/strong-form discretization of equation (2.1) in $x$ and $\xi$. Because of the simplicity of the resulting discrete system, this choice facilitates an uncluttered illustration of the main contribution of this paper, which is focused on the efficient low-rank approximation of nonlinear MDEs. However, the presented methodology can also be applied to other types of discretizations, for example, weak-form discretizations (finite element, etc.). Examples of collocation/strong-form discretizations in the spatial domain are Fourier/polynomial spectral collocation schemes or finite-difference discretizations. Example collocation schemes in the random domain include the probabilistic collocation method (PCM) [32] or any Monte Carlo-type sampling method [33–35]. Applying any of the above schemes to equation (2.1) leads to the following nonlinear MDE:
\[ \frac{\mathrm{d}V}{\mathrm{d}t} = F(t, V), \qquad t \in I = [0, T_f], \tag{2.2} \]
where $I = [0, T_f]$ denotes the time interval, $V(t): I \to \mathbb{R}^{n\times s}$ is a matrix whose $n$ rows correspond to collocation points in the spatial domain and whose $s$ columns correspond to collocation/sampling points of the parameters $\xi$, and $F(t,V): I \times \mathbb{R}^{n\times s} \to \mathbb{R}^{n\times s}$ is obtained by discretizing $f(v;x,t,\xi)$ in $x$ and $\xi$. Equation (2.2) is augmented with appropriate initial conditions, i.e. $V(t_0) = V_0$. We also assume that boundary conditions are already incorporated into equation (2.2), which can be accomplished in a number of ways, for example by using weak treatment of the boundary conditions [36]. For the remainder of this paper, we will refer to equation (2.2) as the FOM, which will be used as the ground truth for evaluating the performance of the proposed methodology. For the problems targeted in this work, we assume $n > s$ without loss of generality.

The presented methodology is limited to explicit time integration schemes. For the computational complexity analysis, we consider sparse schemes for the spatial discretization, which means that each row depends on $p_a$ other rows, where $p_a \ll n$. The majority of discretization schemes, e.g. finite difference, finite volume, finite element and spectral element, result in sparse row dependence. As a result, the computational cost of computing each column of the FOM (equation (2.2)) is $O(n)$, and the cost of solving the MDE (2.2) for all $s$ columns scales with $O(ns)$. We also note that the presented methodology is not limited to sparse spatial discretizations and can be applied to dense discretizations as well; see remark 2.10 for more details. A minimal sketch of such a columnwise right-hand side evaluation is given below.
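For illustration only (this example is ours, not from the paper), a columnwise right-hand side with a sparse finite-difference stencil and a non-polynomial nonlinearity might look like:

```python
# A sketch (our own illustrative example) of an MDE right-hand side
# F(t, V) built from a sparse finite-difference stencil. Each column j
# is an independent parameter sample, so one column costs O(n) and the
# full right-hand side costs O(n s).
import numpy as np

n, s = 64, 32                        # grid points (rows), samples (columns)
dx = 1.0 / n
nu = 1e-2                            # hypothetical diffusion coefficient

def F(t, V):
    """Columnwise RHS dv/dt = nu*v_xx - v*v_x + exp(-v), periodic in x."""
    Vxx = (np.roll(V, -1, axis=0) - 2.0 * V + np.roll(V, 1, axis=0)) / dx**2
    Vx = (np.roll(V, -1, axis=0) - np.roll(V, 1, axis=0)) / (2.0 * dx)
    return nu * Vxx - V * Vx + np.exp(-V)    # non-polynomial nonlinearity
```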

    (b) Preliminaries

    In this section, we present some of the definitions of matrix manifolds, tangent spaces, orthogonal and oblique projections and CUR decomposition.

    Definition 2.1. (Low-rank matrix manifolds)

The low-rank matrix manifold $\mathcal{M}_r$ is defined as the set

\[ \mathcal{M}_r = \{\hat{V} \in \mathbb{R}^{n\times s} : \operatorname{rank}(\hat{V}) = r\} \]
of matrices of fixed rank $r$. Any member of the set $\mathcal{M}_r$ is denoted by a hat symbol ($\hat{\ }$), e.g. $\hat{V}$.

Any member of $\mathcal{M}_r$ may be represented by $\hat{V} = U\Sigma Y^T$, where $U \in \mathbb{R}^{n\times r}$ and $Y \in \mathbb{R}^{s\times r}$ have orthonormal columns and $\Sigma \in \mathbb{R}^{r\times r}$ is a rank-$r$ matrix. The rank-$r$ matrix $\hat{V}$ may also be represented as the product of two matrices, i.e. $\hat{V} = UY^T$, where $U \in \mathbb{R}^{n\times r}$ and $Y \in \mathbb{R}^{s\times r}$ have full column rank.

    Definition 2.2. (Tangent space)

The tangent space of the manifold $\mathcal{M}_r$ at $\hat{V}$, represented with the decomposition $\hat{V} = U\Sigma Y^T$, is the set of matrices of the form [2]:

\[ T_{\hat{V}}\mathcal{M}_r = \{\delta U\,\Sigma Y^T + U\,\delta\Sigma\,Y^T + U\Sigma\,\delta Y^T : \delta U^T U = 0 \ \text{and}\ \delta Y^T Y = 0\}, \]
where $\delta U \in \mathbb{R}^{n\times r}$ and $\delta Y \in \mathbb{R}^{s\times r}$.

    Definition 2.3. (Orthogonal projection onto the tangent space)

The orthogonal projection of a matrix $W \in \mathbb{R}^{n\times s}$ onto the tangent space of the manifold $\mathcal{M}_r$ at $\hat{V}$, represented with the decomposition $\hat{V} = U\Sigma Y^T$, is given by [2, lemma 4.1]:

\[ \mathcal{P}_{T_{\hat{V}}}(W) = UU^TW + WYY^T - UU^TWYY^T. \tag{2.3} \]

In the above projection, $UU^T$ and $YY^T$ are orthogonal projections onto the spaces spanned by the columns of $U$ and $Y$. We denote these orthogonal projections by
\[ \mathcal{P}^{\perp}_U = UU^T \quad\text{and}\quad \mathcal{P}^{\perp}_Y = YY^T, \tag{2.4} \]
where the symbol $\perp$ indicates orthogonal projection. In the following, we define oblique projectors. We first explain the notation that is used in this section. Let $U \in \mathbb{R}^{n\times r}$ and $Y \in \mathbb{R}^{s\times r}$ be matrices whose columns are orthonormal and let $\mathbf{p} = [p_1, p_2, \dots, p_{r'}] \in \mathbb{N}^{r'}$ and $\mathbf{s} = [s_1, s_2, \dots, s_{r'}] \in \mathbb{N}^{r'}$ be vectors containing row and column indices, where the number of indices can be greater than or equal to the dimension of the subspaces spanned by $U$ and $Y$, i.e. $r' \ge r$. Also, $r' \le n$ for row indices and $r' \le s$ for column indices. We use MATLAB indexing, where $V(\mathbf{p},:) \in \mathbb{R}^{r'\times s}$ selects all columns at the $\mathbf{p}$ rows, and $V(:,\mathbf{s}) \in \mathbb{R}^{n\times r'}$ selects all rows at the $\mathbf{s}$ columns of the matrix $V$. We also use the indexing matrices $\mathbf{P} = \mathbf{I}_n(:,\mathbf{p}) \in \mathbb{R}^{n\times r'}$ and $\mathbf{S} = \mathbf{I}_s(:,\mathbf{s}) \in \mathbb{R}^{s\times r'}$, where $\mathbf{I}_n$ and $\mathbf{I}_s$ are identity matrices of size $n\times n$ and $s\times s$, respectively. It is easy to verify that $\mathbf{P}^T U \equiv U(\mathbf{p},:)$ and $\mathbf{S}^T Y \equiv Y(\mathbf{s},:)$. Let $(\cdot)^{\dagger}$ denote the Moore–Penrose pseudoinverse of a matrix, i.e. $A^{\dagger} = (A^TA)^{-1}A^T$.

    Definition 2.4. (Oblique projection)

Let $U \in \mathbb{R}^{n\times r}$ and $Y \in \mathbb{R}^{s\times r}$ be orthonormal matrices and let $\mathbf{p} \in \mathbb{N}^{r'}$ and $\mathbf{s} \in \mathbb{N}^{r'}$ be sets of distinct row and column indices, respectively. Oblique projectors onto $\mathrm{Ran}(U)$ and $\mathrm{Ran}(Y)$ are defined as [37]

\[ \mathcal{P}^{\angle}_U = U(\mathbf{P}^TU)^{\dagger}\mathbf{P}^T \quad\text{and}\quad \mathcal{P}^{\angle}_Y = \mathbf{S}(Y^T\mathbf{S})^{\dagger}Y^T, \tag{2.5} \]
provided $(\mathbf{P}^TU)^T(\mathbf{P}^TU) \in \mathbb{R}^{r\times r}$ and $(Y^T\mathbf{S})(Y^T\mathbf{S})^T \in \mathbb{R}^{r\times r}$ are invertible.

In the above definition, the symbol $\angle$ distinguishes these projectors from orthogonal projectors. For a given matrix $A \in \mathbb{R}^{n\times s}$, $\mathcal{P}^{\angle}_U$ operates on the left side of the matrix and $\mathcal{P}^{\angle}_Y$ operates on the right side of the matrix. It is easy to verify that $(\mathcal{P}^{\angle}_U)^2 = \mathcal{P}^{\angle}_U$ and $(\mathcal{P}^{\angle}_Y)^2 = \mathcal{P}^{\angle}_Y$. It is also easy to verify that the oblique projection of $A$ belongs to the manifold of rank-$r$ matrices, i.e. $\mathcal{P}^{\angle}_U A\, \mathcal{P}^{\angle}_Y \in \mathcal{M}_r$.

For the special case of $r' = r$, the oblique projectors $\mathcal{P}^{\angle}_U$ and $\mathcal{P}^{\angle}_Y$ are also interpolatory projectors. In this case, the oblique projectors become $\mathcal{P}^{\angle}_U = U(\mathbf{P}^TU)^{-1}\mathbf{P}^T$ and $\mathcal{P}^{\angle}_Y = \mathbf{S}(Y^T\mathbf{S})^{-1}Y^T$. Unlike an orthogonal projection or a general oblique projection, the interpolatory projection is guaranteed to match the original matrix at the selected rows and columns, i.e.

\[ \mathbf{P}^T\mathcal{P}^{\angle}_U A = A(\mathbf{p},:) \quad\text{and}\quad A\,\mathcal{P}^{\angle}_Y\mathbf{S} = A(:,\mathbf{s}). \]
The other extreme is when all the rows or columns are selected, i.e. $r' = n$ or $r' = s$. Take, for example, the projector $\mathcal{P}^{\angle}_U$ when $r' = n$. In this case, $\mathcal{P}^{\angle}_U$ becomes the same as the orthogonal projector, i.e. $\mathcal{P}^{\angle}_U \equiv \mathcal{P}^{\perp}_U$. To show this, first note that $\mathcal{P}^{\angle}_U$ is invariant with respect to the ordering of the row indices ($\mathbf{p}$), and when $r' = n$, $\mathbf{p}$ can be taken to be $\mathbf{p} = [1, 2, \dots, n]$. In this case, $\mathbf{P} \equiv \mathbf{I}_n$. Therefore:
\[ \mathcal{P}^{\angle}_U = U(\mathbf{P}^TU)^{\dagger}\mathbf{P}^T = U(U^TU)^{-1}U^T = UU^T = \mathcal{P}^{\perp}_U, \]
where we have used the orthonormality of $U$, i.e. $U^TU = \mathbf{I}_r$, where $\mathbf{I}_r$ is the $r\times r$ identity matrix. The analogous relationship holds for $\mathcal{P}^{\angle}_Y$ when $r' = s$. In the following, we define CUR decompositions, which are closely related to oblique projections.

    Definition 2.5. (CUR decomposition)

A CUR decomposition of a matrix $V$ is a rank-$r$ approximation of $V$ of the form $V \approx CUR$, where $C \in \mathbb{R}^{n\times r}$ and $R \in \mathbb{R}^{r\times s}$ are actual columns and rows of the matrix $V$, i.e. $C = V(:,\mathbf{s})$ and $R = V(\mathbf{p},:)$. The matrix $U \in \mathbb{R}^{r\times r}$ is computed such that $CUR$ is a good approximation to $V$. The CUR of matrix $V$ is denoted by $\mathrm{CUR}(V)$.

Here, the matrices $C$, $U$ and $R$ are different from the matrices defined in previous sections. Different CUR decompositions can be obtained for the same matrix depending on two factors: (i) the selection of columns and rows and (ii) the method used to compute the matrix $U$. For more details on CUR decompositions, we refer the reader to Mahoney & Drineas [38]. Finally, it is easy to verify that $\mathrm{CUR}(V) \in \mathcal{M}_r$. The connection between CUR decompositions and oblique projections is shown in §2g.

    (c) Time-continuous variational principle

The central idea behind TDB-based low-rank approximation is that the bases evolve optimally to minimize the residual due to the low-rank approximation error. The residual is obtained by substituting an SVD-like low-rank approximation into the FOM, so that $V(t)$ is closely approximated by the rank-$r$ matrix

\[ \hat{V}(t) = U(t)\Sigma(t)Y(t)^T, \tag{2.6} \]
where $U(t) \in \mathbb{R}^{n\times r}$ is a time-dependent orthonormal spatial basis for the column space, $Y(t) \in \mathbb{R}^{s\times r}$ is a time-dependent orthonormal parametric basis for the row space, $\Sigma(t) \in \mathbb{R}^{r\times r}$ is, in general, a full matrix and $r \ll \min(n,s)$ is the rank of the approximation.

Because this is a low-rank approximation, it cannot satisfy the FOM exactly, and there will be a residual equal to:

\[ \mathcal{R}(t) = \frac{\mathrm{d}(U\Sigma Y^T)}{\mathrm{d}t} - F(t, U\Sigma Y^T). \tag{2.7} \]
This residual is minimized via the first-order optimality conditions of the variational principle given by
\[ \mathcal{J}(\dot{U}, \dot{\Sigma}, \dot{Y}) = \left\| \frac{\mathrm{d}(U\Sigma Y^T)}{\mathrm{d}t} - F(t, U\Sigma Y^T) \right\|_F^2, \tag{2.8} \]
subject to orthonormality constraints on $U$ and $Y$. Since the above variational principle involves the time-continuous equation (i.e. no temporal discretization is applied), the idea is to minimize the instantaneous residual by optimally updating $U$, $\Sigma$ and $Y$ in time. Therefore, we refer to this as the time-continuous variational principle. As indicated in [2,7], the optimality conditions of equation (2.8) lead to closed-form evolution equations for $U$, $\Sigma$ and $Y$:
\[ \dot{\Sigma} = U^T F Y, \tag{2.9a} \]
\[ \dot{U} = (I - UU^T)\,F Y \Sigma^{-1} \tag{2.9b} \]
and
\[ \dot{Y} = (I - YY^T)\,F^T U \Sigma^{-T}, \tag{2.9c} \]
where $F \in \mathbb{R}^{n\times s}$ is the matrix $F = F(t, U\Sigma Y^T)$, and $I$ is the identity matrix of appropriate dimensions. The above variational principle is the same as the Dirac–Frenkel time-dependent variational principle in the quantum chemistry literature [8] and the dynamical low-rank approximation (DLRA) [2]. In [2], the geometry of the tangent space, $T_{\hat{V}}\mathcal{M}_r$, was exploited to solve the constrained residual minimization problem given by equation (2.8). In this setting, the residual, $\mathcal{I}(\dot{\hat{V}}) = \|\dot{\hat{V}} - F(t,\hat{V})\|_F$, is minimized subject to the constraint that $\dot{\hat{V}}(t) \in T_{\hat{V}}\mathcal{M}_r$. The solution to the above minimization problem is obtained by
\[ \dot{\hat{V}} = \mathcal{P}_{T_{\hat{V}}}(F(t,\hat{V})), \tag{2.10} \]
where $\mathcal{P}_{T_{\hat{V}}}$ is the orthogonal projection onto the tangent space $T_{\hat{V}}\mathcal{M}_r$ at $\hat{V} = U\Sigma Y^T$ [2, lemma 4.1]. It is easy to show that equations (2.9a)–(2.9c) can be recovered from equation (2.10).

As was shown in [39], it is possible to derive a similar variational principle for the DO decomposition, $\hat{V}(t) = U_{DO}(t)Y_{DO}^T(t)$, whose optimality conditions are constrained to the orthonormality of the spatial modes, $U_{DO}^T U_{DO} = I$, via the DO condition, $\dot{U}_{DO}^T U_{DO} = 0$. However, for the sake of simplicity and unlike the original DO formulation presented in [1], an evolution equation for the mean field is not derived. Without loss of generality, the low-rank DO evolution equations become

\[ \dot{U}_{DO} = (I - U_{DO}U_{DO}^T)\,F Y_{DO} C^{-1} \tag{2.11a} \]
and
\[ \dot{Y}_{DO} = F^T U_{DO}, \tag{2.11b} \]
where $C = Y_{DO}^T Y_{DO}$ is the low-rank correlation matrix. Note that the low-rank approximation based on DO is equivalent to equation (2.6), i.e. $U_{DO}Y_{DO}^T = U\Sigma Y^T$. Similarly, the BO decomposition, $\hat{V}(t) = U_{BO}(t)Y_{BO}^T(t)$, which is subject to the BO conditions $U_{BO}^T U_{BO} = \mathrm{diag}(\lambda_1, \dots, \lambda_r)$ and $Y_{BO}^T Y_{BO} = I$, is also identical to DO and equation (2.6). As was shown in [11], one can derive matrix differential equations that transform the factorization $\{U, \Sigma, Y\}$ to $\{U_{DO}, Y_{DO}\}$ or $\{U_{BO}, Y_{BO}\}$. The equivalence of the DO and BO formulations was shown in [12]. Using the DO/BO terminology, equations (2.9a)–(2.9c) satisfy both DO and BO conditions, i.e. the DO conditions for $U$ and $Y$, $\dot{U}^T U = 0$ and $\dot{Y}^T Y = 0$, as well as the bi-orthonormality conditions $U^T U = I$ and $Y^T Y = I$. Despite their equivalence, these three factorizations have different numerical performance in the presence of small singular values. As was shown in [11], equations (2.9a)–(2.9c) outperform both DO and BO.

Despite the potential of equations (2.9a)–(2.9c) to significantly reduce the computational cost of solving massive MDEs like equation (2.2), there are still a number of outstanding challenges for most practical problems of interest. As highlighted in the Introduction, computing $F = F(t, U\Sigma Y^T)$ requires $O(ns)$ operations, i.e. operations that scale with the size of the FOM. This involves applying the nonlinear map $F$ to every column of the matrix $\hat{V} = U\Sigma Y^T$. While it is possible to achieve $O(n+s)$ for the special cases of homogeneous linear and quadratic nonlinear $F$, this comes at the expense of a highly intrusive process that requires a careful term-by-term treatment of the right-hand side of equations (2.9a)–(2.9c) [21, Appendix B]. Furthermore, solving equations (2.9b) and (2.9c) becomes unstable when $\Sigma$ is singular or near singular. This is particularly problematic because it is often necessary to retain very small singular values in order to have an accurate approximation.

    While the low-rank approximation based on TDBs can be cast in different, yet equivalent formulations, we have chosen equations (2.9a)–(2.9c) over DO/BO/DDO decompositions to highlight the underlying challenges. Since DO/BO/DDO decompositions exhibit all of the above challenges, addressing these challenges in the context of equations (2.9a)–(2.9c) automatically addresses the DO/BO/DDO challenges as well.

    (d) Time-discrete variational principle

    To address the challenges of low-rank approximations based on TDB using the time-continuous variational principle, we consider a time-discrete variational principle for rank-adaptive matrix approximations, which has recently been applied in [40] and also [25,41] in the context of tensors. To this end, consider an explicit Runge–Kutta temporal discretization of equation (2.2):

\[ V^k = \hat{V}^{k-1} + \Delta t\,\bar{F}, \tag{2.12} \]
where $\Delta t$ is the step size and $\bar{F}$ is obtained via an explicit Euler or Runge–Kutta scheme. For example, the first-order explicit Euler method is given by $\bar{F} = F(t^{k-1}, \hat{V}^{k-1})$. In the above equation, it is important to note that $V^k$ is not the FOM solution, since the right-hand side is computed using the low-rank state from the previous time step. Despite using the rank-$r$ $\hat{V}^{k-1}$ in equation (2.12), $V^k$ will not be a rank-$r$ matrix, i.e. $V^k \notin \mathcal{M}_r$. Excluding some rare exceptions, taking one step according to equation (2.12) will take $V^k$ off the rank-$r$ manifold. Therefore, to solve the MDE while remaining on $\mathcal{M}_r$, a rank truncation is needed to map the solution back onto the rank-$r$ manifold at each time step. In other words, we need to approximate $V^k$ with a rank-$r$ matrix, $\hat{V}^k$, such that
\[ V^k = \hat{V}^k + \mathcal{R}^k, \tag{2.13} \]
where $\mathcal{R}^k$ is the low-rank approximation error.

The time-discrete variational principle can be stated as finding the best $\hat{V}^k \in \mathcal{M}_r$ such that the Frobenius norm of the residual is minimized [25]:

\[ \mathcal{Z}(\hat{V}^k) = \|V^k - \hat{V}^k\|_F^2. \tag{2.14} \]
The solution of the above residual minimization problem is obtained via
\[ \hat{V}^k_{best} = \mathrm{SVD}(V^k), \tag{2.15} \]
where $\mathrm{SVD}(V^k) = U^k_{best}\Sigma^k_{best}{Y^k_{best}}^T$ is the rank-$r$ truncated SVD of the matrix $V^k$, $U^k_{best} \in \mathbb{R}^{n\times r}$ and $Y^k_{best} \in \mathbb{R}^{s\times r}$ are the matrices of the first $r$ left and right singular vectors of $V^k$, respectively, and $\Sigma^k_{best} \in \mathbb{R}^{r\times r}$ is the matrix of singular values.

An important advantage of equation (2.15) over equations (2.9a)–(2.9c) is that the time advancement according to equation (2.15) does not become singular in the presence of small singular values. While this solves the issue of ill-conditioning, computing equation (2.15) at each iteration of the time-stepping scheme is cost prohibitive. This computational cost is due to two sources: (i) computing the nonlinear map $F(t^{k-1}, \hat{V}^{k-1})$ to obtain $V^k$ and (ii) computing $\mathrm{SVD}(V^k)$. The cost of (i) alone makes the solution of the time-discrete variational principle as expensive as the FOM, i.e. $O(ns)$. Besides the flops associated with computing $V^k$, the memory cost of storing $V^k$ is prohibitive for most realistic applications. On the other hand, computing the exact SVD of $V^k$ scales with $\min\{O(n^3), O(s^3)\}$. While this cost is potentially alleviated by fast algorithms for approximating the SVD, e.g. randomized SVD [42] or incremental QR [37], for general nonlinearities in $F$, (i) is unavoidable. This ultimately leads to a computational cost that exceeds that of the FOM.
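The sketch below shows one step of this time-discrete approach under the stated assumptions (explicit Euler and a user-supplied $F$); it is robust to small singular values but forms the full $n\times s$ matrix:

```python
# One step of the time-discrete variational principle: advance the full
# MDE, then retract to M_r with a rank-r truncated SVD (equation (2.15)).
import numpy as np

def svd_truncated_step(U, S, Y, F, t, dt, r):
    Vhat = U @ S @ Y.T
    Vk = Vhat + dt * F(t, Vhat)                         # equation (2.12)
    Uk, svals, Ykt = np.linalg.svd(Vk, full_matrices=False)
    return Uk[:, :r], np.diag(svals[:r]), Ykt[:r, :].T  # rank-r SVD(V^k)
```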

    (e) Low-rank approximation via an oblique projection

To overcome these challenges, we present an oblique projection scheme that enables a cost-effective approximation to the rank-$r$ $\mathrm{SVD}(V^k)$. Before presenting our methodology, we provide a geometric interpretation of $\mathrm{SVD}(V^k)$. In particular, $\mathrm{SVD}(V^k)$ can be interpreted as an orthogonal projection onto the column and row space of $\hat{V}^k_{best}$, as shown below:

\[ \hat{V}^k_{best} = \mathcal{P}^{\perp}_{U^k_{best}}\, V^k\, \mathcal{P}^{\perp}_{Y^k_{best}} = \mathcal{P}^{\perp}_{U^k_{best}}\, (\hat{V}^{k-1} + \Delta t\,\bar{F})\, \mathcal{P}^{\perp}_{Y^k_{best}}, \tag{2.16} \]
where $\mathcal{P}^{\perp}_{U^k_{best}} = U^k_{best}{U^k_{best}}^T$ and $\mathcal{P}^{\perp}_{Y^k_{best}} = Y^k_{best}{Y^k_{best}}^T$ are orthogonal projections onto the column and row space of $\hat{V}^k_{best}$, respectively. The approximation $\hat{V}^k_{best}$ is the optimal rank-$r$ approximation of $V^k$; however, as mentioned in the previous section, computing $\hat{V}^k_{best}$ is more expensive than solving the FOM.

In the following, we present a methodology that computes an accurate approximation to $\hat{V}^k_{best}$ in a cost-effective manner. From the geometric perspective, our approach is to use an oblique projection onto a set of rank-$r$ orthonormal column ($U^k$) and row ($Y^k$) subspaces:

\[ \hat{V}^k = \mathcal{P}^{\angle}_{U^k}\, V^k\, \mathcal{P}^{\angle}_{Y^k} = \mathcal{P}^{\angle}_{U^k}\, (\hat{V}^{k-1} + \Delta t\,\bar{F})\, \mathcal{P}^{\angle}_{Y^k}. \tag{2.17} \]
The above equation is analogous to equation (2.16), where the orthogonal projections are replaced with oblique projections. While the orthogonal projection requires access to the entire $\hat{V}^{k-1} + \Delta t\,\bar{F}$ matrix, the oblique projectors can be designed to require the computation of $\hat{V}^{k-1} + \Delta t\,\bar{F}$ at only $O(r)$ columns and rows. This geometric perspective is depicted in figure 1a.
Figure 1. Schematic of the TDB-CUR methodology. (a) A geometric representation of the methodology depicting the departure from the rank-$r$ manifold at $\hat{V}^{k-1}$ to $V^k$ (red). Two possible mappings that truncate $V^k$ back to the rank-$r$ manifold are shown: $\hat{V}^k_{best}$ (green), computed via orthogonal projection (SVD), and $\hat{V}^k$ (purple), computed via oblique projection (TDB-CUR). The error of $\hat{V}^k_{best}$ (dashed green line) is orthogonal to the tangent space $T_{\hat{V}^k_{best}}\mathcal{M}_r$ at $\hat{V}^k_{best}$, while the error of $\hat{V}^k$ (dashed purple line) is not. (b) The low-rank approximation, $\hat{V}^k$, can be computed such that the residual at the selected rows (red) and columns (blue) is equal to zero. This is accomplished via sparse interpolation of the selected rows and columns. (c) Using interpolatory projection, the resulting $\hat{V}^k$ is an approximation to the rank-$r$ truncated $\mathrm{SVD}(V^k)$ and is equivalent to the low-rank CUR factorization that interpolates the selected rows and columns of $V^k$. Although it is equivalent to this CUR factorization, the numerical computation of $\hat{V}^k$ is different, as it does not require inverting $V^k(\mathbf{p},\mathbf{s})$.

From the matrix decomposition point of view, the above approximation may be represented via a CUR decomposition:

\[ \hat{V}^k = \mathrm{CUR}(V^k), \tag{2.18} \]
where $\mathrm{CUR}$ represents the algorithmic implementation of a CUR decomposition. See figure 1c.

From the residual minimization perspective, the SVD can be viewed as a Galerkin projection in which $\|\mathcal{R}^k\|_F$ is minimized. On the other hand, the presented approach based on interpolatory projection (a special case of oblique projection) can be viewed as a collocated scheme in which the residual is set to zero at $r$ strategically selected rows and columns of the residual matrix, $\mathcal{R}^k$, in equation (2.13). To this end, we present an algorithm that sets $\mathcal{R}^k(\mathbf{p},:) = 0$ and $\mathcal{R}^k(:,\mathbf{s}) = 0$, where $\mathbf{p} \in \mathbb{N}^r$ and $\mathbf{s} \in \mathbb{N}^r$ are vectors containing the row and column indices at which the residual is set to zero. This simply requires $\hat{V}^k(\mathbf{p},:) = V^k(\mathbf{p},:)$ and $\hat{V}^k(:,\mathbf{s}) = V^k(:,\mathbf{s})$. See figure 1b. In §2h, we consider oblique projection for the general case when $r' > r$ (definition 2.4), where the residual at the selected rows and columns is not guaranteed to be zero.

Although the approach we will present is equivalent to the oblique projection of equation (2.17), $U^k$ and $Y^k$ are the unknown column and row bases of $\hat{V}^k$ at the current time step. Therefore, equation (2.17) cannot be readily used for the computation of $\hat{V}^k$. In the following, we present a methodology to compute these bases (and subsequently $\hat{V}^k$) by strategically sampling $r$ columns and rows of $V^k$. While there are many possible choices for the indices $\mathbf{p}$ and $\mathbf{s}$, these points should be selected in a principled manner to ensure that the residual at all points remains small. To compute these points, we use the discrete empirical interpolation method (DEIM) [43], which has been shown to provide near-optimal sampling points for computing CUR matrix decompositions [37]. A similar approach was recently applied in [21] to accelerate the computation of equations (2.9a)–(2.9c) by sampling $F$ at only a small number of rows and columns. However, the approach presented in [21] still suffers from the issue of ill-conditioning.
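For reference, a compact sketch of the greedy DEIM selection of [43, algorithm 1] is shown below (our transcription, not the authors' code):

```python
# Greedy DEIM point selection for an n x r matrix U with orthonormal
# columns; returns r row indices.
import numpy as np

def deim(U):
    n, r = U.shape
    p = [int(np.argmax(np.abs(U[:, 0])))]
    for j in range(1, r):
        # interpolation coefficients of column j at the current points
        c = np.linalg.solve(U[p, :j], U[p, j])
        res = U[:, j] - U[:, :j] @ c            # interpolation residual
        p.append(int(np.argmax(np.abs(res))))   # next point: largest residual
    return np.array(p)
```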

To compute the DEIM points, the rank-$r$ SVD (or an approximation thereof) is required [37]. Since we do not have access to the rank-$r$ SVD at the current time step, $k$, we use the approximation of the SVD from the previous time step, $\hat{V}^{k-1} = U^{k-1}\Sigma^{k-1}{Y^{k-1}}^T$, to compute the DEIM points. The initial approximation is ideally obtained from the FOM initial condition as the rank-$r$ $\mathrm{SVD}(V_0)$. The algorithm for computing $\hat{V}^k$ using interpolation is as follows (a compact code sketch follows the list):

    (i)

Compute the sampling indices, $\mathbf{p} \leftarrow \mathrm{DEIM}(U^{k-1})$ and $\mathbf{s} \leftarrow \mathrm{DEIM}(Y^{k-1})$, in parallel.

    (ii)

Compute $V^k(\mathbf{p},:)$ and $V^k(:,\mathbf{s})$ by taking one step according to equation (2.12) at the selected rows and columns, in parallel.

    (iii)

Compute $Q \in \mathbb{R}^{n\times r}$ as an orthonormal basis for the range of $V^k(:,\mathbf{s})$ via the QR decomposition, $V^k(:,\mathbf{s}) = QR$, where $R \in \mathbb{R}^{r\times r}$.

    (iv)

Interpolate every column of $V^k$ onto the orthonormal basis $Q$ at the sparse indices $\mathbf{p}$:

\[ Z = Q(\mathbf{p},:)^{-1}\, V^k(\mathbf{p},:), \tag{2.19} \]
where $Z \in \mathbb{R}^{r\times s}$ is the matrix of interpolation coefficients such that $QZ$ interpolates $V^k$ onto the basis $Q$ at the interpolation points indexed by $\mathbf{p}$.

    (v)

Compute the SVD of $Z$ so that

\[ Z = U_Z\, \Sigma^k\, {Y^k}^T, \tag{2.20} \]
where $U_Z \in \mathbb{R}^{r\times r}$, $\Sigma^k \in \mathbb{R}^{r\times r}$ and $Y^k \in \mathbb{R}^{s\times r}$.

    (vi)

Compute $U^k \in \mathbb{R}^{n\times r}$ as the in-subspace rotation:

\[ U^k = Q\,U_Z. \tag{2.21} \]
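The six steps can be condensed as follows; step_rows and step_cols are hypothetical stand-ins for evaluating equation (2.12) at the selected rows and columns (the row case is discussed in §2f), and deim is the helper sketched above. This is a sketch, not the reference code:

```python
# One TDB-CUR step assembling Steps (i)-(vi).
import numpy as np

def tdb_cur_step(U, S, Y, step_rows, step_cols, t, dt):
    p = deim(U)                                   # Step (i): row indices
    sidx = deim(Y)                                #           column indices
    Vk_rows = step_rows(t, U, S, Y, p, dt)        # Step (ii): V^k(p, :)
    Vk_cols = step_cols(t, U, S, Y, sidx, dt)     #            V^k(:, s)
    Q, _ = np.linalg.qr(Vk_cols)                  # Step (iii): basis of range
    Z = np.linalg.solve(Q[p, :], Vk_rows)         # Step (iv): equation (2.19)
    UZ, svals, Ykt = np.linalg.svd(Z, full_matrices=False)  # Step (v)
    return Q @ UZ, np.diag(svals), Ykt.T          # Step (vi): U^k, Sigma^k, Y^k
```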

In Step (i), the details of the DEIM algorithm can be found in [43, algorithm 1]. A DEIM algorithm based on the QR factorization, a.k.a. QDEIM, may also be used [44,45]. Both DEIM and QDEIM are sparse selection algorithms, and they perform comparably in the cases considered in this paper. We now explain how the above algorithm addresses the three challenges raised in §1.
    (i)

Computational efficiency: The above procedure returns the updated low-rank approximation $\hat{V}^k = QZ = U^k\Sigma^k{Y^k}^T$ and only requires sampling $V^k$ at $r$ rows and columns. This alone significantly reduces both the required flops and memory compared to computing the entire $V^k$. Furthermore, instead of directly computing the SVD of the $n\times s$ matrix $V^k$, we only require the QR decomposition of the $n\times r$ matrix $V^k(:,\mathbf{s})$ and the SVD of the $r\times s$ matrix $Z$. This reduces the computational cost to $O(n+s)$ for $r \ll s$ and $r \ll n$. Moreover, in most practical applications, computing $V^k(:,\mathbf{s})$ is the costliest part of the algorithm, as it requires solving the $\mathbf{s}$ samples of the FOM. However, since these samples are independent of each other, the columns of $V^k(:,\mathbf{s})$ can be computed in parallel. Similarly, each row of $V^k(\mathbf{p},:)$ can be computed in parallel.

    (ii)

Intrusiveness: While this significantly reduces the computational burden, perhaps an equally important outcome is the minimally intrusive nature of the above approach. For example, when the columns of $V^k$ are independent, e.g. random samples, $V^k(:,\mathbf{s})$ can be computed by directly applying equation (2.12) to the selected columns of the low-rank approximation from the previous time step. This effectively allows existing numerical implementations of equation (2.12) to be used as a black box for computing $V^k(:,\mathbf{s})$. The non-intrusive column sampling in the presented algorithm is the counterpart of solving equation (2.9b) in DLRA and equation (2.11a) in DO. However, equations (2.9b) and (2.11a) require deriving and implementing new PDEs, whereas the presented algorithm allows an existing deterministic solver to be used in black-box fashion, from which a suitable column-space basis is extracted. On the other hand, the rows of $V^k$ are in general dependent, based on a known map for the chosen spatial discretization scheme, e.g. sparse discretizations such as finite difference and finite element, or dense discretization schemes such as global spectral methods. Therefore, computing $V^k(\mathbf{p},:)$ does require specific knowledge of the governing equations, namely the discretized differential operators. Based on the discretization scheme, one can determine a set of adjacent points, $\mathbf{p}_a$, that are required for computing the derivatives at the points specified by $\mathbf{p}$. While this introduces an added layer of complexity, it is much less intrusive than deriving and implementing reduced-order operators for each term in the governing equations, which, we emphasize again, is only feasible for homogeneous linear or quadratic nonlinear equations. In the present work, that bottleneck is removed, regardless of the type of nonlinearity.

    (iii)

Ill-conditioning: The presented algorithm is robust in the presence of small or zero singular values. First note that the inversion of the matrix of singular values is not required in the presented algorithm. In fact, the conditioning of the algorithm depends on $Q(\mathbf{p},:)$ and $Y(\mathbf{s},:)$, and the DEIM algorithm ensures that these two matrices are well-conditioned. To illustrate this point, consider the case of overapproximation, where the rank of $V^k(:,\mathbf{s})$ is $r_1 < r$. In this case, equations (2.9a)–(2.9c) and equations (2.11a)–(2.11b) cannot be advanced, because $\Sigma \in \mathbb{R}^{r\times r}$ and $C \in \mathbb{R}^{r\times r}$ will be singular, i.e. $\mathrm{rank}(\Sigma) = \mathrm{rank}(C) = r_1 < r$. On the other hand, despite $V^k(:,\mathbf{s})$ being rank-deficient, $Q$ will still be a full-rank matrix in the presented algorithm. While there is no guarantee that a subset of rows of $Q$, i.e. $Q(\mathbf{p},:)$, is well-conditioned, DEIM is a greedy algorithm designed to keep $\|Q(\mathbf{p},:)^{-1}\|$ as small as possible in a near-optimal fashion. In §2h, we show that oversampling further improves the condition number of the presented algorithm, and in theorem 2.8, we show that $\|Y(\mathbf{s},:)^{-1}\|$ plays an equally important role in maintaining a well-conditioned algorithm.

As we will show in §2g, the low-rank approximation computed above is equivalent to a CUR matrix decomposition that interpolates $V^k$ at the selected rows and columns. Therefore, we refer to the above procedure as the TDB-CUR algorithm.

(f) Computing $V^k(\mathbf{p},:)$

Up until this point, we have considered $V^k = \hat{V}^{k-1} + \Delta t\,\bar{F}$ to be an $n\times s$ matrix resulting from the explicit Runge–Kutta temporal discretization of equation (2.12). We showed that sparse row and column measurements, $V^k(\mathbf{p},:)$ and $V^k(:,\mathbf{s})$, can be used to efficiently compute an approximation to the rank-$r$ SVD of $V^k$. While $V^k(:,\mathbf{s})$ is straightforward to compute for independent random samples, as discussed in §2e, computing $V^k(\mathbf{p},:)$ depends on a set of adjacent points, $\mathbf{p}_a$, determined by the spatial discretization scheme. As a result, for higher-order integration schemes, special care must be taken in the computation of $V^k(\mathbf{p},:)$. To demonstrate this, we consider the second-order explicit Runge–Kutta scheme, where

\[ \bar{F} = F\Big(t^{k-1} + \tfrac{1}{2}\Delta t,\ \hat{V}^{k-1} + \tfrac{1}{2}\Delta t\,F(t^{k-1}, \hat{V}^{k-1})\Big). \]
After determining the row indices, $\mathbf{p}$ and $\mathbf{p}_a$, $V^k(\mathbf{p},:)$ can be computed as follows:
    (i)

Compute $\hat{V}^{k-1}([\mathbf{p},\mathbf{p}_a],:) = U^{k-1}([\mathbf{p},\mathbf{p}_a],:)\,\Sigma^{k-1}\,{Y^{k-1}}^T$.

    (ii)

Compute the first stage, $F_1 = F(t^{k-1}, \hat{V}^{k-1})$, at the $\mathbf{p}$ rows as

\[ F_1(\mathbf{p},:) = F(t^{k-1}, \hat{V}^{k-1}([\mathbf{p},\mathbf{p}_a],:)). \]
Note that if the explicit Euler method is used, $\bar{F} = F_1$ and no additional steps are required: simply compute $V^k(\mathbf{p},:) = \hat{V}^{k-1}(\mathbf{p},:) + \Delta t\,F_1(\mathbf{p},:)$. If a higher-order scheme is used, proceed with the following steps.

    (iii)

The final stage of the second-order integration scheme requires taking a half step to evaluate $F$ at the midpoint:

\[ F_2 = F\big(t^{k-1} + \tfrac{1}{2}\Delta t,\ \hat{V}^{k-1} + \tfrac{1}{2}\Delta t\,F_1\big). \]
Note that for the second-order scheme, $\bar{F} = F_2$. Here, we require $F_2(\mathbf{p},:)$, given by
\[ F_2(\mathbf{p},:) = F\big(t^{k-1} + \tfrac{1}{2}\Delta t,\ \hat{V}^{k-1}([\mathbf{p},\mathbf{p}_a],:) + \tfrac{1}{2}\Delta t\,F_1([\mathbf{p},\mathbf{p}_a],:)\big). \]
Note that we now require $F_1(\mathbf{p}_a,:)$ to evaluate the above expression. While this can be computed according to Step (ii), where $\mathbf{p}_a$ will have its own set of adjacent points, $\mathbf{p}_{aa}$, this process quickly gets out of hand, especially as more stages are added to the integration scheme. As a result, for higher-order schemes, the efficiency afforded by the presented algorithm deteriorates, and the resulting implementation becomes increasingly complex. To overcome these challenges, we instead compute the low-rank approximation $\hat{F}_i \approx F_i$ using the sparse row and column measurements, $F_i(\mathbf{p},:)$ and $F_i(:,\mathbf{s})$, which are already required for computing $V^k(\mathbf{p},:)$ and $V^k(:,\mathbf{s})$. Here, the subscript denotes the $i$th stage of the integration scheme. The first step is to compute $U_{F_i}$ as an orthonormal basis for $\mathrm{Ran}(F_i(:,\mathbf{s}))$, using QR. Next, compute the oblique projection of $F_i$ onto $U_{F_i}$, such that $\hat{F}_i = U_{F_i}\,U_{F_i}(\mathbf{p},:)^{-1}\,F_i(\mathbf{p},:)$. Using this low-rank approximation, $F_i(\mathbf{p}_a,:)$ is readily approximated by $\hat{F}_i(\mathbf{p}_a,:) = U_{F_i}(\mathbf{p}_a,:)\,U_{F_i}(\mathbf{p},:)^{-1}\,F_i(\mathbf{p},:)$. Although we have considered the second-order Runge–Kutta method in the example above, this approach extends readily to higher-order Runge–Kutta methods. It is straightforward to show that the above procedure is equivalent to a CUR decomposition of the matrix $F_i$, similar to our previous work [21]. A small sketch of this stage approximation follows.

    (g) Equivalence to a CUR decomposition and oblique projection

Before presenting the details of our methodology in §2e, we discussed how the presented approach can be understood as an oblique projection (equation (2.17)) or, alternatively, as a CUR decomposition (equation (2.18)). In this section, we show that (i) the presented algorithm is equivalent to a CUR decomposition (theorem 2.6) and (ii) the matrix $\hat{V}^k$ is obtained via an oblique projection that requires access to only the selected rows and columns of $V^k$ (theorem 2.7). In theorems 2.6 and 2.7, $\mathbf{P}$ and $\mathbf{S}$ are matrices of size $n\times r$ and $s\times r$, respectively. We drop the time step superscript $k$ for simplicity.

    Theorem 2.6.

Let $\hat{V} = QZ$ be the low-rank approximation of $V$ computed according to the TDB-CUR algorithm. Then: (i) $\hat{V} = QZ$ is equivalent to the CUR factorization given by $(V\mathbf{S})(\mathbf{P}^TV\mathbf{S})^{-1}(\mathbf{P}^TV)$. (ii) The low-rank approximation is exact at the selected rows and columns, i.e. $\mathbf{P}^T\hat{V} = \mathbf{P}^TV$ and $\hat{V}\mathbf{S} = V\mathbf{S}$.

    Proof.

(i)

According to the TDB-CUR algorithm, $Q$ is a basis for $\mathrm{Ran}(V(:,\mathbf{s}))$. Therefore, $V(:,\mathbf{s}) = V\mathbf{S} = QQ^TV\mathbf{S}$, and it follows that $\mathbf{P}^TV\mathbf{S} = \mathbf{P}^TQQ^TV\mathbf{S}$. Substituting this result into the CUR factorization gives

\[ (V\mathbf{S})(\mathbf{P}^TV\mathbf{S})^{-1}(\mathbf{P}^TV) = QQ^TV\mathbf{S}\,(\mathbf{P}^TQQ^TV\mathbf{S})^{-1}\,\mathbf{P}^TV. \]
Rearranging the above expression gives the desired result:
\[ (V\mathbf{S})(\mathbf{P}^TV\mathbf{S})^{-1}(\mathbf{P}^TV) = QQ^TV\mathbf{S}\,(Q^TV\mathbf{S})^{-1}(\mathbf{P}^TQ)^{-1}\,\mathbf{P}^TV = QZ = \hat{V}, \]
where we have used $Z = (\mathbf{P}^TQ)^{-1}\mathbf{P}^TV = Q(\mathbf{p},:)^{-1}V(\mathbf{p},:)$ from equation (2.19).

(ii)

Using the above result, $\hat{V} = (V\mathbf{S})(\mathbf{P}^TV\mathbf{S})^{-1}(\mathbf{P}^TV)$, we show that the selected rows of $\hat{V}$ are exact, i.e. $\hat{V}(\mathbf{p},:) = V(\mathbf{p},:)$:

\[ \mathbf{P}^T\hat{V} = (\mathbf{P}^TV\mathbf{S})(\mathbf{P}^TV\mathbf{S})^{-1}(\mathbf{P}^TV) = \mathbf{P}^TV. \]
Similarly, for the columns,
\[ \hat{V}\mathbf{S} = (V\mathbf{S})(\mathbf{P}^TV\mathbf{S})^{-1}(\mathbf{P}^TV\mathbf{S}) = V\mathbf{S}. \]
This completes the proof.

We now show that $\hat{V}$ is an oblique projection of $V$ onto the selected columns and rows of $V$. In particular, the oblique projector involved is an interpolatory projector. For the sake of brevity, we drop the superscript $k$ in the following.

    Theorem 2.7.

Let $\hat{V} = QZ$ be the low-rank approximation of $V$ computed according to the TDB-CUR algorithm. Then $\hat{V} = \mathcal{P}^{\angle}_U V \mathcal{P}^{\angle}_Y$, where $\mathcal{P}^{\angle}_U$ and $\mathcal{P}^{\angle}_Y$ are oblique projectors onto $\mathrm{Ran}(U)$ and $\mathrm{Ran}(Y)$, respectively, according to equation (2.5).

    Proof.

We first show that $\mathcal{P}^{\angle}_U$ can be represented with $Q$ as the interpolation basis. To this end, replacing $U = QU_Z$ in the definition of $\mathcal{P}^{\angle}_U$ results in:

\[ \mathcal{P}^{\angle}_U = U(\mathbf{P}^TU)^{-1}\mathbf{P}^T = QU_Z(\mathbf{P}^TQU_Z)^{-1}\mathbf{P}^T = QU_ZU_Z^{-1}(\mathbf{P}^TQ)^{-1}\mathbf{P}^T = Q(\mathbf{P}^TQ)^{-1}\mathbf{P}^T, \]
where we have used the fact that $U_Z$ is a square orthonormal matrix and, therefore, $U_ZU_Z^{-1} = I$. Similarly, $\mathcal{P}^{\angle}_Y$ can be represented with $Z^T$ as the interpolation basis by replacing $Y^T = \Sigma^{-1}U_Z^{-1}Z$ in $\mathcal{P}^{\angle}_Y$:
\[ \mathcal{P}^{\angle}_Y = \mathbf{S}(Y^T\mathbf{S})^{-1}Y^T = \mathbf{S}(\Sigma^{-1}U_Z^{-1}Z\mathbf{S})^{-1}\Sigma^{-1}U_Z^{-1}Z = \mathbf{S}(Z\mathbf{S})^{-1}Z. \]
Using these projection operators, we have
\[ \mathcal{P}^{\angle}_U V \mathcal{P}^{\angle}_Y = Q(\mathbf{P}^TQ)^{-1}\mathbf{P}^T V \mathbf{S}(Z\mathbf{S})^{-1}Z. \tag{2.22} \]
Using the results of theorem 2.6, part (ii), we have $V(\mathbf{p},\mathbf{s}) = \hat{V}(\mathbf{p},\mathbf{s})$. Therefore:
\[ \mathbf{P}^TV\mathbf{S} = V(\mathbf{p},\mathbf{s}) = \hat{V}(\mathbf{p},\mathbf{s}) = Q(\mathbf{p},:)Z(:,\mathbf{s}) = \mathbf{P}^TQZ\mathbf{S}. \]
Using this result in equation (2.22) yields:
\[ \mathcal{P}^{\angle}_U V \mathcal{P}^{\angle}_Y = Q(\mathbf{P}^TQ)^{-1}\mathbf{P}^TQZ\mathbf{S}(Z\mathbf{S})^{-1}Z = QZ = \hat{V}. \]
This completes the proof.

In the following theorem, we show that the oblique projection error is bounded by an error factor multiplied by the maximum of the orthogonal projection errors onto $U$ or $Y$. We follow a procedure similar to that used in [37]; however, in [37] the CUR is computed based on orthogonal projections onto the selected columns and rows, whereas in the presented TDB-CUR algorithm oblique projectors are used. Without loss of generality, we consider a generic oblique projection, where the indexing matrices $\mathbf{P}$ and $\mathbf{S}$ are of size $n\times r'$ and $s\times r'$, respectively, and in general $r' \ge r$ (see definition 2.4 for details). In the following, we use the spectral norm ($\|\cdot\| \equiv \|\cdot\|_2$).

    Theorem 2.8.

Let $\mathcal{P}^{\angle}_U$ and $\mathcal{P}^{\angle}_Y$ be oblique projectors according to definition 2.4 and let $U \in \mathbb{R}^{n\times r}$ and $Y \in \mathbb{R}^{s\times r}$ be orthonormal matrices, i.e. $U^TU = I$ and $Y^TY = I$. Let $\epsilon_f \ge 0$ be given by $\epsilon_f = \min\{\eta_p(1+\eta_s),\ \eta_s(1+\eta_p)\} - 1$, where $\eta_p = \|(\mathbf{P}^TU)^{\dagger}\|$ and $\eta_s = \|(\mathbf{S}^TY)^{\dagger}\|$, and let $\hat{\sigma}_{r+1} = \max\{\|V - UU^TV\|,\ \|V - VYY^T\|\}$. Then the error of the oblique projection is bounded by

\[ \|V - \mathcal{P}^{\angle}_U V \mathcal{P}^{\angle}_Y\| \le (1 + \epsilon_f)\,\hat{\sigma}_{r+1}. \tag{2.23} \]

    Proof.

First note that $\epsilon_f \ge 0$ because $\eta_p \ge 1$ and $\eta_s \ge 1$. The error matrix can be written as

\[ V - \mathcal{P}^{\angle}_U V \mathcal{P}^{\angle}_Y = (I - \mathcal{P}^{\angle}_U)V + \mathcal{P}^{\angle}_U V - \mathcal{P}^{\angle}_U V \mathcal{P}^{\angle}_Y = (I - \mathcal{P}^{\angle}_U)V + \mathcal{P}^{\angle}_U V(I - \mathcal{P}^{\angle}_Y), \]
where $I$ is the identity matrix of appropriate size. Also, $\mathcal{P}^{\angle}_U U = U(\mathbf{P}^TU)^{\dagger}\mathbf{P}^TU = U$; therefore, $(I - \mathcal{P}^{\angle}_U)U = 0$. Similarly, $Y^T(I - \mathcal{P}^{\angle}_Y) = 0$. Therefore,
\[ \|V - \mathcal{P}^{\angle}_U V \mathcal{P}^{\angle}_Y\| \le \|(I - \mathcal{P}^{\angle}_U)V\| + \|\mathcal{P}^{\angle}_U V(I - \mathcal{P}^{\angle}_Y)\| = \|(I - \mathcal{P}^{\angle}_U)(V - UU^TV)\| + \|\mathcal{P}^{\angle}_U(V - VYY^T)(I - \mathcal{P}^{\angle}_Y)\| \le \big(\|I - \mathcal{P}^{\angle}_U\| + \|\mathcal{P}^{\angle}_U\|\,\|I - \mathcal{P}^{\angle}_Y\|\big)\,\hat{\sigma}_{r+1} = \eta_p(1+\eta_s)\,\hat{\sigma}_{r+1}. \]
In the above inequality, we have made use of the fact that $\|I - \mathcal{P}^{\angle}_U\| = \|\mathcal{P}^{\angle}_U\| = \eta_p$ and $\|I - \mathcal{P}^{\angle}_Y\| = \|\mathcal{P}^{\angle}_Y\| = \eta_s$, which holds as long as the projectors are neither null nor the identity [46]. In the second line of the above inequality, we have made use of $(I - \mathcal{P}^{\angle}_U)U = 0$ and $Y^T(I - \mathcal{P}^{\angle}_Y) = 0$. Similarly, it is possible to express the error matrix as
\[ V - \mathcal{P}^{\angle}_U V \mathcal{P}^{\angle}_Y = V(I - \mathcal{P}^{\angle}_Y) + V\mathcal{P}^{\angle}_Y - \mathcal{P}^{\angle}_U V \mathcal{P}^{\angle}_Y = V(I - \mathcal{P}^{\angle}_Y) + (I - \mathcal{P}^{\angle}_U)V\mathcal{P}^{\angle}_Y. \]
Therefore, another error bound can be obtained as
\[ \|V - \mathcal{P}^{\angle}_U V \mathcal{P}^{\angle}_Y\| \le \|V(I - \mathcal{P}^{\angle}_Y)\| + \|(I - \mathcal{P}^{\angle}_U)V\mathcal{P}^{\angle}_Y\| = \|(V - VYY^T)(I - \mathcal{P}^{\angle}_Y)\| + \|(I - \mathcal{P}^{\angle}_U)(V - UU^TV)\mathcal{P}^{\angle}_Y\| \le \big(\|I - \mathcal{P}^{\angle}_Y\| + \|I - \mathcal{P}^{\angle}_U\|\,\|\mathcal{P}^{\angle}_Y\|\big)\,\hat{\sigma}_{r+1} = \eta_s(1+\eta_p)\,\hat{\sigma}_{r+1}, \]
where $\|I - \mathcal{P}^{\angle}_Y\| = \eta_s$ is used. Combining the above two inequalities yields inequality (2.23).

In the above error bound, when $U$ and $Y$ are the $r$ most dominant exact left and right singular vectors of $V$, then $\hat{\sigma}_{r+1} = \sigma_{r+1}$, where $\sigma_{r+1}$ is the $(r+1)$th singular value of $V$, since

\[ \|V - UU^TV\| = \|V - VYY^T\| = \sigma_{r+1}. \tag{2.24} \]
In that case, $1 + \epsilon_f$ is the error factor of the CUR decomposition when compared against the optimal rank-$r$ reduction error obtained by the SVD. As demonstrated in our numerical experiments, the TDB-CUR algorithm closely approximates the rank-$r$ SVD approximation of $V$.

    (h) Oversampling for improved condition number

The above error analysis shows that the CUR rank-$r$ approximation error can be bounded by an error factor $(1+\epsilon_f)$ times the maximum error obtained from the orthogonal projection of $V$ onto $U$ or $Y$. This analysis reveals that better-conditioned $\mathbf{P}^TU$ and $\mathbf{S}^TY$ matrices result in smaller $\eta_p$ and $\eta_s$, which in turn results in a smaller error factor. In the context of DEIM interpolation, it was shown that oversampling can improve the condition number of oblique projections [47]. The authors demonstrated that augmenting the original DEIM algorithm with an additional $m = O(r)$ sampling points can reduce the value of $\eta_p$, leading to smaller approximation errors. This procedure of sampling more rows than the number of basis vectors leads to an overdetermined system whose approximate solution can be found via least-squares regression rather than interpolation. Additionally, it was shown in [48] that for matrices with rapidly decaying singular values (as targeted in this work), oversampling improves the accuracy of CUR decompositions.

In the following, we extend the TDB-CUR algorithm for row oversampling. As a direct result of the oversampling procedure, the oblique projection of $V$ onto the range of the orthonormal basis $Q$ becomes $Z = Q(\mathbf{p},:)^{\dagger}V(\mathbf{p},:)$, where $\mathbf{p} \in \mathbb{N}^{r'}$ contains the $r' = r + m \le n$ row indices. Note that $Q(\mathbf{p},:)^{\dagger} = (Q(\mathbf{p},:)^TQ(\mathbf{p},:))^{-1}Q(\mathbf{p},:)^T$ is the pseudoinverse of $Q(\mathbf{p},:)$; however, we do not apply any singular value threshold cutoff to compute $Q(\mathbf{p},:)^{\dagger}$, and exact inversion of $Q(\mathbf{p},:)^TQ(\mathbf{p},:)$ is used. Therefore, the oblique projection becomes a least-squares best-fit solution. Also, increasing the number of oversampling points decreases $\eta_p = \|Q(\mathbf{p},:)^{\dagger}\|$, and it follows that for the maximum number of oversampling points, i.e. when all the rows are sampled, the orthogonal projection of every column of $V$ onto $\mathrm{Ran}(V(:,\mathbf{s}))$ is recovered, where $\eta_p$ attains its smallest value, $\eta_p = 1$. Note that, unlike the interpolatory projector, $\mathbf{P}^T\mathcal{P}^{\angle}_U A \ne A(\mathbf{p},:)$. The oversampling is also applied analogously to the low-rank approximation of $F$: $\hat{F}_i = U_{F_i}U_{F_i}(\mathbf{p},:)^{\dagger}F_i(\mathbf{p},:)$, where $U_{F_i}(\mathbf{p},:)^{\dagger} = (U_{F_i}(\mathbf{p},:)^TU_{F_i}(\mathbf{p},:))^{-1}U_{F_i}(\mathbf{p},:)^T$. The low-rank approximation of $F$ is presented in §2f.

We refer to the above sampling procedure as OS-DEIM, where OS refers to oversampling. Since the DEIM only provides a number of sampling points equal to the number of basis vectors, we use the GappyPOD+E algorithm from Peherstorfer et al. [49] to sample a total of $r'$ rows. While any sparse selection procedure can be used, GappyPOD+E was shown to outperform other common choices such as random sampling or leverage scores [38]. Finally, it is possible to oversample the columns in an analogous manner to decrease $\eta_s$. In all of the examples considered in this paper, we apply row oversampling, but ultimately the decision to oversample rows, columns, or both may be made by requiring that $\eta_p$ and $\eta_s$ be smaller than some threshold values.
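The oversampled least-squares step, in isolation, looks as follows; the GappyPOD+E selection of the $r' = r + m$ indices is not reproduced here, so the index vector p_os is assumed given:

```python
# Oversampled Step (iv): Z = Q(p, :)^+ V^k(p, :) via the exact normal
# equations, with no singular-value cutoff.
import numpy as np

def oversampled_coeffs(Q, Vk_rows_os, p_os):
    Qp = Q[p_os, :]                                      # r' x r with r' > r
    return np.linalg.solve(Qp.T @ Qp, Qp.T @ Vk_rows_os) # least-squares fit
```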

    (i) Rank-adaptivity

In order to control the error while avoiding unnecessary computations, the rank of the TDB must be able to adapt on the fly. The importance of rank-adaptivity for low-rank approximation with TDBs has been recognized, and several algorithms have been proposed recently; see, for example, [50–52]. We show that it is easy to incorporate mode adaptivity into the TDB-CUR algorithm. In the case of rank reduction, once the new rank is chosen such that $r^k < r^{k-1}$, the low-rank matrices are simply truncated to retain only the first $r^k$ components, i.e. $U(:,1\!:\!r^k)$, $\Sigma(1\!:\!r^k,1\!:\!r^k)$ and $Y(:,1\!:\!r^k)$. On the other hand, the rank can be increased, such that $r^k > r^{k-1}$, by sampling more columns ($r^k$) than the number of basis vectors ($r^{k-1}$), i.e. oversampling. Similar to the procedure used for oversampling the rows, the column indices are determined via the GappyPOD+E algorithm. While this provides a straightforward approach for how to adapt the rank, it does not address when the rank should be adapted, or what the new rank should be.

Informed by the error analysis from the preceding section, we devise a suitable criterion for controlling the error via rank addition and removal. Since it is not possible to know the true error without solving the expensive FOM, we devise a proxy for estimating the low-rank approximation error:

\[ \epsilon(t) = \hat{\sigma}_r(t)\Big(\sum_{i=1}^{r}\hat{\sigma}_i(t)^2\Big)^{-1/2}, \tag{2.25} \]
where the $\hat{\sigma}_i$ are the singular values of the low-rank approximation from the previous time step. Assuming the low-rank approximation is near-optimal in its initial condition, we can use the trailing singular value as a proxy for the low-rank approximation error.

To make the error proxy more robust for problems of varying scale and magnitude, we divide by the Frobenius norm of $\hat{V}$, where it is well known that $\|\hat{V}\|_F = (\sum_{i=1}^{r}\hat{\sigma}_i^2)^{1/2}$. Rather than set a hard threshold, we add/remove modes to maintain $\epsilon$ within a desired range, $\epsilon_l \le \epsilon \le \epsilon_u$, where $\epsilon_l$ and $\epsilon_u$ are user-specified lower and upper bounds, respectively. If $\epsilon > \epsilon_u$, we increase the rank to $r+1$, and if $\epsilon < \epsilon_l$, we decrease the rank to $r-1$. As a result, this approach avoids the undesirable behaviour of repeated mode addition and removal that is observed when a single hard threshold is set. The rank-adaptive TDB-CUR algorithm is detailed in algorithm 1, and a small sketch of the criterion follows below.
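A minimal sketch of this criterion (eps_l and eps_u are the user-specified bounds from the text):

```python
# The error proxy (2.25) and the add/remove rule as a small helper.
import numpy as np

def adapt_rank(svals, r, eps_l, eps_u):
    eps = svals[r - 1] / np.sqrt(np.sum(svals[:r] ** 2))  # equation (2.25)
    if eps > eps_u:
        return r + 1      # sample one extra column at the next step
    if eps < eps_l and r > 1:
        return r - 1      # truncate U, Sigma, Y to the leading r - 1 modes
    return r
```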

It is important to note that this is not the only criterion for mode addition and removal, and one can devise a number of strategies based on the problem at hand. However, in our numerical experiments, this approach has proved to be simple and effective, and it does a good job of capturing the trend of the true error. For more details on estimating rank and selection criteria, we refer the reader to [53, section 2.3]. Finally, it is possible to increase the rank by more than one in algorithm 1, if required. This can be determined by applying the singular value threshold check after executing Line 19. If $\epsilon > \epsilon_u$ is still true, one more column can be sampled. This requires executing Line 10 to find the new column index, Line 11 to update $\mathbf{p}$, evaluating $V^k$ only for the new column using Line 14, and following Lines 15–20. These iterations can be carried out as many times as needed, until $\epsilon$ falls below $\epsilon_u$.

[Algorithm 1 (rank-adaptive TDB-CUR) appears here as an image in the published article.]

    Remark 2.9.

Algorithm 1 is presented for solving MDEs that arise from discretizing PDEs with parametric uncertainties, where the rows are dependent on each other but the columns can be solved independently. However, algorithm 1, with minor modification, can be applied to MDEs where the columns are also dependent on each other. In such cases, evaluating $V^k(:,\mathbf{s})$ requires providing $\hat{V}^{k-1}(:,[\mathbf{s},\mathbf{s}_a]) = U^{k-1}\Sigma^{k-1}Y^{k-1}([\mathbf{s},\mathbf{s}_a],:)^T$, where $\mathbf{s}_a$ is the set of column indices whose values are needed to compute $V^k(:,\mathbf{s})$.

    Remark 2.10.

Algorithm 1 can be applied to problems with dense spatial discretizations, where $\mathbf{p}_a = [1, 2, \dots, n]$. These MDEs can arise, for example, from global discretization methods such as spectral methods. In the most generic form, the computational complexity of computing each entry of the right-hand side matrix $F$ can be $O(n^{\alpha}s^{\beta})$ for some $\alpha, \beta \ge 0$. The computational complexity of solving the FOM is $ns$ times the computational complexity of each entry, i.e. $O(ns\cdot n^{\alpha}s^{\beta})$ or $O(n^{\alpha+1}s^{\beta+1})$. The presented algorithm reduces the cost of evaluating the FOM in this generic setting to $O((n+s)\,n^{\alpha}s^{\beta})$.

For MDEs arising from the discretization of PDEs with parametric uncertainties, $\beta = 0$, and when sparse discretization schemes are used for the spatial discretization, $\alpha = 0$. However, when global discretization methods are used, $\alpha = 1$. Take, for example, $F(t,V) = DV$, where $D \in \mathbb{R}^{n\times n}$ is a full matrix obtained from the discretization of linear differential operators. For this problem, the cost of solving the FOM scales with $O(n^2s)$, while the cost of solving TDB-CUR scales with $O(n^2 + ns)$. The toy problem presented in §3a is a demonstration of a case where there is dense coupling between both columns and rows.

    3. Demonstrations

    (a) Toy problem

As our first example, we compare the accuracy of the presented algorithm against DLRA using the standard integrator [2], DO [1], the PS time integrator [22] and the recently proposed unconventional robust integrator [23]. We emphasize that it is already established that standard integrators, for example Runge–Kutta schemes, are unstable for solving equations (2.9a)–(2.9c), which has motivated the development of new time integration techniques [22,23]. We consider the time-dependent matrix from Ceruti & Lubich [23], given explicitly as

\[ V(t) = (e^{tW_1})\,e^{tD}\,(e^{tW_2})^T, \qquad 0 \le t \le 1. \]
The matrices $W_1 \in \mathbb{R}^{n\times n}$ and $W_2 \in \mathbb{R}^{n\times n}$ are randomly generated skew-symmetric matrices constructed as follows: $W_1 = (\tilde{W}_1 - \tilde{W}_1^T)/2$ and $W_2 = (\tilde{W}_2 - \tilde{W}_2^T)/2$, where $\tilde{W}_1 \in \mathbb{R}^{n\times n}$ and $\tilde{W}_2 \in \mathbb{R}^{n\times n}$ are uniformly distributed random matrices. The matrix $D \in \mathbb{R}^{n\times n}$ is diagonal with diagonal entries $d_i = 1/2^i$ for $i \in \{1, 2, \dots, n\}$. We choose $n = 100$ and final time $T_f = 1$. We create a linear MDE, $\mathrm{d}V/\mathrm{d}t = W_1V + V + VW_2^T$, whose right-hand side is linearly dependent on $V$. We use the explicit fourth-order Runge–Kutta integrator for all of the methods, including the substeps of the unconventional robust integrator [23]. We use the relative Frobenius error $E^k = \|\hat{V}^k - V^k\|_F/\|V^k\|_F$ in our analysis.
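The setup can be transcribed directly; the RNG seed and the use of scipy.linalg.expm are our own choices:

```python
# Construction of the toy-problem matrices (n = 100, d_i = 1/2^i).
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(3)
n = 100
W1t = rng.uniform(size=(n, n))
W2t = rng.uniform(size=(n, n))
W1 = (W1t - W1t.T) / 2.0                              # skew-symmetric parts
W2 = (W2t - W2t.T) / 2.0
D = np.diag([2.0 ** (-i) for i in range(1, n + 1)])   # d_i = 1/2^i

def V_exact(t):
    """Exact reference V(t) = exp(t W1) exp(t D) exp(t W2)^T."""
    return expm(t * W1) @ expm(t * D) @ expm(t * W2).T
```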

In figure 2a, we plot the error at the final time, $E(T_f)$, versus rank for two different step sizes: $\Delta t_1 = 10^{-1}$ and $\Delta t_2 = 10^{-3}$. We consider the following cases: TDB-CUR (DEIM), TDB-CUR (OS-DEIM) with $m = 10$, DO [1] (equations (2.11a)–(2.11b)), DLRA using the standard integrator [2] (equations (2.9a)–(2.9c)) and PS [22]. For reference, we also show the optimal error that is obtained via the rank-$r$ SVD of the exact solution at the final time $T_f$, denoted by $\mathrm{SVD}(V(T_f))$. For $\Delta t_1$, both DO and DLRA diverge before reaching $r = 10$. This is because the matrix $\Sigma$ in DLRA and the matrix $C$ in DO become poorly conditioned as $r$ increases. However, PS, TDB-CUR (DEIM) and TDB-CUR (OS-DEIM) follow the optimal error until the temporal integration error dominates, at which point the error cannot be reduced further by increasing $r$. It is worth noting that without oversampling, TDB-CUR has a sudden increase in error at around $r = 25$. However, this undesirable behaviour is eliminated by oversampling. As the time step is reduced to $\Delta t_2$, we observe a corresponding decrease in the PS, TDB-CUR (DEIM) and TDB-CUR (OS-DEIM) errors. Although DO and DLRA still diverge for the smaller time step, this occurs at a much larger value of $r$. Thus, figure 2a also highlights the severe time step restrictions for the stability of DO and DLRA in the presence of small singular values.

Figure 2. Toy problem: (a,b) error at the final time $E(T_f)$ versus reduction order $r$ for step sizes $\Delta t_1 = 10^{-1}$ and $\Delta t_2 = 10^{-3}$; (c) $l_2$ norm of the inverse matrix versus reduction order $r$; (d) $E(T_f)$ versus step size $\Delta t$ for reduction orders $r = 8, 16, 32$. (a) Full-rank $V(t)$, (b) rank-deficient $V(t)$, (c) full-rank $V(t)$ and (d) full-rank $V(t)$.

In figure 2b, we consider a rank-deficient matrix and overapproximation using different low-rank techniques. Specifically, we consider the matrix $D \in \mathbb{R}^{n \times n}$, where the diagonal entries are given by $d_i = 1/2^i$ for $i \in \{1, \ldots, 5\}$, and all remaining entries are zero. Our results show that the error drops to the optimal temporal error when $r = 5$. This is because the rank of the matrix is 5. Furthermore, we observe that even in the case of rank deficiency, TDB-CUR remains stable for $r > 5$. This finding supports the observation made in §2e regarding the conditioning of the presented algorithm. Specifically, even when $r > 5$, the matrix $Q(p,:)$ is well-conditioned and the TDB-CUR scheme remains stable, while both DO and DLRA with the standard integrator diverge. Similarly, PS remains stable and converges to the optimal error, as it does not require inverting $\Sigma$.

In figure 2c, we plot the $l_2$ norm of the inverse matrix versus $r$ for all four methods: $\|\Sigma^{-1}\|_2$ for DLRA, $\|C^{-1}\|_2$ for DO, $\|Q(p,:)^{-1}\|_2$ for TDB-CUR (DEIM) and $\|Q(p,:)^{\dagger}\|_2$ for TDB-CUR (OS-DEIM). As $r$ increases, the matrices $\Sigma$ and $C$ become ill-conditioned, hence the condition numbers for DLRA and DO become unbounded. On the other hand, the condition numbers for TDB-CUR (DEIM) and TDB-CUR (OS-DEIM) remain nearly constant, since the matrix $Q(p,:)$ is well-conditioned. We also observe that the conditioning for TDB-CUR (DEIM) is further improved by oversampling, as seen in the curve for TDB-CUR (OS-DEIM).
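
The row selection and the conditioning diagnostic above can be sketched in a few lines of Python. Pivoted QR of $Q^T$ is the standard QDEIM selection; taking the first $r + m$ pivots is one simple oversampling choice, used here only as a stand-in for the OS-DEIM procedure:

```python
import numpy as np
from scipy.linalg import qr

def qdeim_rows(Q, m=0):
    """Select r (+ m oversampled) row indices from an orthonormal n x r basis Q
    via column-pivoted QR of Q^T (QDEIM)."""
    r = Q.shape[1]
    _, _, piv = qr(Q.T, pivoting=True, mode='economic')
    return piv[:r + m]

rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((1000, 20)))
p = qdeim_rows(Q, m=10)
# l2 norm of the pseudo-inverse of Q(p, :): stays O(1) for well-chosen points.
print(np.linalg.norm(np.linalg.pinv(Q[p, :]), 2))
```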

In figure 2d, we compare $E(T_f)$ versus step size $\Delta t$ for reduction orders $r = 8, 16, 32$. We observe that TDB-CUR (OS-DEIM) saturates to the optimal low-rank error for each $r$ much more quickly than the unconventional robust integrator [23]. Furthermore, the TDB-CUR method retains the fourth-order accuracy of the Runge–Kutta scheme, whereas the unconventional robust integrator is only first-order accurate, despite using fourth-order Runge–Kutta for each substep of the algorithm. This confirms the first-order temporal accuracy of the unconventional integrator [23, section 3.1].

    (b) Stochastic Burgers equation

    For the second test case, we consider the one-dimensional Burgers equation subject to random initial and boundary conditions as follows:

$$\frac{\partial v}{\partial t} + \frac{1}{2}\frac{\partial v^2}{\partial x} = \nu \frac{\partial^2 v}{\partial x^2}, \qquad x \in [0,1], \; t \in [0,5],$$
$$v(x, 0; \xi) = \sin(2\pi x)\left[0.5\left(e^{\cos(2\pi x)} - 1.5\right) + \sigma \sum_{i=1}^{d} \sqrt{\lambda_{x_i}}\, \psi_i(x)\, \xi_i\right], \qquad x \in [0,1], \; \xi_i \sim \mathcal{N}(\mu, \sigma^2),$$
$$v(0, t; \xi) = \sin(2\pi t) + \sigma \sum_{i=1}^{d} \sqrt{\lambda_{t_i}}\, \varphi_i(t)\, \xi_i, \qquad x = 0, \; \xi_i \sim \mathcal{N}(\mu, \sigma^2),$$
where $\nu = 2.5 \times 10^{-3}$. The stochastic boundary at $x = 0$ is specified above and the boundary at $x = 1$ is $v(x{=}1, t; \xi) = 0$. We use weak treatment of the boundary conditions for both the FOM and TDB [36]. The random space is taken to be $d = 17$ dimensional and the $\xi_i$'s are sampled from a normal distribution with mean $\mu = 0$ and standard deviation $\sigma = 0.001$, with $s = 256$ samples. In the stochastic boundary specification, we take $\varphi_i(t) = \sin(i\pi t)$ and $\lambda_{t_i} = i^{-2}$. In the stochastic initial condition, $\lambda_{x_i}$ and $\psi_i(x)$ are the eigenvalues and eigenfunctions of the spatial squared-exponential kernel, respectively. The fourth-order explicit Runge–Kutta method is used for time integration of the FOM and TDB-CUR with $\Delta t = 2.5 \times 10^{-4}$. For discretization of the spatial domain, we use a second-order finite-difference scheme on a uniform grid with $n = 401$ points. This leads to an MDE of the form $dV/dt = F(t, V)$:
$$\frac{dV(t)}{dt} = -\frac{1}{2} D_1 \big(V(t) \circ V(t)\big) + \nu D_2 V(t) + B(t), \qquad V(0) = V_0,$$
where $\circ$ is the component-wise product and $D_1$ and $D_2$ are $n \times n$ sparse matrices defining the first and second spatial derivatives of the discretized system. The first and last rows of $D_1$ and $D_2$ are equal to zero. The $n \times s$ matrix $B(t)$ enforces the stochastic boundary at $x = 0$ by setting each element in its first row equal to $dv(0, t; \xi)/dt$ for $s$ independent samples of the random variables $\xi$. All other entries of $B(t)$ are equal to zero. The columns of $V_0$ are the initial conditions for the $s$ samples of the random variables.
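
A minimal Python sketch of this semi-discrete system follows (our discretization choices; the sampled boundary derivatives $dv(0,t;\xi)/dt$ are assumed to be supplied by the caller as dbdt):

```python
import numpy as np
import scipy.sparse as sp

n, s, nu = 401, 256, 2.5e-3
dx = 1.0 / (n - 1)

off, mid = np.ones(n - 1), np.ones(n)
D1 = (sp.diags([-off, off], [-1, 1]) / (2 * dx)).tolil()           # first derivative
D2 = (sp.diags([off, -2 * mid, off], [-1, 0, 1]) / dx**2).tolil()  # second derivative
D1[0, :] = 0; D1[-1, :] = 0   # first and last rows are zero
D2[0, :] = 0; D2[-1, :] = 0   # (boundaries enforced weakly through B(t))
D1, D2 = D1.tocsr(), D2.tocsr()

def F(t, V, dbdt):
    """dV/dt = -1/2 D1 (V o V) + nu D2 V + B(t), where B(t) is nonzero only in
    its first row, holding the s sampled boundary derivatives dv(0,t;xi)/dt."""
    B = np.zeros_like(V)
    B[0, :] = dbdt
    return -0.5 * (D1 @ (V * V)) + nu * (D2 @ V) + B
```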

We first solve the system using TDB-CUR with fixed rank and compare the results against DLRA and DO by solving equations (2.9a)–(2.9c) and (2.11a)–(2.11b), respectively. The fourth-order explicit Runge–Kutta method (a standard integrator) is used to solve both the DLRA and DO equations. For TDB-CUR, the rows are oversampled with $m = 5$. No sparse sampling strategy is used for DLRA or DO, and equations (2.9a)–(2.9c) and (2.11a)–(2.11b) are solved as is. In figure 3a, we compare the error of TDB-CUR, DLRA and DO for different values of $r$. For $r = 6$, TDB-CUR has a larger error than both DLRA and DO. This result is expected, since TDB-CUR has an additional source of error from the sparse sampling procedure. However, as the rank is increased to $r = 9$, the conditioning of DLRA and DO with the standard integrator starts to deteriorate, and the error of TDB-CUR is actually lower than that of DLRA and DO. In fact, for $r > 9$, DLRA and DO with standard integrators are unstable and cannot be integrated beyond the first time step. On the other hand, TDB-CUR remains stable, and the error decays as the rank is increased to a maximum value of $r = 18$. While it is reasonable to expect that the error can be reduced further by increasing the rank to values of $r > 18$, it is important to note that the rank of the initial condition is exactly $r = 18$. Therefore, in order to increase the rank of the system beyond $r = 18$ in a principled manner, we employ the rank-adaptive strategy from §2i, as sketched below. To this end, we initialize the system with rank $r_0 = 18$ and use an upper threshold of $\epsilon_u = 10^{-8}$ for mode addition. As observed in figure 3b, the rank increases in time to a maximum of 23, leading to a further reduction in the error.
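
One plausible realization of such a mode-addition test is sketched below; the precise criterion and the choice of new directions belong to §2i, so the normalized singular-value test and the random orthogonal complement used here are placeholders, not the paper's exact procedure:

```python
import numpy as np

def maybe_add_mode(U, S, Y, eps_u=1e-8, sigma_init=1e-12, rng=None):
    """If the smallest resolved singular value is still significant relative to
    the largest (ratio above eps_u), grow the rank by one, seeding the new mode
    with directions orthogonal to the current bases U and Y."""
    if S[-1] / S[0] <= eps_u:           # S assumed sorted in descending order
        return U, S, Y                  # resolved: keep the current rank
    rng = np.random.default_rng() if rng is None else rng
    u = rng.standard_normal(U.shape[0]); u -= U @ (U.T @ u); u /= np.linalg.norm(u)
    y = rng.standard_normal(Y.shape[0]); y -= Y @ (Y.T @ y); y /= np.linalg.norm(y)
    return (np.column_stack([U, u]),
            np.append(S, sigma_init),   # new mode starts near machine zero
            np.column_stack([Y, y]))
```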

Figure 3. Stochastic Burgers equation with constant diffusion: (a) relative error evolution for fixed and adaptive rank; (b) rank evolution using upper threshold $\epsilon_u = 10^{-8}$ for mode addition.

The mean solution is shown in figure 4 (right) along with the first 10 QDEIM sampling points. We observe that the sampling points are concentrated near the stochastic boundary at $x = 0$ and at points in the domain where shocks develop. Figure 4 (left) shows the evolution of the first two spatial modes, $u_1$ (top) and $u_2$ (bottom), where we observe excellent agreement between the FOM and TDB-CUR. It is important to note that these modes are energetically ranked according to the first and second singular values shown in figure 5a. We therefore observe that $u_1$ captures the large-scale, energy-containing structure, while $u_2$ captures the small-scale structure that is highly localized in space.

Figure 4. Stochastic Burgers equation with constant diffusion: (left) evolution of the first two spatial modes, $u_1$ and $u_2$, and the QDEIM points, showing excellent agreement between the FOM and TDB-CUR; (right) mean solution with the first 10 QDEIM points (black dots).

Figure 5. Stochastic Burgers equation with constant diffusion: (a) first 18 singular values of the FOM versus TDB-CUR; (b) CPU time scaling with $n = s$ for the FOM, TDB-CUR and the unconventional integrator with fixed $r = 6$.

In figure 5a, we show that TDB-CUR accurately captures the leading singular values of the FOM solution, despite the large gap between the first and last resolved singular values. Finally, figure 5b compares the CPU time of the FOM, DLRA with the unconventional integrator, and TDB-CUR as the number of rows and columns of the matrix is increased simultaneously. We take $n = s$ and observe that the FOM scales quadratically ($\mathcal{O}(ns)$), while TDB-CUR and DLRA with the unconventional integrator scale linearly ($\mathcal{O}(n+s)$). As the matrix size is increased, the disparity in CPU time becomes even more apparent, making the case for solving massive MDEs using low-rank approximation. Despite this result, it is important to note that linear scaling for the unconventional integrator is only possible because the nonlinearity in the Burgers equation is limited to a quadratic term. For higher-order polynomial and general nonlinearities, the unconventional integrator will scale with $\mathcal{O}(ns)$ and exceed the cost of solving the FOM. Nevertheless, given the factored rank-$r$ approximation $\hat{V} = U \Sigma Y^T$, the quadratic term can be computed efficiently, resulting in a factorization that has a maximum rank of $(r^2 + r)/2$, as illustrated below.
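
The factored computation of the quadratic term can be made explicit. The sketch below (our code) expands $\hat{V} \circ \hat{V}$ into the $(r^2+r)/2$ distinct Hadamard products of the basis columns:

```python
import numpy as np

def hadamard_square_factored(U, S, Y):
    """Given V = U diag(S) Y^T, return A, B with V o V = A B^T using
    V o V = sum_{i<=j} c_ij (u_i o u_j)(y_i o y_j)^T, where c_ij = s_i s_j
    (doubled for i != j), so the factorization has at most (r^2 + r)/2 columns."""
    r = U.shape[1]
    A_cols, B_cols = [], []
    for i in range(r):
        for j in range(i, r):
            coef = S[i] * S[j] * (1.0 if i == j else 2.0)
            A_cols.append(coef * U[:, i] * U[:, j])
            B_cols.append(Y[:, i] * Y[:, j])
    return np.column_stack(A_cols), np.column_stack(B_cols)

# Check: with V = U @ np.diag(S) @ Y.T and A, B = hadamard_square_factored(U, S, Y),
# np.allclose(V * V, A @ B.T) holds.
```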

To demonstrate the true power of the TDB-CUR method, we modify the right-hand side of the MDE by making the diffusion term nonlinear: $\nu\,(1 + \tanh(V)\,\xi) \circ (D_2 V)$. To clarify, $\tanh$ is evaluated element-wise on its argument, and $\xi$ is an $s \times s$ diagonal matrix with elements drawn from $\mathcal{N}(\mu = 0, \sigma^2 = 0.01^2)$. Furthermore, we verify that $\xi$ does not result in a negative diffusion. As a result of this simple modification, the cost of directly computing $F(t, \hat{V})$ scales with $\mathcal{O}(ns)$, even for $\hat{V}$ of low rank. Therefore, efficient computation of DLRA with a standard integrator [2], the unconventional integrator [54] or projection methods [25] is not possible. To highlight this, we compare the error versus cost (time to solution) for TDB-CUR, DLRA with the unconventional integrator, and DLRA using projection methods. We use the projected fourth-order Runge–Kutta method (PRK4) presented in [25], along with fourth-order Runge–Kutta for both TDB-CUR and the substeps of the unconventional integrator. For the PRK4 method, the unfactored $n \times s$ matrix $F_i$ is computed at the $i$th stage of the integration scheme. This matrix is then projected onto the tangent space of the rank-$r$ manifold at the $i$th stage as $P_{T_{\hat{V}_i}}(F_i)$, resulting in a matrix whose rank is at most $2r$; a sketch of this projection is given below. Note that, using the subscript to denote the stages, $\hat{V}_1 = \hat{V}_{k-1}$. To limit rank growth during the internal stages of the RK4 method, the economy-size SVD is applied after each substep to obtain the rank-$r$ $\hat{V}_i = U_i \Sigma_i Y_i^T$. This is only necessary for $i > 1$, since $\hat{V}_1 = \hat{V}_{k-1}$ (see above). The orthonormal column and row bases, $U_i$ and $Y_i$, are then used for the tangent space projections at each stage. One final economy-size SVD is applied so that the updated low-rank matrix, $\hat{V}_k$, remains on the rank-$r$ manifold. Although the nonlinear diffusion requires forming $F_i$ of size $n \times s$, our implementation does not require computing the SVD of matrices larger than $n \times 8r$. To our knowledge, this represents an efficient implementation of PRK4 when forming the full $F_i$ cannot be avoided.
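
The tangent-space projection used at each PRK4 stage is standard, and a minimal sketch (our code, assuming orthonormal $U$ and $Y$) reads:

```python
import numpy as np

def tangent_project(U, Y, F):
    """Project the n x s matrix F onto the tangent space of the rank-r manifold
    at V = U S Y^T:  P_T(F) = U U^T F + F Y Y^T - U U^T F Y Y^T (rank <= 2r)."""
    UtF = U.T @ F                               # r x s
    FY = F @ Y                                  # n x r
    return U @ UtF + (FY - U @ (UtF @ Y)) @ Y.T
```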

Figure 6a shows the error versus cost for $r = 6, 9, 12, 15, 18$. For TDB-CUR, we observe a rapid decrease in error for a modest increase in cost. Similar behaviour is observed for PRK4; however, both its error and cost exceed those of TDB-CUR. Finally, the unconventional integrator exhibits a larger error than both TDB-CUR and PRK4 for a given $\Delta t$. Due to the first-order accuracy of the unconventional integrator, the error does not monotonically decrease as the cost (rank) is increased. To verify this, we decrease $\Delta t$ by an order of magnitude and rerun. Although the error drops by an order of magnitude, the same non-monotonic behaviour in the error is observed. This confirms that the error in the unconventional integrator is still dominated by the temporal error and not the low-rank approximation error. Finally, we plot the error versus time for TDB-CUR and PRK4 in figure 6b. As the rank is increased, we observe a corresponding decrease in the error for both methods. However, TDB-CUR ultimately achieves lower error than PRK4 as the rank is increased. Recently, a parallel-in-time integrator was introduced that can accelerate the computation of DLRA [55].

Figure 6. Stochastic Burgers equation with nonlinear diffusion: (a) error versus time; (b) error at $t = 5$ versus total cost in seconds. The cost and error data points are obtained by considering $r = 6, 9, 12, 15, 18$.

Although it is not entirely obvious why PRK4 has a larger error as the rank is increased, we propose one possible explanation based on the curvature of the manifold. It is well known that the curvature of the rank-$r$ manifold is inversely proportional to the smallest singular value of the low-rank solution [22,41]. Therefore, as the rank is increased in the above example, the curvature of the manifold at the low-rank solution grows rapidly. As the curvature increases, the tangent space no longer provides a good approximation for small deviations (e.g. $\mathcal{O}(\Delta t)$) from the low-rank solution at that point. Since PRK4 takes non-infinitesimal time steps off the rank-$r$ manifold, the subsequent tangent space projections may induce errors that can be large at points on the manifold with high curvature. In other words, the accuracy of the tangent space projection relies heavily on the local curvature of the manifold. The TDB-CUR method, on the other hand, does not use tangent space projections and therefore does not suffer from the high curvature of the manifold. For more details, we refer the reader to Charous & Lermusiaux [40] for an excellent discussion of the error induced by the tangent space projection.

While figures 3–6 demonstrate the accuracy, efficiency, rank-adaptivity and favourable numerical performance of the TDB-CUR method, they do not convey the minimally intrusive nature of its implementation. To give a better perspective on the implementation effort, the MATLAB code for solving the stochastic Burgers equation using the TDB-CUR method is included in the electronic supplementary material. While the code contains lines specific to the TDB-CUR method, a review of the entire code makes it apparent that many of the included lines are already required for solving the FOM Burgers equation. Furthermore, no term-by-term implementation is required to preserve efficiency, and the FOM implementation of the Burgers equation is used to compute the sparse row and column samples. Therefore, given an existing FOM implementation, the code required to implement the TDB-CUR method is minimal. The code blocks required for implementing the method are labelled with %% TDB-CUR.

(c) Stochastic advection–diffusion–reaction equation

In this section, we solve the two-dimensional ADR equation subject to a random diffusion coefficient ($\alpha$), with a deterministic initial condition:

$$\frac{\partial v}{\partial t} + (u \cdot \nabla) v = \nabla \cdot (\alpha \nabla v) + \frac{v^2}{10 + v}, \qquad x_1 \in [0, 10], \; x_2 \in [0, 2], \; t \in [0, 5],$$
$$v(x_1, x_2, 0) = \frac{1}{2}\left(\tanh\left(\frac{x_2 + 0.5}{0.1}\right) - \tanh\left(\frac{x_2 - 0.5}{0.1}\right)\right),$$
where $v(x_1, x_2, t)$ is the species concentration and $u(x_1, x_2, t)$ is the velocity vector. It is worth noting that the nonlinearity of the equation is non-polynomial, implying that the computational expense of DO or DLRA is comparable to that of the FOM. The schematic of the problem is shown in figure 7. The velocity field is obtained by solving the incompressible Navier–Stokes equations and is independent of the species transport equation. The conditions are identical to those used in previous studies [6,7]. In particular, we solved the velocity field in the entire domain using the spectral/hp element method on an unstructured mesh with 4008 quadrilateral elements and polynomial order 5. For more details on the spectral element method, see for example [56,57]. At the inlet, a parabolic velocity is prescribed with an average velocity of $\bar{u}$. The outflow condition is imposed at the right boundary and the no-slip boundary condition is imposed at the remaining boundaries. The Reynolds number, with reference length $H/2$ and kinematic viscosity $\nu$, is $Re = \bar{u} H / 2\nu = 1000$.
Figure 7. Schematic of the flow visualized with a passive scalar.

We solved the ADR and TDB-CUR equations using a collocated spectral element method within the rectangular domain indicated by dashed lines in figure 7. In particular, we use a uniform quadrilateral mesh with 50 elements in the $x_1$ direction and 15 elements in the $x_2$ direction, with a spectral polynomial of order 5 in each direction. This results in $n = 19\,076$ degrees of freedom in the spatial domain. We interpolated the velocity field from the unstructured mesh onto the structured mesh. The fourth-order explicit Runge–Kutta method with $\Delta t = 5 \times 10^{-4}$ is used for time integration of the ADR and TDB-CUR equations.

Unlike the previous example, we use a deterministic initial condition. Therefore, the rank at $t = 0$ is exactly one, i.e. $\mathrm{rank}(V(0)) = 1$. While the low-rank approximation with $r = 1$ is exact in its initial condition, the rank of the system quickly increases due to the nonlinearity. Therefore, to maintain an acceptable level of error, the rank of the approximation must increase in time. While it is possible to initialize TDB-CUR with $r > 1$, we opt to use the rank-adaptive strategy from algorithm 1, starting with an initial rank of $r_0 = 1$; a sketch of the rank-one factored initialization is given below. Similarly, DLRA using the unconventional integrator can also be initialized with $r > 1$, and several rank-adaptive integrators have been proposed [50–52]. On the other hand, initializing DLRA or DO for a standard integrator with $r > 1$ is not possible, as $\Sigma$ and $C$ would be singular. Therefore, this problem setup emphasizes the need for rank-adaptivity in TDB-based low-rank matrix approximation.
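
The rank-one factored initialization is straightforward; a minimal sketch (our construction) for a deterministic initial condition $v_0$ replicated across all $s$ columns:

```python
import numpy as np

def rank_one_init(v0, s):
    """V(0) = v0 1^T has rank one; return U, S, Y with V(0) = U diag(S) Y^T."""
    n = v0.size
    U = (v0 / np.linalg.norm(v0)).reshape(n, 1)
    Y = np.full((s, 1), 1.0 / np.sqrt(s))
    S = np.array([np.linalg.norm(v0) * np.sqrt(s)])
    return U, S, Y
```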

For the first case, we consider a random diffusion coefficient according to $\alpha = 1/\xi$, where $\xi$ is a Gaussian random variable with mean $\mu = 100$ and standard deviation $\sigma = 25$. Since validating the performance of TDB-CUR requires solving the FOM, we do not consider a large number of samples for this case. We draw $s = 50$ samples of the diffusion coefficient, which allows us to compute the error and compare the singular values with the FOM in a reasonable amount of time. Figure 8 shows the evolution of the first three spatial modes, along with the sparse sampling points. As the simulation evolves in time, the points also evolve as the flow is advected from left to right. The instantaneous singular values from TDB-CUR and the $r$ largest singular values of the FOM solution (SVD singular values) are shown in figure 9a. The discrepancy between the trailing singular values of TDB-CUR and the FOM stems from the effect of unresolved modes in the time integration of the low-rank approximation. However, as observed in figure 9b, accurately resolving the leading singular values results in very small errors. Additionally, the error can be controlled by lowering the error threshold for rank addition, leading to improved accuracy of the TDB-CUR approximation.
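
Sampling the random diffusion coefficient amounts to a few lines (in practice one may wish to guard against non-positive draws of $\xi$; at these parameter values such draws are exceedingly unlikely):

```python
import numpy as np

rng = np.random.default_rng(3)
xi = rng.normal(loc=100.0, scale=25.0, size=50)  # s = 50 samples, xi ~ N(100, 25^2)
alpha = 1.0 / xi                                 # random diffusion coefficients
```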

Figure 8. Stochastic advection–diffusion–reaction equation: first three spatial modes of TDB-CUR at different time steps, and the points selected by DEIM with $m = 5$ oversampled (OS) points ($s = 50$).

Figure 9. Stochastic advection–diffusion–reaction equation: (a) comparison of the first 10 singular values of the FOM versus TDB-CUR with 5 oversampled points ($s = 50$); (b) error ($E$) of TDB-CUR versus time. Results are presented for different values of the upper error bound ($\epsilon_u$) for mode addition.

In the second case, we take $s = 100\,000$ samples of the random diffusion coefficient. This case is of particular interest, as it demonstrates the true potential of TDB-CUR in cases where the FOM is too costly to run. To execute this scenario with the FOM, not only would we require sufficient memory to store the solution matrix of size $n \times s$, we would also have to compute the nonlinear map of this massive matrix at each time step. With the new methodology, on the other hand, we never need to store a matrix larger than $n \times r$ or $s \times (m + r)$. To demonstrate this capability, we solve the TDB-CUR with $s = 100\,000$ on a laptop computer. Since the available computational resources (our laptop) did not have sufficient memory to store the FOM solution matrix, we could not solve the FOM for comparison. Instead, we performed a convergence study by decreasing the threshold for rank addition, $\epsilon_u$. By decreasing $\epsilon_u$, we observe two things: (i) the rank increases more rapidly and (ii) the maximum rank increases. Figure 10 depicts the singular values of the TDB-CUR method versus time (a) and the rank at each time step for different values of $\epsilon_u$ (b). As $\epsilon_u$ is decreased, the rank increases and we observe convergence in the leading singular values.

Figure 10. Stochastic advection–diffusion–reaction equation: (a) singular values of TDB-CUR with $100\,000$ samples and 5 oversampled points; (b) rank versus time. Results are presented for different values of the upper error bound ($\epsilon_u$) for mode addition.

    4. Conclusion

The objective of this work was to develop a method for solving nonlinear MDEs that is accurate, well-conditioned, computationally efficient and minimally intrusive. To this end, we presented the TDB-CUR algorithm for solving MDEs via low-rank approximation. The algorithm is based on a time-discrete variational principle that leverages sparse sampling to efficiently compute a low-rank matrix approximation at each iteration of the time-stepping scheme. Numerical experiments illustrate that the TDB-CUR algorithm provides a near-optimal low-rank approximation to the solution of MDEs, while significantly reducing the computational cost. Moreover, we showed that the method is robust in the presence of small singular values and significantly outperforms DLRA based on the time-continuous variational principle, the unconventional integrator and PRK4. Although not investigated in the present work, the TDB-CUR algorithm is also highly parallelizable, making it an attractive option for high-performance computing tasks.

    While the presented approach is minimally intrusive and can be applied to systems containing general nonlinearities, the goal of future work should be to make this method fully non-intrusive, allowing the FOM to be leveraged as a black box. This will allow the method to be applied to proprietary solvers while reducing the overall implementation efforts, making this powerful methodology more accessible to researchers and practitioners alike.

    Data accessibility

    Code is available as electronic supplementary material [58].

    Declaration of AI use

    We have not used AI-assisted technologies in creating this article.

    Authors' contributions

M.D.: conceptualization, data curation, formal analysis, investigation, methodology, validation, visualization, writing—original draft, writing—review and editing; G.P.: conceptualization, data curation, formal analysis, investigation, methodology, software, validation, visualization, writing—original draft, writing—review and editing; M.H.N.: conceptualization, data curation, formal analysis, investigation, methodology, software, validation, visualization, writing—original draft, writing—review and editing; D.C.D.R.F.: conceptualization, formal analysis, investigation, writing—review and editing; H.B.: conceptualization, data curation, formal analysis, funding acquisition, investigation, methodology, project administration, resources, software, supervision, validation, visualization, writing—original draft, writing—review and editing.

    All authors gave final approval for publication and agreed to be held accountable for the work performed therein.

    Conflict of interest declaration

    We declare we have no competing interests.

    Funding

    This work is supported by the Air Force Office of Scientific Research award no. FA9550-21-1-0247 and funding from Transformational Tools and Technology (TTT), NASA grant no. 80NSSC22M0282. Computational resources are provided by the Center for Research Computing (CRC) at the University of Pittsburgh.

    Acknowledgements

    The authors thank Dr Gianluca Ceruti for numerous insightful and stimulating discussions that led to a number of improvements.

    Footnotes

    Electronic supplementary material is available online at https://doi.org/10.6084/m9.figshare.c.6858172.

    Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.