Oblique projection for scalable rank-adaptive reduced-order modelling of nonlinear stochastic partial differential equations with time-dependent bases
Abstract
Time-dependent basis reduced-order models (TDB ROMs) have successfully been used for approximating the solution to nonlinear stochastic partial differential equations (PDEs). For many practical problems of interest, discretizing these PDEs results in massive matrix differential equations (MDEs) that are too expensive to solve using conventional methods. While TDB ROMs have the potential to significantly reduce this computational burden, they still suffer from the following challenges: (i) inefficient for general nonlinearities, (ii) intrusive implementation, (iii) ill-conditioned in the presence of small singular values and (iv) error accumulation due to fixed rank. To this end, we present a scalable method for solving TDB ROMs that is computationally efficient, minimally intrusive, robust in the presence of small singular values, rank-adaptive and highly parallelizable. These favourable properties are achieved via oblique projections that require evaluating the MDE at a small number of rows and columns. The columns and rows are selected using the discrete empirical interpolation method (DEIM), which yields near-optimal matrix low-rank approximations. We show that the proposed algorithm is equivalent to a CUR matrix decomposition. Numerical results demonstrate the accuracy, efficiency and robustness of the new method for a diverse set of problems.
1. Introduction
Discretizations of many time-dependent partial differential equations (PDEs) result in matrix differential equations (MDEs) of the form $\mathrm{d}V/\mathrm{d}t = \mathcal{F}(V)$, where $V \in \mathbb{R}^{n \times s}$ is the solution matrix and $\mathcal{F}$ is obtained by discretizing the PDE in all dimensions except time. One such example is the uncertainty propagation of random parameters into the PDEs, which requires solving the PDEs for a large number of random realizations [1,2]. Discretization of this problem can be formulated as an MDE, where the rows of the matrix are obtained by discretizing the PDE in the physical domain and the columns of the matrix are samples of the discretized equation for a particular choice of random parameters. For high-dimensional PDEs subject to high-dimensional random parameters, the resulting MDEs can be massive. For example, uncertainty quantification of a three-dimensional time-dependent fluid flow typically requires solving an MDE with a very large number of grid points (rows, $n$) and random samples (columns, $s$). Therefore, the solution to these massive MDEs is cost prohibitive due to the floating point operations (flops), memory and storage requirements. The discretization of many other PDEs can also be cast as MDEs, for example, kinetics equations [3–5], linear sensitivity analyses [6] and species transport equations in turbulent combustion [7].
For many practical applications, the solution matrix $V(t)$ is instantaneously low-rank. Therefore, low-rank approximations using time-dependent bases (TDBs) have the potential to significantly reduce the computational cost of solving massive MDEs. For these systems, a TDB-based low-rank approximation extracts low-rank structure via TDBs for the column and row spaces of $V(t)$. A reduced-order model (ROM) is then constructed by projecting the full-order model (FOM) onto the column and row TDBs. Low-rank approximation based on TDBs was first introduced in the quantum chemistry field to solve the Schrödinger equation [8], and it is commonly known as the multiconfiguration time-dependent Hartree (MCTDH) method. The MCTDH methodology was later presented for generic MDEs in [2] and is referred to as dynamical low-rank approximation (DLRA).
Various TDB ROM schemes have also been developed to solve stochastic partial differential equations (SPDEs). Dynamically orthogonal (DO) decomposition [1], bi-orthogonal (BO) decomposition [9], dual dynamically orthogonal (DDO) decomposition [10] and dynamically bi-orthogonal decomposition (DBO) [11] are all TDB-based low-rank approximation techniques for solving SPDEs. In all of these decompositions (DO, BO, DDO and DBO), an evolution equation for the mean field is developed, along with evolution equations for the TDB ROM of the mean-subtracted stochastic fields. Although these decompositions have different forms and constraints, they are all equivalent, i.e. they produce identical low-rank matrices [11,12], and their differences lie only in their numerical performance. TDB ROMs have also been used in other fields and applications including dynamical systems [13,14], combustion [7,15], linear sensitivity analysis [6], dynamical instabilities [16–18], deep learning [19] and singular value decomposition (SVD) estimation for matrices that vary smoothly with a parameter [20].
Despite the potential of using TDB ROMs to significantly reduce the computational cost of solving massive MDEs, there are still a number of outstanding challenges for most practical problems of interest. We summarize three key challenges below:
(i) Computational efficiency: for specific classes of equations (e.g. homogeneous linear and quadratic nonlinear), rank-$r$ TDB ROMs can be solved efficiently with operations that scale with $\mathcal{O}(nr)$ and $\mathcal{O}(sr)$ for linear MDEs, or with $\mathcal{O}(nr^2)$ and $\mathcal{O}(sr^2)$ for quadratic MDEs. However, this computational efficiency is lost for general nonlinearities, requiring operations that scale with the size of the FOM, i.e. $\mathcal{O}(ns)$.
(ii) Intrusiveness: even in the special cases of homogeneous linear and quadratic nonlinear equations, efficient implementation of TDB ROM evolution equations is an intrusive process [21, Appendix B]. This involves replacing the low-rank approximation in the FOM, projecting the resulting equation onto the tangent manifold and obtaining low-rank matrices for each term on the right-hand side. The process requires significant effort to derive, implement and debug the code. This poses a major obstacle for most practitioners, creating a significant barrier to adopting the methodology.
(iii) Ill-conditioning: the TDB ROM evolution equations become numerically unstable when the singular values of the low-rank approximation become very small. This is particularly problematic because it is often necessary to retain very small singular values in order to have an accurate approximation. Small singular values lead to ill-conditioned matrices that require inversion in all variations of TDB ROM evolution equations [1,2,9–11], resulting in restrictive time step limitations for numerical integration and error amplification.
The three time-integration schemes presented in [22,23,25] and the pseudo-inverse methodology presented in [24] can retain a computational cost that scales linearly with $n$ and $s$ for linear and quadratic MDEs. But achieving this speedup comes at the expense of a highly intrusive implementation. However, for generic nonlinear MDEs, an intrusive implementation is not possible, and the computational cost of solving the TDB ROMs using the methods presented in [22–25] scales with $\mathcal{O}(ns)$, which is the same as the cost of solving the FOM. Recently, a sparse interpolation algorithm was presented for solving the TDB ROM evolution equations with a computational complexity that scales linearly with $n$ and $s$ for generic nonlinear SPDEs [21]. However, this methodology still lacks robustness when the singular values become small, as it requires the inversion of the matrix of singular values.
In this work, we present a methodology inspired by interpolation and hyper-reduction techniques developed to accelerate nonlinear ROMs and finite-element models in vector differential equations [26–31]. In particular, we present CUR factorizations of low-rank matrices that address the above challenges, i.e. (i) the computational cost of the methodology scales linearly with $n$ and $s$ for generic nonlinear SPDEs, both in terms of flops and memory, (ii) it lends itself to simple implementation in existing codes and (iii) the time integration is robust in the presence of small singular values, and high-order explicit time integration can be used. To this end, the main elements of the presented methodology are (i) a time-discrete variational principle for minimization of the residual due to low-rank approximation error and (ii) a CUR factorization based on strategic row and column sampling of the time-discrete MDE.
The remainder of the paper is organized as follows: In §2, we first review the time-continuous variational principle and its associated challenges. We then present the time-discrete variational principle along with the rank-adaptive sparse sampling strategy for solving TDB ROMs. Finally, we show that the resulting low-rank approximation is equivalent to a CUR factorization and we provide an upper bound on the approximation error. In §3, we demonstrate the method for a toy problem as well as the stochastic Burgers equation and stochastic nonlinear advection–diffusion–reaction (ADR) equation. In §4, we summarize the present work and discuss its implications.
2. Methodology
(a) Setup
Consider a nonlinear stochastic PDE of the form $\partial v/\partial t = \mathcal{F}(v)$, where $v = v(x, t; \boldsymbol{\xi})$ is the solution field and $\boldsymbol{\xi}$ is a vector of random parameters. Discretizing in the physical domain and sampling $\boldsymbol{\xi}$ yields the MDE $\mathrm{d}V/\mathrm{d}t = \mathcal{F}(V)$ (equation (2.2)), where $V \in \mathbb{R}^{n \times s}$.
The presented methodology is limited to explicit time integration schemes. For the computational complexity analysis, we consider sparse schemes for the spatial discretization, which means that each row of $\mathcal{F}(V)$ depends on only a small, $\mathcal{O}(1)$, number of adjacent rows of $V$. The majority of discretization schemes, e.g. finite difference, finite volume, finite element and spectral element, result in sparse row dependence. As a result, the computational cost of computing each column of the FOM right-hand side (equation (2.2)) is $\mathcal{O}(n)$, and the cost of solving MDE (2.2) for all columns scales with $\mathcal{O}(ns)$. We also note that the presented methodology is not limited to sparse spatial discretizations and can be applied to dense discretizations as well. See remark 2.10 for more details.
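To make this cost structure concrete, the following sketch shows a schematic right-hand side for an MDE arising from a one-dimensional diffusion stencil; the function name and the diffusion coefficient are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def rhs(V, dx, nu=1e-2):
    """Schematic FOM right-hand side F(V) for dV/dt = F(V): a sparse
    finite-difference stencil in space (rows), applied independently to
    every random sample (column)."""
    F = np.zeros_like(V)
    # each interior row depends only on its two adjacent rows: sparse coupling
    F[1:-1, :] = nu * (V[2:, :] - 2.0 * V[1:-1, :] + V[:-2, :]) / dx**2
    return F  # boundary rows kept at zero for simplicity
```

Since each entry costs $\mathcal{O}(1)$ work, evaluating the full matrix costs $\mathcal{O}(ns)$, while evaluating only $r$ columns or $r$ rows costs $\mathcal{O}(nr)$ or $\mathcal{O}(sr)$; the latter is the saving exploited throughout this paper.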
(b) Preliminaries
In this section, we present definitions of low-rank matrix manifolds, tangent spaces, orthogonal and oblique projections, and the CUR decomposition.
Definition 2.1. (Low-rank matrix manifolds)
The low-rank matrix manifold is defined as the set
$$\mathcal{M}_r = \big\{ V \in \mathbb{R}^{n \times s} \;:\; \operatorname{rank}(V) = r \big\}.$$
Any member of $\mathcal{M}_r$ may be represented by $V = U \Sigma Y^{\mathrm{T}}$, where $U \in \mathbb{R}^{n \times r}$ and $Y \in \mathbb{R}^{s \times r}$ are sets of orthonormal columns and $\Sigma \in \mathbb{R}^{r \times r}$ is a rank-$r$ matrix. The rank-$r$ matrix may also be represented via the multiplication of two matrices, i.e. $V = U Z^{\mathrm{T}}$, where $U \in \mathbb{R}^{n \times r}$ and $Z \in \mathbb{R}^{s \times r}$ have full column rank.
Definition 2.2. (Tangent space)
The tangent space of manifold $\mathcal{M}_r$ at $\hat{V} \in \mathcal{M}_r$, represented with the decomposition $\hat{V} = U \Sigma Y^{\mathrm{T}}$, is the set of matrices of the form (Koch & Lubich [2]):
$$\mathcal{T}_{\hat{V}} \mathcal{M}_r = \big\{ \dot{U} \Sigma Y^{\mathrm{T}} + U \dot{\Sigma} Y^{\mathrm{T}} + U \Sigma \dot{Y}^{\mathrm{T}} \;:\; \dot{\Sigma} \in \mathbb{R}^{r \times r},\ \dot{U} \in \mathbb{R}^{n \times r},\ \dot{Y} \in \mathbb{R}^{s \times r},\ U^{\mathrm{T}} \dot{U} = 0,\ Y^{\mathrm{T}} \dot{Y} = 0 \big\}.$$
Definition 2.3. (Orthogonal projection onto the tangent space)
The orthogonal projection of a matrix $W \in \mathbb{R}^{n \times s}$ onto the tangent space of manifold $\mathcal{M}_r$ at $\hat{V}$, represented with the decomposition $\hat{V} = U \Sigma Y^{\mathrm{T}}$, is given by (Koch & Lubich [2, lemma 4.1]):
$$\mathcal{P}_{\mathcal{T}_{\hat{V}}}(W) = P_U W - P_U W P_Y + W P_Y.$$
In the above projection, $P_U$ and $P_Y$ are orthogonal projections onto the spaces spanned by the columns of $U$ and $Y$. We denote these orthogonal projections with:
$$P_U = U U^{\mathrm{T}} \quad \text{and} \quad P_Y = Y Y^{\mathrm{T}}.$$
Definition 2.4. (Oblique projection)
Let $U \in \mathbb{R}^{n \times r}$ and $Y \in \mathbb{R}^{s \times r}$ be orthonormal matrices and let $\mathbf{p}$ and $\mathbf{s}$ be sets of $m \geq r$ and $m' \geq r$ distinct row and column indices, respectively, with selection matrices $P = I(:, \mathbf{p})$ and $S = I(:, \mathbf{s})$ built from columns of the identity. Oblique projectors onto Ran($U$) and Ran($Y$) are defined as [37]
$$\mathbb{P}_U = U \big(P^{\mathrm{T}} U\big)^{\dagger} P^{\mathrm{T}} \quad \text{and} \quad \mathbb{P}_Y = Y \big(S^{\mathrm{T}} Y\big)^{\dagger} S^{\mathrm{T}}.$$
In the above definition, the symbol $\mathbb{P}$ distinguishes these projectors from orthogonal projectors. For a given matrix $A \in \mathbb{R}^{n \times s}$, $\mathbb{P}_U$ operates on the left side of the matrix ($\mathbb{P}_U A$) and $\mathbb{P}_Y$ operates on the right side ($A \mathbb{P}_Y^{\mathrm{T}}$). It is easy to verify that $\mathbb{P}_U^2 = \mathbb{P}_U$ and $\mathbb{P}_Y^2 = \mathbb{P}_Y$. It is also easy to verify that the oblique projection of $A$ belongs to the manifold of rank-$r$ matrices, i.e. $\mathbb{P}_U A\, \mathbb{P}_Y^{\mathrm{T}} \in \mathcal{M}_r$.
For the special case of $m = m' = r$, the oblique projectors $\mathbb{P}_U$ and $\mathbb{P}_Y$ are also interpolatory projectors. In this case, the oblique projectors become $\mathbb{P}_U = U (P^{\mathrm{T}} U)^{-1} P^{\mathrm{T}}$ and $\mathbb{P}_Y = Y (S^{\mathrm{T}} Y)^{-1} S^{\mathrm{T}}$. Unlike orthogonal projection or a general oblique projection, the interpolatory projection is guaranteed to match the original matrix at the selected rows and columns, i.e.
$$P^{\mathrm{T}} \big(\mathbb{P}_U A\big) = P^{\mathrm{T}} A \quad \text{and} \quad \big(A \mathbb{P}_Y^{\mathrm{T}}\big) S = A S.$$
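The two defining properties quoted above (idempotency and exact matching at the selected indices) are easy to check numerically; the following minimal sketch covers the interpolatory case $m = r$, with all names and sizes chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 12, 3
U, _ = np.linalg.qr(rng.standard_normal((n, r)))   # orthonormal basis
p = [1, 5, 9]                                      # r selected row indices
A = rng.standard_normal((n, 4))

# interpolatory projection: P_U A = U (U[p,:])^{-1} A[p,:]
PA = U @ np.linalg.solve(U[p, :], A[p, :])

assert np.allclose(PA[p, :], A[p, :])              # matches A at selected rows
assert np.allclose(U @ np.linalg.solve(U[p, :], PA[p, :]), PA)  # idempotent
```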
Definition 2.5. (CUR decomposition)
A CUR decomposition of a matrix $A \in \mathbb{R}^{n \times s}$ is a rank-$r$ approximation of $A$ in the form of $A \approx C U R$, where $C \in \mathbb{R}^{n \times r}$ and $R \in \mathbb{R}^{r \times s}$ are actual columns and rows of matrix $A$, i.e. $C = A(:, \mathbf{s})$ and $R = A(\mathbf{p}, :)$. The matrix $U \in \mathbb{R}^{r \times r}$ is computed such that $C U R$ is a good approximation to $A$. The CUR of matrix $A$ is denoted with $\operatorname{CUR}(A)$.
Here, the matrices $C$, $U$ and $R$ are different from the matrices defined in previous sections. Different CUR decompositions can be obtained for the same matrix depending on two factors: (i) the selection of columns and rows and (ii) the method used to compute the matrix $U$. For more details on CUR decompositions, we refer the reader to Mahoney & Drineas [38]. Finally, it is easy to verify that $\operatorname{CUR}(A) \in \mathcal{M}_r$. The connection between CUR decomposition and oblique projections is shown in §2g.
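As a concrete illustration, one common choice in interpolatory CUR constructions is to take the middle factor as the pseudoinverse of the intersection block $A(\mathbf{p}, \mathbf{s})$. The sketch below is an assumption-laden example of that generic recipe, not the specific construction of this paper, which is developed in §2e–2g.

```python
import numpy as np

def cur(A, p, s):
    """Rank-|s| CUR approximation A ~ C @ Um @ R with C = A[:, s],
    R = A[p, :] and Um = pinv(A[p, s]) (one common choice of middle factor)."""
    C, R = A[:, s], A[p, :]
    Um = np.linalg.pinv(A[np.ix_(p, s)])
    return C @ Um @ R
```

When $A(\mathbf{p}, \mathbf{s})$ is invertible, this choice reproduces $A$ exactly on the selected rows and columns, which is precisely the interpolation property exploited later.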
(c) Time-continuous variational principle
The central idea behind TDB-based low-rank approximation is that the bases evolve optimally to minimize the residual due to low-rank approximation error. The residual is obtained by substituting an SVD-like low-rank approximation into the FOM so that $V(t)$ is closely approximated by the rank-$r$ matrix
$$\hat{V}(t) = U(t)\, \Sigma(t)\, Y(t)^{\mathrm{T}},$$
where $U(t) \in \mathbb{R}^{n \times r}$ and $Y(t) \in \mathbb{R}^{s \times r}$ have orthonormal columns and $\Sigma(t) \in \mathbb{R}^{r \times r}$ is full-rank.
Because this is a low-rank approximation, it cannot satisfy the FOM exactly and there will be a residual equal to:
$$\mathcal{R}(t) = \frac{\mathrm{d}\hat{V}}{\mathrm{d}t} - \mathcal{F}\big(\hat{V}\big).$$
As was shown in [39], it is possible to derive a similar variational principle for the DO decomposition, $\hat{V} = U Z^{\mathrm{T}}$, whose optimality conditions are constrained to the orthonormality of the spatial modes, $U^{\mathrm{T}} U = I$, via the DO condition, $U^{\mathrm{T}} \dot{U} = 0$. However, for the sake of simplicity and unlike the original DO formulation presented in [1], an evolution equation for the mean field is not derived. Without loss of generality, the low-rank DO evolution equations become
$$\dot{U} = \big(I - U U^{\mathrm{T}}\big)\, \mathcal{F}\big(\hat{V}\big)\, Z \big(Z^{\mathrm{T}} Z\big)^{-1} \quad (2.11a)$$
$$\dot{Z} = \mathcal{F}\big(\hat{V}\big)^{\mathrm{T}} U. \quad (2.11b)$$
Despite the potential of equations (2.9a)–(2.9c) to significantly reduce the computational cost of solving massive MDEs like equation (2.2), there are still a number of outstanding challenges for most practical problems of interest. As highlighted in the Introduction, computing $\mathcal{F}(\hat{V})$ requires operations that scale with the size of the FOM. This involves applying the nonlinear map $\mathcal{F}(\cdot)$ to every column of the matrix $\hat{V}$. While it is possible to achieve better scaling for the special cases of homogeneous linear and quadratic nonlinear $\mathcal{F}$, this comes at the expense of a highly intrusive process that requires a careful term-by-term treatment of the right side of equations (2.9a)–(2.9c) [21, Appendix B]. Furthermore, solving equations (2.9b) and (2.9c) becomes unstable when $\Sigma$ is singular or near-singular. This is particularly problematic because it is often necessary to retain very small singular values in order to have an accurate approximation.
While the low-rank approximation based on TDBs can be cast in different, yet equivalent formulations, we have chosen equations (2.9a)–(2.9c) over DO/BO/DDO decompositions to highlight the underlying challenges. Since DO/BO/DDO decompositions exhibit all of the above challenges, addressing these challenges in the context of equations (2.9a)–(2.9c) automatically addresses the DO/BO/DDO challenges as well.
(d) Time-discrete variational principle
To address the challenges of low-rank approximations based on TDB using the time-continuous variational principle, we consider a time-discrete variational principle for rank-adaptive matrix approximations, which has recently been applied in [40] and also [25,41] in the context of tensors. To this end, consider an explicit Runge–Kutta temporal discretization of equation (2.2), which maps the rank-$r$ approximation at time step $k$ to an (in general higher-rank) matrix at time step $k+1$; for forward Euler, for example,
$$V^{k+1} = \hat{V}^{k} + \Delta t\, \mathcal{F}\big(\hat{V}^{k}, t^{k}\big),$$
with higher-order explicit schemes defined analogously (equation (2.12)).
The time-discrete variational principle can be stated as finding the best rank-$r$ approximation $\hat{V}^{k+1} \in \mathcal{M}_r$ such that the Frobenius norm of the residual $\hat{V}^{k+1} - V^{k+1}$ is minimized [25]:
$$\hat{V}^{k+1} = \operatorname*{arg\,min}_{W \in \mathcal{M}_r} \big\| W - V^{k+1} \big\|_F. \quad (2.15)$$
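For completeness, the minimizer of equation (2.15) is given by the Eckart–Young theorem, a standard fact restated here in the notation assumed above:

```latex
% Let V^{k+1} = \sum_i \sigma_i u_i y_i^{\mathrm{T}} be the full SVD.
% The minimizer of (2.15) is the rank-r truncated SVD, with error
\hat{V}^{k+1} = \sum_{i=1}^{r} \sigma_i\, u_i\, y_i^{\mathrm{T}},
\qquad
\min_{W \in \mathcal{M}_r} \bigl\| W - V^{k+1} \bigr\|_F
= \Bigl( \sum_{i > r} \sigma_i^2 \Bigr)^{1/2}.
```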
An important advantage of equation (2.15) over equations (2.9a)–(2.9c) is that the time advancement according to equation (2.15) does not become singular in the presence of small singular values. While this solves the issue of ill-conditioning, computing equation (2.15) at each iteration of the time-stepping scheme is cost prohibitive. This computational cost is due to two sources: (i) computing the nonlinear map to obtain $V^{k+1}$ and (ii) computing its rank-$r$ SVD. The cost of (i) alone makes the solution of the time-discrete variational principle as expensive as the FOM, i.e. $\mathcal{O}(ns)$. Besides the flops cost associated with computing $V^{k+1}$, the memory cost of storing $V^{k+1}$ is prohibitive for most realistic applications. On the other hand, computing the exact SVD of $V^{k+1}$ scales with $\mathcal{O}(ns \min(n, s))$. While this cost is potentially alleviated by fast algorithms for approximating the SVD, e.g. randomized SVD [42] or incremental QR [37], for general nonlinearities in $\mathcal{F}$, (i) is unavoidable. This ultimately leads to a computational cost that exceeds that of the FOM.
(e) Low-rank approximation via an oblique projection
To overcome these challenges, we present an oblique projection scheme that enables a cost-effective approximation to the rank-$r$ truncated SVD of $V^{k+1}$. Before presenting our methodology, we provide a geometric interpretation of the truncated SVD. In particular, it can be interpreted as an orthogonal projection onto the column and row space of $\hat{V}^{k+1}$ as shown below:
$$\hat{V}^{k+1} = P_U V^{k+1} P_Y = U U^{\mathrm{T}} V^{k+1} Y Y^{\mathrm{T}},$$
where $U$ and $Y$ are the dominant rank-$r$ left and right singular vectors of $V^{k+1}$.
In the following, we present a methodology that computes an accurate approximation to the truncated SVD in a cost-effective manner. From the geometric perspective, our approach is to use an oblique projection onto a set of rank-$r$ orthonormal column ($U$) and row ($Y$) subspaces:
$$\hat{V}^{k+1} = \mathbb{P}_U V^{k+1} \mathbb{P}_Y^{\mathrm{T}}. \quad (2.17)$$

Figure 1. Schematic of the TDB-CUR methodology. (a) A geometric representation of the methodology depicting the departure from the rank-$r$ manifold at $\hat{V}^{k}$ to $V^{k+1}$ (red). Two possible mappings that truncate back to the rank-$r$ manifold are shown: the truncated SVD (green), computed via orthogonal projection, and the TDB-CUR approximation (purple), computed via oblique projection. The error of the former (dashed green line) is orthogonal to the tangent space, while the error of the latter (dashed purple line) is not. (b) The low-rank approximation can be computed such that the residual at the selected rows (red) and columns (blue) is equal to zero. This is accomplished via sparse interpolation of the selected rows and columns. (c) Using interpolatory projection, the resulting approximation is an approximation to the rank-$r$ truncated SVD, and is equivalent to the low-rank CUR factorization that interpolates the selected rows and columns of $V^{k+1}$. Although it is equivalent to this CUR factorization, the numerical computation is different, as it does not require explicitly inverting the intersection matrix $V^{k+1}(\mathbf{p}, \mathbf{s})$.
From the matrix decomposition point of view, the above approximation may be represented via a CUR decomposition:
$$\hat{V}^{k+1} = \operatorname{CUR}\big(V^{k+1}\big) = V^{k+1}(:, \mathbf{s})\, \big(V^{k+1}(\mathbf{p}, \mathbf{s})\big)^{-1}\, V^{k+1}(\mathbf{p}, :). \quad (2.18)$$
From the residual minimization perspective, the SVD can be viewed as a Galerkin projection where the Frobenius norm of the residual is minimized. On the other hand, the presented approach based on interpolatory projection (a special case of oblique projection) can be viewed as a collocated scheme where the residual is set to zero at strategically selected rows and columns of the residual matrix $\mathcal{R}$ in equation (2.13). To this end, we present an algorithm to set $\mathcal{R}(\mathbf{p}, :) = 0$ and $\mathcal{R}(:, \mathbf{s}) = 0$, where $\mathbf{p}$ and $\mathbf{s}$ are vectors containing the row and column indices at which the residual is set to zero. This simply requires $\hat{V}^{k+1}(\mathbf{p}, :) = V^{k+1}(\mathbf{p}, :)$ and $\hat{V}^{k+1}(:, \mathbf{s}) = V^{k+1}(:, \mathbf{s})$. See figure 1b. In §2h, we consider oblique projection for the general case when more samples than basis vectors are taken (definition 2.4), where the residual at the selected rows and columns is not guaranteed to be zero.
Although the approach we will present is equivalent to the oblique projection of equation (2.17), the bases $U^{k+1}$ and $Y^{k+1}$ are the unknown column and row bases of $\hat{V}^{k+1}$ at the current time step. Therefore, equation (2.17) cannot be readily used for the computation of $\hat{V}^{k+1}$. In the following, we present a methodology to compute these bases (and subsequently $\hat{V}^{k+1}$) by strategically sampling columns and rows of $V^{k+1}$. While there are many possible choices for the indices $\mathbf{p}$ and $\mathbf{s}$, selecting these points should be done in a principled manner, to ensure the residual at all points remains small. To compute these points, we use the discrete empirical interpolation method (DEIM) [43], which has been shown to provide near-optimal sampling points for computing CUR matrix decompositions [37]. A similar approach was recently applied in [21] to accelerate the computation of equations (2.9a)–(2.9c), by only sampling at a small number of rows and columns. However, the approach presented in [21] still suffers from the issue of ill-conditioning.
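The demonstrations later in the paper (figure 4) refer to QDEIM points; as background, the QDEIM variant of DEIM selects indices via a single column-pivoted QR factorization of the transposed basis. The sketch below follows that standard recipe and is a plausible reading of the point-selection step, not the authors' reference code.

```python
import numpy as np
from scipy.linalg import qr

def qdeim_indices(U):
    """QDEIM point selection: column-pivoted QR on U^T picks one row of the
    orthonormal basis U (n x r) per pivot; returns r near-optimal indices."""
    _, _, piv = qr(U.T, pivoting=True, mode='economic')
    return piv[:U.shape[1]]

# usage: rows from the column basis, columns from the row basis, e.g.
# p = qdeim_indices(U_k);  s = qdeim_indices(Y_k)
```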
To compute the DEIM points, the rank-$r$ SVD (or an approximation) is required [37]. Since we do not have access to the rank-$r$ SVD at the current time step, we use the approximation of the SVD from the previous time step, $\hat{V}^{k} = U^{k} \Sigma^{k} (Y^{k})^{\mathrm{T}}$, to compute the DEIM points. The initial approximation is ideally obtained from the FOM initial condition as the rank-$r$ truncated SVD. The algorithm for computing $\hat{V}^{k+1}$ using interpolation is as follows (a code sketch follows this list):
(i) Compute the sampling indices, $\mathbf{p} = \mathrm{DEIM}(U^{k})$ and $\mathbf{s} = \mathrm{DEIM}(Y^{k})$, in parallel.
(ii) Compute $V^{k+1}(:, \mathbf{s})$ and $V^{k+1}(\mathbf{p}, :)$ by taking one step according to equation (2.12) at the selected rows and columns, in parallel.
(iii) Compute $U'$ as the orthonormal basis for the range of $V^{k+1}(:, \mathbf{s})$ by QR decomposition such that $V^{k+1}(:, \mathbf{s}) = U' R'$, where $U' \in \mathbb{R}^{n \times r}$.
(iv) Interpolate every column of $V^{k+1}$ onto the orthonormal basis at the sparse indices $\mathbf{p}$:
$$Z^{\mathrm{T}} = \big(U'(\mathbf{p}, :)\big)^{-1} V^{k+1}(\mathbf{p}, :), \quad (2.19)$$
where $Z^{\mathrm{T}} \in \mathbb{R}^{r \times s}$ is the matrix of interpolation coefficients such that $U' Z^{\mathrm{T}}$ interpolates $V^{k+1}$ onto the basis $U'$ at the interpolation points indexed by $\mathbf{p}$.
(v) Compute the SVD of $Z$ so that
$$Z = Y^{k+1} \Sigma^{k+1} \tilde{R}^{\mathrm{T}}, \quad (2.20)$$
where $Y^{k+1} \in \mathbb{R}^{s \times r}$, $\Sigma^{k+1} \in \mathbb{R}^{r \times r}$ and $\tilde{R} \in \mathbb{R}^{r \times r}$.
(vi) Compute $U^{k+1}$ as the in-subspace rotation:
$$U^{k+1} = U' \tilde{R}. \quad (2.21)$$
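Putting steps (i)–(vi) together, one time step of the scheme can be sketched as below for the forward-Euler case; the callables `rhs_cols` and `rhs_rows`, their signatures, and the assumption $|\mathbf{p}| = |\mathbf{s}| = r$ are illustrative choices, not prescribed by the paper.

```python
import numpy as np

def tdb_cur_step(U, S, Y, rhs_cols, rhs_rows, p, s_idx, dt):
    """One forward-Euler TDB-CUR step (steps (i)-(vi) above), schematically.

    Vhat^k = U @ S @ Y.T with U (n x r), S (r x r), Y (s x r).
    rhs_cols / rhs_rows are assumed user-supplied callables returning
    F(Vhat^k) at the sampled columns (n x r) and rows (r x s).
    Assumes interpolation (|p| == r); use lstsq when oversampling.
    """
    # (ii) advance only the selected columns and rows of the MDE
    V_cols = U @ S @ Y[s_idx, :].T + dt * rhs_cols(U, S, Y, s_idx)  # n x r
    V_rows = U[p, :] @ S @ Y.T + dt * rhs_rows(U, S, Y, p)          # r x s
    # (iii) orthonormal basis for the sampled columns
    Uq, _ = np.linalg.qr(V_cols)
    # (iv) interpolation coefficients Z^T = (Uq[p,:])^{-1} V[p,:]   (2.19)
    ZT = np.linalg.solve(Uq[p, :], V_rows)
    # (v) SVD of Z; note that no n x s matrix is ever formed        (2.20)
    Y_new, svals, Vh = np.linalg.svd(ZT.T, full_matrices=False)
    # (vi) in-subspace rotation of the column basis                 (2.21)
    U_new = Uq @ Vh.T
    return U_new, np.diag(svals), Y_new
```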
(i) Computational efficiency: the above procedure returns the updated low-rank approximation $\hat{V}^{k+1} = U^{k+1} \Sigma^{k+1} (Y^{k+1})^{\mathrm{T}}$, and only requires sampling $V^{k+1}$ at $r$ rows and $r$ columns. This alone significantly reduces both the required number of flops and memory, compared to computing the entire $V^{k+1}$. Furthermore, instead of directly computing the SVD of the $n \times s$ matrix $V^{k+1}$, we only require computing the QR of the $n \times r$ matrix $V^{k+1}(:, \mathbf{s})$, and the SVD of the $s \times r$ matrix $Z$. This reduces the computational cost to $\mathcal{O}(nr^2)$ for the QR decomposition and $\mathcal{O}(sr^2)$ for the SVD. Moreover, in most practical applications, computing $V^{k+1}(:, \mathbf{s})$ is the costliest part of the algorithm, which requires solving $r$ samples of the FOM. However, since these samples are independent of each other, the columns of $V^{k+1}(:, \mathbf{s})$ can be computed in parallel. Similarly, each row of $V^{k+1}(\mathbf{p}, :)$ can be computed in parallel.
(ii) Intrusiveness: while this significantly reduces the computational burden, perhaps an equally important outcome is the minimally intrusive nature of the above approach. For example, when the columns of $V$ are independent, e.g. random samples, $V^{k+1}(:, \mathbf{s})$ can be computed by directly applying equation (2.12) to the selected columns of the low-rank approximation from the previous time step. This effectively allows existing numerical implementations of equation (2.12) to be used as a black box for computing $V^{k+1}(:, \mathbf{s})$. The non-intrusive column sampling in the presented algorithm is the counterpart of solving equation (2.9b) in DLRA and equation (2.11a) in DO. However, equations (2.9b) and (2.11a) require deriving and implementing new PDEs, whereas the presented algorithm allows an existing deterministic solver to be used in a black-box fashion, from which a suitable column space basis is extracted. On the other hand, the rows of $V$ are in general dependent, based on a known map for the chosen spatial discretization scheme, e.g. sparse discretizations like finite difference and finite element, or dense discretization schemes, e.g. global spectral methods. Therefore, computing $V^{k+1}(\mathbf{p}, :)$ does require specific knowledge of the governing equations, namely the discretized differential operators. Based on the discretization scheme, one can determine a set of adjacent points, $\mathbf{p}_a$, that are required for computing the derivatives at the points specified by $\mathbf{p}$. While this introduces an added layer of complexity, it is much less intrusive than deriving and implementing reduced-order operators for each term in the governing equations, which, we emphasize again, is only feasible for homogeneous linear or quadratic nonlinear equations. In the present work, that bottleneck is removed, regardless of the type of nonlinearity.
(iii) Ill-conditioning: the presented algorithm is robust in the presence of small or zero singular values. First note that the inversion of the matrix of singular values is not required in the presented algorithm. In fact, the conditioning of the algorithm depends on $U'(\mathbf{p}, :)$ and $Y(\mathbf{s}, :)$, and the DEIM algorithm ensures that these two matrices are well-conditioned. To illustrate this point, let us consider the case of overapproximation, where the rank of $V$ is smaller than $r$. In this case, equations (2.9a)–(2.9c) and equations (2.11a)–(2.11b) cannot be advanced because $\Sigma$ and $Z^{\mathrm{T}} Z$ will be singular, i.e. $\operatorname{rank}(\Sigma) = \operatorname{rank}(Z) < r$. On the other hand, despite $\Sigma^{k+1}$ being rank-deficient, $U'(\mathbf{p}, :)$ will still be a full-rank matrix in the presented algorithm. While there is no guarantee that a subset of rows of $U'$, i.e. $U'(\mathbf{p}, :)$, is well-conditioned, the DEIM is a greedy algorithm that is designed to keep $\|(U'(\mathbf{p}, :))^{-1}\|$ as small as possible in a near-optimal fashion. In §2h, we show that oversampling further improves the condition number of the presented algorithm, and in theorem 2.8, we show that $Y(\mathbf{s}, :)$ plays an equally important role in maintaining a well-conditioned algorithm.
(f) Computing $V^{k+1}(\mathbf{p}, :)$ for higher-order Runge–Kutta schemes
Up until this point, we have considered $V^{k+1}$ to be an $n \times s$ matrix resulting from an explicit Runge–Kutta temporal discretization of equation (2.12). We showed that sparse row and column measurements, $V^{k+1}(\mathbf{p}, :)$ and $V^{k+1}(:, \mathbf{s})$, could be used to efficiently compute an approximation to the rank-$r$ SVD of $V^{k+1}$. While $V^{k+1}(:, \mathbf{s})$ is straightforward to compute for independent random samples, as discussed in §2e, computing $V^{k+1}(\mathbf{p}, :)$ depends on a set of adjacent points, $\mathbf{p}_a$, according to the spatial discretization scheme. As a result, for higher-order integration schemes, special care must be taken in the computation of $V^{k+1}(\mathbf{p}, :)$. To demonstrate this, we consider the second-order explicit Runge–Kutta (midpoint) scheme, where
$$V^{k+1} = \hat{V}^{k} + \Delta t\, K_2, \qquad K_1 = \mathcal{F}\big(\hat{V}^{k}, t^{k}\big), \qquad K_2 = \mathcal{F}\Big(\hat{V}^{k} + \tfrac{\Delta t}{2} K_1,\ t^{k} + \tfrac{\Delta t}{2}\Big).$$
(i) Compute $\hat{V}^{k}(\mathbf{p}_a, :)$, the rows of the low-rank approximation required by the stencil at the points $\mathbf{p}$.
(ii) Compute the first stage at the rows $\mathbf{p}$ as
$$K_1(\mathbf{p}, :) = \mathcal{F}\big(\hat{V}^{k}\big)(\mathbf{p}, :),$$
which only requires the adjacent rows $\hat{V}^{k}(\mathbf{p}_a, :)$.
Note, if the explicit Euler method is used, $V^{k+1} = \hat{V}^{k} + \Delta t\, K_1$, and no additional steps are required. Simply compute $V^{k+1}(\mathbf{p}, :) = \hat{V}^{k}(\mathbf{p}, :) + \Delta t\, K_1(\mathbf{p}, :)$. If a higher-order scheme is used, proceed with the following steps.
(iii) The final stage of the second-order integration scheme requires taking a half step to evaluate $\mathcal{F}$ at the midpoint:
$$V^{k+1}(\mathbf{p}, :) = \hat{V}^{k}(\mathbf{p}, :) + \Delta t\, K_2(\mathbf{p}, :).$$
Note that for the second-order scheme, $K_2 = \mathcal{F}\big(\hat{V}^{k+1/2}\big)$ with $\hat{V}^{k+1/2} = \hat{V}^{k} + (\Delta t/2) K_1$. Here, we require $\hat{V}^{k+1/2}(\mathbf{p}_a, :)$, given by
$$\hat{V}^{k+1/2}(\mathbf{p}_a, :) = \hat{V}^{k}(\mathbf{p}_a, :) + \frac{\Delta t}{2}\, K_1(\mathbf{p}_a, :).$$
Note that we now require $K_1(\mathbf{p}_a, :)$ to evaluate the above expression. While this can be computed according to Step (ii), where $\mathbf{p}_a$ will have its own set of adjacent points, this process quickly gets out of hand, especially as more stages are added to the integration scheme. As a result, for higher-order schemes, the efficiency afforded by the presented algorithm will deteriorate, and the resulting implementation will become increasingly complex. To overcome these challenges, we instead compute a low-rank approximation of each stage, using the sparse row and column measurements, which are already required for computing $V^{k+1}(\mathbf{p}, :)$ and $V^{k+1}(:, \mathbf{s})$. Here, a subscript $i$ denotes the stage of the integration scheme. The first step is to compute $U_i$ as an orthonormal basis for the range of the sampled stage columns, using QR. Next, compute the oblique projection of the sampled stage rows onto $U_i$, such that $Z_i^{\mathrm{T}} = \big(U_i(\mathbf{p}, :)\big)^{-1}$ applied to the sampled rows. Using this low-rank approximation, any required rows, e.g. at $\mathbf{p}_a$, are readily approximated by $U_i(\mathbf{p}_a, :)\, Z_i^{\mathrm{T}}$. Although we have considered the second-order Runge–Kutta method in the example above, this approach is easily extended to higher-order Runge–Kutta methods. It is straightforward to show that the above procedure is equivalent to a CUR decomposition of the stage matrix, similar to our previous work [21].
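The stage approximation just described amounts to one more QR-plus-interpolation; a minimal sketch under the same assumptions as before (interpolation with $|\mathbf{p}| = r$, illustrative names):

```python
import numpy as np

def stage_lowrank(stage_cols, stage_rows, p):
    """Factor an RK stage matrix as U_i @ Z_iT from its sampled columns
    (n x r) and sampled rows (r x s), as described above."""
    U_i, _ = np.linalg.qr(stage_cols)              # basis for sampled columns
    Z_iT = np.linalg.solve(U_i[p, :], stage_rows)  # interpolate at the rows p
    return U_i, Z_iT

# any additional rows (e.g. the stencil neighbours pa) are then available
# without recursion into further adjacent points:
# stage_pa = U_i[pa, :] @ Z_iT
```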
(g) Equivalence to a CUR decomposition and oblique projection
Before presenting the details of our methodology in §2e, we discussed how the presented approach can be understood as an oblique projection (equation (2.17)) or alternatively as a CUR decomposition (equation (2.18)). In this section, we show that (i) the presented algorithm is equivalent to a CUR decomposition (theorem 2.6) and (ii) the matrix $\hat{V}$ is obtained via an oblique projection, which requires access to only the selected rows and columns of $V$ (theorem 2.7). In theorems 2.6 and 2.7, $P$ and $S$ are matrices of size $n \times r$ and $s \times r$, respectively. We drop the time step superscript for simplicity.
Theorem 2.6.
Let $\hat{V}$ be the low-rank approximation of $V$ computed according to the TDB-CUR algorithm. Then: (i) $\hat{V}$ is equivalent to the CUR factorization given by $\hat{V} = V(:, \mathbf{s})\, \big(V(\mathbf{p}, \mathbf{s})\big)^{-1}\, V(\mathbf{p}, :)$. (ii) The low-rank approximation is exact at the selected rows and columns, i.e. $\hat{V}(\mathbf{p}, :) = V(\mathbf{p}, :)$ and $\hat{V}(:, \mathbf{s}) = V(:, \mathbf{s})$.
Proof.
(i) According to the TDB-CUR algorithm, $U'$ is a basis for the range of $V(:, \mathbf{s})$. Therefore, $V(:, \mathbf{s}) = U' R'$, and it follows that $V(\mathbf{p}, \mathbf{s}) = U'(\mathbf{p}, :) R'$. Substituting this result into the CUR factorization gives
$$\operatorname{CUR}(V) = V(:, \mathbf{s})\, \big(V(\mathbf{p}, \mathbf{s})\big)^{-1}\, V(\mathbf{p}, :) = U' R' \big(U'(\mathbf{p}, :) R'\big)^{-1} V(\mathbf{p}, :).$$
Rearranging the above expression gives the desired result
$$\operatorname{CUR}(V) = U' R' R'^{-1} \big(U'(\mathbf{p}, :)\big)^{-1} V(\mathbf{p}, :) = U' Z^{\mathrm{T}} = \hat{V},$$
where we have used $Z^{\mathrm{T}} = \big(U'(\mathbf{p}, :)\big)^{-1} V(\mathbf{p}, :)$, from equation (2.19).
(ii) Using the above result, $\hat{V} = U' \big(U'(\mathbf{p}, :)\big)^{-1} V(\mathbf{p}, :)$, we show the selected rows of $\hat{V}$ are exact, i.e. $\hat{V}(\mathbf{p}, :) = V(\mathbf{p}, :)$:
$$\hat{V}(\mathbf{p}, :) = U'(\mathbf{p}, :) \big(U'(\mathbf{p}, :)\big)^{-1} V(\mathbf{p}, :) = V(\mathbf{p}, :).$$
Similarly for the columns,
$$\hat{V}(:, \mathbf{s}) = U' \big(U'(\mathbf{p}, :)\big)^{-1} V(\mathbf{p}, \mathbf{s}) = U' \big(U'(\mathbf{p}, :)\big)^{-1} U'(\mathbf{p}, :) R' = U' R' = V(:, \mathbf{s}).$$
This completes the proof.
Now we show that $\hat{V}$ is an oblique projection of $V$ onto the selected columns and rows of $V$. In particular, the oblique projector involved is an interpolatory projector. For the sake of brevity, we drop the superscript in the following.
Theorem 2.7.
Let $\hat{V}$ be the low-rank approximation of $V$ computed according to the TDB-CUR algorithm. Then $\hat{V} = \mathbb{P}_U V\, \mathbb{P}_Y^{\mathrm{T}}$, where $\mathbb{P}_U$ and $\mathbb{P}_Y$ are oblique projectors onto Ran($U$) and Ran($Y$), respectively, according to equation (2.5).
Proof.
We first show that $\mathbb{P}_U$ can be represented with $U'$ as the interpolation basis. To this end, replacing $U$ with $U'$ in the definition of $\mathbb{P}_U$ results in:
$$\mathbb{P}_{U'} = U' \big(P^{\mathrm{T}} U'\big)^{-1} P^{\mathrm{T}} = U' \big(U'(\mathbf{p}, :)\big)^{-1} P^{\mathrm{T}}.$$
In the following theorem, we show that the oblique projection error is bounded by an error factor multiplied by the maximum of the orthogonal projection errors onto Ran($U$) or Ran($Y$). We follow a similar procedure to that used in [37]; however, in [37] the CUR is computed based on orthogonal projections onto the selected columns and rows, whereas in the presented TDB-CUR algorithm, oblique projectors are used. Without loss of generality, we consider a generic oblique projection, where the indexing matrices $P$ and $S$ are of size $n \times m$ and $s \times m'$, respectively, and in general, $m \geq r$ and $m' \geq r$ (see definition 2.4 for details). In the following, we use the 2-norm ($\|\cdot\| = \|\cdot\|_2$).
Theorem 2.8.
Let $\mathbb{P}_U$ and $\mathbb{P}_Y$ be oblique projectors according to definition 2.4 and let $U$ and $Y$ be orthonormal matrices, i.e. $U^{\mathrm{T}} U = I$ and $Y^{\mathrm{T}} Y = I$. Let the error factors be given by: $\eta_p = \big\| (P^{\mathrm{T}} U)^{\dagger} \big\|$ and $\eta_s = \big\| (S^{\mathrm{T}} Y)^{\dagger} \big\|$. Then the error of the oblique projection is bounded by
$$\big\| V - \mathbb{P}_U V\, \mathbb{P}_Y^{\mathrm{T}} \big\| \leq \big(\eta_p + \eta_s\big)\, \max\big( \| V - P_U V \|,\ \| V - V P_Y \| \big).$$
Proof.
First note that $\mathbb{P}_U P_U = P_U$ and $P_Y \mathbb{P}_Y^{\mathrm{T}} = P_Y$, because $\mathbb{P}_U U = U$ and $Y^{\mathrm{T}} \mathbb{P}_Y^{\mathrm{T}} = Y^{\mathrm{T}}$. The error matrix can be written as
$$V - \mathbb{P}_U V\, \mathbb{P}_Y^{\mathrm{T}} = \big(V - \mathbb{P}_U V\big) + \mathbb{P}_U \big(V - V\, \mathbb{P}_Y^{\mathrm{T}}\big),$$
and bounding each term on the right-hand side, using $\|\mathbb{P}_U\| = \eta_p$ and $\|\mathbb{P}_Y\| = \eta_s$, gives the stated result.
In the above error bound, when $U$ and $Y$ are the most dominant exact left and right singular vectors of $V$, the maximum of the orthogonal projection errors equals $\sigma_{r+1}$, where $\sigma_{r+1}$ is the $(r+1)$th singular value of $V$, since
$$\| V - P_U V \| = \| V - V P_Y \| = \sigma_{r+1}.$$
(h) Oversampling for improved condition number
The above error analysis shows that the error of the rank-$r$ CUR approximation can be bounded by an error factor times the maximum error obtained from the orthogonal projection of $V$ onto Ran($U$) or Ran($Y$). This analysis reveals that better-conditioned $P^{\mathrm{T}} U$ and $S^{\mathrm{T}} Y$ matrices result in smaller $\eta_p$ and $\eta_s$, which in turn results in a smaller error factor. In the context of DEIM interpolation, it was shown that oversampling can improve the condition number of oblique projections [47]. The authors demonstrated that augmenting the original DEIM algorithm with additional sampling points can reduce the value of $\eta_p$, leading to smaller approximation errors. This procedure of sampling more rows than the number of basis vectors leads to an overdetermined system, for which an approximate solution can be found via least-squares regression rather than interpolation. Additionally, it was shown in [48] that for matrices with rapidly decaying singular values (as targeted in this work), oversampling improves the accuracy of CUR decompositions.
In the following, we extend the TDB-CUR algorithm for row oversampling. As a direct result of the oversampling procedure, the oblique projection of $V$ onto the range of the orthonormal basis $U'$ becomes $\mathbb{P}_{U'} V = U' \big(P^{\mathrm{T}} U'\big)^{\dagger} P^{\mathrm{T}} V$, where $\mathbf{p}$ now contains $m > r$ row indices. Note that $\big(P^{\mathrm{T}} U'\big)^{\dagger}$ is the pseudoinverse of $P^{\mathrm{T}} U'$; however, we do not apply any singular value threshold cutoff to compute it, and exact inversion of the singular values is used. Therefore, the oblique projection becomes a least-squares best-fit solution. Also, increasing the number of oversampling points decreases $\eta_p$, and it follows that for the maximum number of oversampling points, i.e. when all the rows are sampled, the orthogonal projection of every column of $V$ onto $U'$ is recovered, where $\eta_p$ attains its smallest value, which is 1. Note that, unlike the interpolatory projector, $P^{\mathrm{T}} \mathbb{P}_{U'} V \neq P^{\mathrm{T}} V$ in general. The oversampling is also applied analogously to the low-rank approximation of the Runge–Kutta stages; the low-rank approximation of the stages is presented in §2f.
We refer to the above sampling procedure as oversampled DEIM (OS-DEIM).
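With oversampling, the interpolation solve is replaced by a least-squares fit; a minimal sketch (illustrative names, no singular-value cutoff beyond machine precision):

```python
import numpy as np

def oblique_project_ls(U, p, A):
    """Oversampled (|p| > r) oblique projection of the columns of A onto
    Ran(U): a least-squares fit at the sampled rows p."""
    Z, *_ = np.linalg.lstsq(U[p, :], A[p, :], rcond=None)
    return U @ Z
# note: unlike interpolation, (U @ Z)[p, :] != A[p, :] in general
```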
(i) Rank-adaptivity
In order to control the error while avoiding unnecessary computations, the rank of the TDB must be able to adapt on the fly. The importance of rank-adaptivity for low-rank approximation with TDB has been recognized and several algorithms have been proposed recently; see for example [50–52]. We show that it is easy to incorporate mode adaptivity into the TDB-CUR algorithm. In the case of rank reduction, once the new rank $r' < r$ is chosen, the low-rank matrices are simply truncated to retain only the first $r'$ components, i.e. $U(:, 1\!:\!r')$, $\Sigma(1\!:\!r', 1\!:\!r')$ and $Y(:, 1\!:\!r')$. On the other hand, the rank can be increased to $r' > r$ by sampling more columns than the number of basis vectors, i.e. oversampling. Similar to the procedure used for oversampling the rows, the column indices are determined via the GappyPOD+E algorithm. While this provides a straightforward approach for how to adapt the rank, it does not address when the rank should be adapted, or what that new rank should be.
Informed by the error analysis from the preceding section, we devise a suitable criterion for controlling the error via rank addition and removal. Since it is not possible to know the true error without solving the expensive FOM, we devise a proxy for estimating the low-rank approximation error based on the smallest resolved singular value:
$$\epsilon = \frac{\sigma_r}{\|\hat{V}\|_F}.$$
To make the error proxy more robust for problems of varying scale and magnitude, we divide by the Frobenius norm of $\hat{V}$, where it is well known that $\|\hat{V}\|_F = \big(\sum_{i=1}^{r} \sigma_i^2\big)^{1/2}$. Rather than set a hard threshold, we add/remove modes to maintain $\epsilon$ within a desired range, $[\epsilon_l, \epsilon_u]$, where $\epsilon_l$ and $\epsilon_u$ are user-specified lower and upper bounds, respectively. If $\epsilon > \epsilon_u$ we increase the rank to $r + 1$, and if $\epsilon < \epsilon_l$ we decrease the rank to $r - 1$. As a result, this approach avoids the undesirable behaviour of repeated mode addition and removal, which is observed by setting a hard threshold. The rank-adaptive TDB-CUR algorithm is detailed in algorithm 1.
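A compact sketch of this bounded-range update, assuming the proxy $\epsilon = \sigma_r / \|\hat{V}\|_F$ written above (the function name and signature are illustrative):

```python
import numpy as np

def adapt_rank(svals, r, eps_l, eps_u):
    """Add/remove one mode to keep the error proxy inside [eps_l, eps_u].
    svals: current singular values sigma_1 >= ... >= sigma_r."""
    eps = svals[r - 1] / np.linalg.norm(svals[:r])  # sigma_r / ||Vhat||_F
    if eps > eps_u:
        return r + 1   # sample one extra column (e.g. GappyPOD+E) next step
    if eps < eps_l:
        return r - 1   # truncate factors to the leading r-1 components
    return r
```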
It is important to note that this is not the only criterion for mode addition and removal, and one can devise a number of strategies based on the problem at hand. However, from our numerical experiments, this approach has proved to be simple and effective, and it does a good job at capturing the trend of the true error. For more details on estimating rank and selection criteria, we refer the reader to [53, section 2.3]. Finally, it is possible to increase the rank by more than one in algorithm 1, if required. This can be determined by applying the singular value threshold check after executing Line 19. If $\epsilon > \epsilon_u$ is still true, one more column can be sampled. This requires executing Line 10 to find the new column index, Line 11 to update the column indices, evaluating $V^{k+1}$ only for the new column using Line 14, and following Lines 15–20. These iterations can be carried out many times, until $\epsilon$ falls below $\epsilon_u$.
Remark 2.9.
Algorithm 1 is presented for solving MDEs that arise from discretizing PDEs with parametric uncertainties, where the rows are dependent on each other but the columns can be solved independently. However, algorithm 1, with minor modification, can be applied to MDEs where the columns are also dependent on each other. In such cases, evaluating $V^{k+1}(\mathbf{p}, :)$ requires providing the values of $\hat{V}^{k}$ at an additional set of adjacent column indices, analogous to the adjacent rows $\mathbf{p}_a$, whose values are needed to compute the right-hand side at the selected entries.
Remark 2.10.
Algorithm 1 can be applied to problems with dense spatial discretizations, where each row depends on all, or many, other rows. These MDEs can arise, for example, from global discretization methods such as spectral methods. In the most generic form, the computational complexity of computing each entry of the right-hand side matrix can be $\mathcal{O}(n)$ or $\mathcal{O}(s)$. The computational complexity of solving the FOM is then $ns$ times the computational complexity of each entry, i.e. $\mathcal{O}(n^2 s)$ or $\mathcal{O}(n s^2)$. The presented algorithm reduces the cost of evaluating the FOM for this generic setting by evaluating the right-hand side at only $\mathcal{O}(r)$ rows and columns.
For MDEs arising from the discretization of PDEs with parametric uncertainties, the columns can be advanced independently, and when sparse discretization schemes are used for the spatial discretization, the cost of computing each column of the right-hand side scales with $\mathcal{O}(n)$. However, when global discretization methods are used, this cost increases. Take for example $\mathrm{d}V/\mathrm{d}t = D V$, where $D \in \mathbb{R}^{n \times n}$ is a full matrix obtained from the discretization of linear differential operators. For this problem, the cost of solving the FOM scales with $\mathcal{O}(n^2 s)$, while the cost of solving TDB-CUR scales with $\mathcal{O}(n^2 r)$. The toy problem presented in §3a is a demonstration of a case where there is a dense coupling between both columns and rows.
3. Demonstrations
(a) Toy problem
As our first example, we compare the accuracy of the presented algorithm against DLRA using a standard integrator [2], DO [1], the projector-splitting (PS) time integrator [22] and the recently proposed unconventional robust integrator [23]. We emphasize that it is already established that standard integrators, for example Runge–Kutta schemes, are unstable for solving equations (2.9a)–(2.9c), which has motivated the development of new time integration techniques [22,23]. We consider the time-dependent matrix from Ceruti & Lubich [23], given explicitly therein.
In figure 2a, we plot the error at the final time versus rank for two different step sizes. We consider the following cases: TDB-CUR (DEIM), TDB-CUR (OS-DEIM) with oversampling, DO [1] (equations (2.11a)–(2.11b)), DLRA using the standard integrator [2] (equations (2.9a)–(2.9c)), and PS [22]. For reference, we also show the optimal error, obtained via the rank-$r$ SVD of the exact solution at the final time.
Figure 2. Toy problem: (a,b) error at the final time versus reduction order $r$ for two step sizes; (c) norm of the inverse matrix versus reduction order $r$; (d) error versus step size for various reduction orders $r$. (a) Full rank, (b) rank-deficient, (c) full rank and (d) full rank.
In figure 2b, we consider a rank-deficient matrix and overapproximation using different low-rank techniques. Specifically, we consider a matrix whose leading diagonal entries decay while all remaining entries are zero, so that its rank is smaller than the largest reduction orders considered. Our results show that the error drops to the optimal temporal error once the reduction order reaches the rank of the matrix. Furthermore, we observe that even in the case of rank deficiency, TDB-CUR remains stable as the reduction order is increased further. This finding supports the observation made in §2e regarding the conditioning of the presented algorithm. Specifically, even when $\Sigma$ is rank-deficient, the matrix $U'(\mathbf{p}, :)$ is well-conditioned, and the TDB-CUR scheme remains stable, while both DO and DLRA with the standard integrator diverge. Similarly, PS remains stable and converges to the optimal error, as it does not require inverting $\Sigma$.
In figure 2c, we plot the norm of the inverse matrix versus $r$ for all four methods used: $\|\Sigma^{-1}\|$ for DLRA, $\|(Z^{\mathrm{T}} Z)^{-1}\|$ for DO, $\|(U'(\mathbf{p}, :))^{-1}\|$ for TDB-CUR (DEIM) and $\|(P^{\mathrm{T}} U')^{\dagger}\|$ for TDB-CUR (OS-DEIM). As $r$ increases, the matrices $\Sigma$ and $Z^{\mathrm{T}} Z$ become ill-conditioned, hence the condition numbers for DLRA and DO become unbounded. On the other hand, the condition numbers for TDB-CUR (DEIM) and TDB-CUR (OS-DEIM) remain nearly constant, since the sampled basis matrix is well-conditioned. We also observe that the condition number for TDB-CUR (DEIM) can be improved by oversampling, as seen in the plot for TDB-CUR (OS-DEIM).
In figure 2d, we compare the error at the final time versus step size for various reduction orders $r$. We observe that TDB-CUR (OS-DEIM) saturates to the optimal low-rank error for each $r$ much more quickly than the unconventional robust integrator [23]. Furthermore, the TDB-CUR method retains the fourth-order accuracy of the Runge–Kutta scheme, whereas the order of accuracy of the unconventional robust integrator is first-order, despite using fourth-order Runge–Kutta for each substep of the algorithm. This confirms the first-order temporal accuracy of the unconventional integrator [23, section 3.1].
(b) Stochastic Burgers equation
For the second test case, we consider the one-dimensional Burgers equation subject to random initial and boundary conditions.
We first solve the system using TDB-CUR with fixed rank and compare the results against DLRA and DO by solving equations (2.9a)–(2.9c) and (2.11a)–(2.11b), respectively. The fourth-order explicit Runge–Kutta method (a standard integrator) is used to solve both the DLRA and DO equations. For TDB-CUR, the rows are oversampled. No sparse sampling strategy is used for DLRA or DO, and equations (2.9a)–(2.9c) and (2.11a)–(2.11b) are solved as is. In figure 3a, we compare the error of TDB-CUR, DLRA and DO for different values of the rank. At the lowest rank, TDB-CUR has a larger error compared to both DLRA and DO. This result is expected, since TDB-CUR has an additional source of error from the sparse sampling procedure. However, as the rank is increased, the conditioning of DLRA and DO with the standard integrator starts to deteriorate, and the error of TDB-CUR actually drops below that of DLRA and DO. In fact, at the largest ranks considered, DLRA and DO with standard integrators are unstable and cannot be integrated beyond the first time step. On the other hand, TDB-CUR remains stable, and the error decays as the rank is increased to its maximum fixed value. While it is reasonable to expect that the error can be reduced further by increasing the rank, it is important to note that the rank of the initial condition is finite. Therefore, in order to increase the rank of the system beyond that of the initial condition in a principled manner, we employ the rank-adaptive strategy from §2i. To this end, we initialize the system with the rank of the initial condition, and use an upper threshold $\epsilon_u$ for mode addition. As observed in figure 3b, the rank is increased in time to a maximum of 23, leading to a further reduction in the error.
Figure 3. Stochastic Burgers equation with constant diffusion: (a) relative error evolution for fixed and adaptive rank. (b) Rank evolution using the upper threshold $\epsilon_u$ for mode addition.
The mean solution is shown in figure 4 (right) along with the first 10 QDEIM sampling points. We observe that the sampling points are concentrated near the stochastic boundary and at points in the domain where shocks develop. Figure 4 (left) shows the evolution of the first two spatial modes (top and bottom), where we observe excellent agreement between the FOM and TDB-CUR. It is important to note that these modes are energetically ranked according to the first and second singular values shown in figure 5a. Therefore, we observe that the first mode captures the large-scale, energy-containing structure, while the second captures the small-scale structure that is highly localized in space.
Figure 4. Stochastic Burgers equation with constant diffusion. Left: evolution of the first two spatial modes and the QDEIM points; excellent agreement between the FOM and TDB-CUR is observed. Right: mean solution with the first 10 QDEIM points (black dots).
Figure 5. Stochastic Burgers equation with constant diffusion: (a) leading singular values of FOM versus TDB-CUR. (b) CPU time scaling for FOM, TDB-CUR and the unconventional integrator with fixed rank.
In figure 5a, we show that TDB-CUR accurately captures the leading singular values of the FOM solution, despite the large gap between the first and last resolved singular values. Finally, figure 5b compares the CPU time of the FOM, DLRA with the unconventional integrator, and TDB-CUR as the number of rows and columns of the matrix are increased simultaneously. We take $n = s$ and observe that the FOM scales quadratically ($\mathcal{O}(n^2)$) while TDB-CUR and DLRA with the unconventional integrator scale linearly ($\mathcal{O}(n)$). As the matrix size is increased, the disparity in CPU time becomes even more apparent, making the case for solving massive MDEs using low-rank approximation. Despite this result, it is important to note that linear scaling for the unconventional integrator is only possible because the nonlinear term in the Burgers equation is limited to quadratic. For higher-order polynomial and general nonlinearities, the unconventional integrator will scale with $\mathcal{O}(ns)$ and exceed the cost of solving the FOM. Nevertheless, given the factored rank-$r$ approximation, $\hat{V} = U \Sigma Y^{\mathrm{T}}$, the quadratic term can be computed efficiently, resulting in a factorization that has a maximum rank of $r^2$.
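The rank-$r^2$ factorization of a quadratic term mentioned above follows from a row-wise Khatri–Rao identity for elementwise products of factored matrices; a small self-contained check (all names illustrative):

```python
import numpy as np

def rowwise_khatri_rao(A, B):
    """Row-wise Khatri-Rao product: row i is the flattened outer product
    of A[i, :] and B[i, :]."""
    return np.einsum('ij,ik->ijk', A, B).reshape(A.shape[0], -1)

rng = np.random.default_rng(1)
n, s, r = 50, 40, 3
U1, U2 = rng.standard_normal((n, r)), rng.standard_normal((n, r))
Z1, Z2 = rng.standard_normal((s, r)), rng.standard_normal((s, r))

# (U1 Z1^T) * (U2 Z2^T) elementwise equals the rank-r^2 factorization
# (U1 kr U2) @ (Z1 kr Z2)^T; the factored form never builds an n x s
# matrix (lhs below is formed only to verify the identity).
lhs = (U1 @ Z1.T) * (U2 @ Z2.T)
rhs = rowwise_khatri_rao(U1, U2) @ rowwise_khatri_rao(Z1, Z2).T
assert np.allclose(lhs, rhs)
```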
To demonstrate the true power of the TDB-CUR method, we modify the right-hand side of the MDE by making the diffusion term nonlinear, with the diffusion coefficient depending element-wise on the solution and scaled by a random diagonal matrix. Furthermore, we verify that this modification does not result in a negative diffusion. As a result of this simple modification, the cost of directly computing the right-hand side will scale with $\mathcal{O}(ns)$, even for solutions of low rank. Therefore, efficient computation of DLRA with a standard integrator [2], the unconventional integrator [54] or a projection method [25] is not possible. To highlight this, we compare the error versus cost (time to solution) for TDB-CUR, DLRA with the unconventional integrator, and DLRA using projection methods. We use the projected fourth-order Runge–Kutta method (PRK4) presented in [25], along with fourth-order Runge–Kutta for both TDB-CUR and the substeps of the unconventional integrator. For the PRK4 method, the unfactored right-hand-side matrix is computed at each stage of the integration scheme and then projected to the tangent space of the rank-$r$ manifold at that stage, resulting in a matrix whose rank is at most $2r$. To limit rank growth during the internal steps of the RK4 method, the economy-size SVD is applied after each substep to obtain a rank-$r$ approximation. This is only necessary after the first stage, since the first stage starts on the rank-$r$ manifold (see above). The orthonormal column and row bases are then used for the tangent-space projections at each stage. One final economy-size SVD is applied so that the updated low-rank matrix remains on the rank-$r$ manifold. Although the nonlinear diffusion requires forming a matrix of size $n \times s$, our implementation does not require computing the SVD of matrices of that size. To our knowledge, this represents an efficient implementation of PRK4 when forming the full right-hand side cannot be avoided.
Figure 6a shows the error versus cost for the nonlinear-diffusion case. For TDB-CUR, we observe a rapid decrease in error for a modest increase in cost. Similar behaviour is observed for PRK4; however, both the error and cost exceed those of TDB-CUR. Finally, the unconventional integrator exhibits a larger error than both TDB-CUR and projection for a given cost. Due to the first-order accuracy of the unconventional integrator, the error does not monotonically decrease as the cost (rank) is increased. To verify this, we decrease $\Delta t$ by an order of magnitude and rerun. Despite the error dropping by an order of magnitude, the same non-monotonic behaviour in the error is observed. This confirms that the error in the unconventional integrator is still dominated by the temporal error and not the low-rank approximation error. Finally, we plot the error versus time for TDB-CUR and PRK4 in figure 6b. As the rank is increased, we observe a corresponding decrease in the error for both methods. However, TDB-CUR ultimately achieves lower error than PRK4 as the rank is increased. Recently, a parallel-in-time integrator was introduced that can accelerate the computation of DLRA [55].
Figure 6. Stochastic Burgers equation with nonlinear diffusion: (a) error versus time. (b) Error at the final time versus total cost in seconds. The cost and error data points are obtained by varying the rank.
Although it is not entirely obvious why PRK4 has a larger error as the rank is increased, we propose one possible explanation based on the curvature of the manifold. To this end, it is well known that the curvature of the manifold is inversely proportional to the smallest singular value in the low-rank solution [22,41]. Therefore, as the rank is increased in the above example, the curvature of the manifold at the low-rank solution increases rapidly. As the curvature increases, the tangent space will no longer provide a good approximation for small deviations from the low-rank solution at that point. Since PRK4 takes non-infinitesimal time steps off the rank-$r$ manifold, the subsequent tangent space projections may induce errors that can be large for points on the manifold with high-order curvature. Therefore, one possible explanation for the above result is that the tangent space projection incurs a larger error since its accuracy relies heavily on the curvature of the manifold at that point. On the other hand, the TDB-CUR method does not use tangent space projections, and does not suffer from the high-order curvature of the manifold. For more details, we refer the reader to Charous & Lermusiaux [40] for an excellent discussion on the error induced by the tangent space projection.
While figures 3–6 demonstrate the accuracy, efficiency, rank-adaptivity and favourable numerical performance of the TDB-CUR method, they do not convey the minimally intrusive nature of its implementation. To give a better perspective on the implementation effort, the MATLAB code for solving the stochastic Burgers equation using the TDB-CUR method is included in the electronic supplementary material. While the code contains lines specific to the TDB-CUR method, after reviewing the entire code, it will become apparent that many of the included lines are already required for solving the FOM Burgers equation. Furthermore, there is no term-by-term implementation required to preserve efficiency, and the FOM implementation of the Burgers equation is used to compute the sparse row and column samples. Therefore, given an existing FOM implementation, the code required to implement the TDB-CUR method is minimal. The code blocks required for implementing the method are clearly labelled in the supplementary code.
(c) Stochastic advection–diffusion–reaction equation
In this section, we aim to solve the two-dimensional ADR equation subject to a random diffusion coefficient, with a deterministic initial condition.

Figure 7. Schematic of the flow visualized with a passive scalar.
We solved the ADR and TDB-CUR equations using a collocated spectral element method within the rectangular domain indicated by dashed lines in figure 7. In particular, we use a uniform quadrilateral mesh with 50 elements in the $x$ direction and 15 elements in the $y$ direction, and a spectral polynomial of order 5 in each direction within the rectangular domain. This determines the total number of degrees of freedom in the spatial domain. We interpolated the velocity field from the unstructured mesh onto the structured mesh. The fourth-order explicit Runge–Kutta method is used for time integration with a fixed time step for advancing the ADR and TDB-CUR equations.
Unlike the previous example, we use a deterministic initial condition. Therefore, the rank at $t = 0$ is exactly equal to one, i.e. $r = 1$. While the low-rank approximation with $r = 1$ will be exact in its initial condition, the rank of the system will quickly increase due to the nonlinearity. Therefore, to maintain an acceptable level of error, the rank of the approximation must increase in time. While it is possible to initialize TDB-CUR with $r > 1$, we opt to use the rank-adaptive strategy from algorithm 1, starting with the initial rank of $r = 1$. Similarly, DLRA using the unconventional integrator can also be initialized with $r = 1$; moreover, several rank-adaptive integrators have been proposed [50–52]. On the other hand, initializing DLRA or DO for a standard integrator with $r > 1$ is not possible, as $\Sigma$ and $Z^{\mathrm{T}} Z$ will be singular. Therefore, this problem setup emphasizes the need for rank-adaptivity for TDB-based low-rank matrix approximation.
For the first case, we consider a random diffusion coefficient parametrized by a Gaussian random variable with prescribed mean and standard deviation. Since validating the performance of TDB-CUR requires solving the FOM, we do not consider a large number of samples for the first case. We draw a moderate number of samples of the diffusion coefficient, which allows us to compute the error and compare the singular values with the FOM in a reasonable amount of time. Figure 8 shows the evolution of the first three spatial modes, along with the sparse sampling points. As the simulation evolves in time, the points also evolve as the flow is advected from left to right. The instantaneous singular values from TDB-CUR and the largest singular values of the FOM solution (SVD singular values) are shown in figure 9a. The discrepancy between the trailing singular values of the TDB-CUR and FOM stems from the effect of unresolved modes in the time integration of the low-rank approximation. However, as observed in the error in figure 9b, accurately resolving the leading singular values results in very small errors. Additionally, the error is controlled by lowering the error threshold for rank addition, leading to improved accuracy of the TDB-CUR approximation.
Figure 8. Stochastic advection–diffusion–reaction equation: first three spatial modes of TDB-CUR at different time steps and the selected DEIM points with oversampled (OS) points.
Figure 9. Stochastic advection–diffusion–reaction equation: (a) comparison of the first 10 singular values of FOM versus TDB-CUR with 5 oversampled points. (b) Error of TDB-CUR versus time. The result is presented for different values of the upper error bound $\epsilon_u$ for mode addition.
In the second case, we take 100 000 samples of the random diffusion coefficient. This case is of particular interest, as it demonstrates the true potential of the TDB-CUR in cases where the FOM is too costly to run. In order to execute this scenario with the FOM, not only would we require sufficient memory to store the massive solution matrix, we would also have to compute the nonlinear map of this matrix at each time step. On the other hand, with the new methodology, we never require storing a matrix larger than the tall factors with $n$ or $s$ rows and $\mathcal{O}(r)$ columns. To demonstrate this capability, we solve the TDB-CUR with 100 000 samples on a laptop computer. Since the available computational resources (our laptop) did not have sufficient memory to store the FOM solution matrix, we could not solve the FOM for comparison. Instead, we performed a convergence study by decreasing the threshold for rank addition, $\epsilon_u$. By decreasing $\epsilon_u$, we observe two things: (i) the rank is increased more rapidly and (ii) the maximum rank is increased. Figure 10 depicts the singular values of the TDB-CUR method versus time (a), and the rank at each time step for different values of $\epsilon_u$ (b). As $\epsilon_u$ is decreased, the rank is increased, and we observe convergence in the leading singular values.
Figure 10. Stochastic advection–diffusion–reaction equation: (a) singular values of TDB-CUR with 100 000 samples and 5 oversampled points. (b) Rank versus time. The result is presented for different values of the upper error bound $\epsilon_u$ for mode addition.
4. Conclusion
The objective of this work was to develop a method to solve nonlinear MDEs that is accurate, well-conditioned, computationally efficient and minimally intrusive. To this end, we presented the TDB-CUR algorithm for solving MDEs via low-rank approximation. The algorithm is based on a time-discrete variational principle that leverages sparse sampling to efficiently compute a low-rank matrix approximation at each iteration of the time-stepping scheme. Numerical experiments illustrate that the TDB-CUR algorithm provides a near-optimal low-rank approximation to the solution of MDEs, while significantly reducing the computational cost. Moreover, we showed the method is robust in the presence of small singular values, and significantly outperforms DLRA based on the time-continuous variational principle, the unconventional integrator and PRK4. Although not investigated in the present work, the TDB-CUR algorithm is also highly parallelizable, making it an attractive option for high-performance computing tasks.
While the presented approach is minimally intrusive and can be applied to systems containing general nonlinearities, the goal of future work should be to make this method fully non-intrusive, allowing the FOM to be leveraged as a black box. This will allow the method to be applied to proprietary solvers while reducing the overall implementation efforts, making this powerful methodology more accessible to researchers and practitioners alike.
Data accessibility
Code is available as electronic supplementary material [58].
Declaration of AI use
We have not used AI-assisted technologies in creating this article.
Authors' contributions
M.D.: conceptualization, data curation, formal analysis, investigation, methodology, validation, visualization, writing—original draft, writing—review and editing; G.P.: conceptualization, data curation, formal analysis, investigation, methodology, software, validation, visualization, writing—original draft, writing—review and editing; H.N.: conceptualization, data curation, formal analysis, investigation, methodology, software, validation, visualization, writing—original draft, writing—review and editing; D.C.D.R.F.: conceptualization, formal analysis, investigation, writing—review and editing; H.B.: conceptualization, data curation, formal analysis, funding acquisition, investigation, methodology, project administration, resources, software, supervision, validation, visualization, writing—original draft, writing—review and editing.
All authors gave final approval for publication and agreed to be held accountable for the work performed therein.
Conflict of interest declaration
We declare we have no competing interests.
Funding
This work is supported by the Air Force Office of Scientific Research award no. FA9550-21-1-0247 and funding from Transformational Tools and Technology (TTT), NASA grant no. 80NSSC22M0282. Computational resources are provided by the Center for Research Computing (CRC) at the University of Pittsburgh.
Acknowledgements
The authors thank Dr Gianluca Ceruti for numerous insightful and stimulating discussions that led to a number of improvements.