The art of coarse Stokes: Richardson extrapolation improves the accuracy and efficiency of the method of regularized stokeslets

The method of regularized stokeslets is widely used in microscale biological fluid dynamics due to its ease of implementation, natural treatment of complex moving geometries, and removal of singular functions to integrate. The standard implementation of the method is subject to high computational cost due to the coupling of the linear system size to the numerical resolution required to resolve the rapidly varying regularized stokeslet kernel. Here, we show how Richardson extrapolation with coarse values of the regularization parameter is ideally suited to reduce the quadrature error, hence dramatically reducing the storage and solution costs without loss of accuracy. Numerical experiments on the resistance and mobility problems in Stokes flow support the analysis, confirming several orders of magnitude improvement in accuracy and/or efficiency.

introduction to the subject, see the recent text [1]. A range of mathematical and computational techniques are available to approach this problem; a computational method that has seen significant uptake and development over the last two decades is the method of regularized stokeslets, first described by Cortez [2] and subsequently elaborated for three-dimensional flow [3,4].
This technique can be viewed as a modification of the method of fundamental solutions and/or the boundary integral method for Stokes flow [5], the basis for which is the stokeslet [6] or Oseen tensor [7]: þ (x j À y j )(x k À y k ) jx À yj 3 (1:2) and P k (x, y) ¼ 2ðx k À y k Þ jx À yj 3 : (1: 3) The pair of tensors S jk , P k provide the solutions u ¼ (8pm) À1 (S 1k , S 2k , S 3k ) and p = (8π) −1 P k to the singularly forced Stokes flow equations, À r r r r rp þ mr 2 u u u u u þ d(x À y)ê k ¼ 0 where f e (x) is a family of 'blob' functions approximating d(x) as e → 0.
Several different choices for f e and associated regularized stokeslets S e jk have been studied; the most extensively used was presented in the original three-dimensional formulation of Cortez et al. [3], and S e jk (x, y) ¼ d jk (jxj 2 þ 2e 2 ) þ x j x k (jxj 2 þ e 2 ) 3=2 : (1:10) Developments focusing on the use of alternative blob functions to improve convergence include [8] (nearfield) and, more recently, [9] (far-field). The pressure P e k (x, y) P k (x, y) and velocity S e jk (x, y) S jk (x, y) as e → 0; moreover the corresponding single layer boundary integral equation is jk (x, y)f k (y) dS y þ O(e p ), (1:11) where p = 1 for x on or near B and p = 2 otherwise [3]. In equation (1.11) and below, summation over repeated indices in j = 1, 2, 3 or k = 1, 2, 3 is implied. The reduction to the single-layer potential is discussed by e.g. [3,5,10]; in brief, this equation can describe flow due to motion of a rigid body, or with suitable adjustment to f k , the flow exterior to a body which does not change volume. A feature common to both standard and regularized stokeslet versions of the boundary integral equation is nonuniqueness of the solution f k . This non-uniqueness occurs due to incompressibility of the stokeslet, i.e. provided the interior of B maintains its volume, then ÐÐ B S jk n k dS y ¼ 0 so that if f k is a solution of equation (1.11) then so is f k + an k for any constant a. From the perspective of the original partial differential equation system, the non-uniqueness follows from the fact that the pressure part of the solution to equations (1.1) with velocity-only boundary conditions is determined only up to an royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 8: 210108 additive constant. This issue is not dynamically important, and moreover the discretized approximations to the system described below result in invertible matrices.
Boundary integral methods have the major advantage of removing the need for a volumetric mesh, which both reduces computational cost, and moreover avoids the need for complex meshing and mesh movement. The key strength of the method of regularized stokeslets is in enabling the boundary integral method to be implemented in a particularly simple way: by replacing the integral by a numerical quadrature rule {x x x x x[n], w[n], dS(x x x x x[n])} (abscissae, weight and surface metric), equation (1.11) may be approximated by, As is standard terminology in numerical methods for integral equations, we will refer to this as the Nyström discretization [11]. By allowing m = 1, …, N and j = 1, 2, 3, a dense system of 3N linear ) is formed. The diagonal entries when j = k and m = n are finite but numerically on the order of 1/e, leading to (by the Gershgorin circle theorem) a well-conditioned matrix system. The approach outlined above can be used to solve the resistance problem in Stokes flow, which involves prescribing a rigid body motion and calculating the force distribution, and hence total force and moment on the body. Once the force and moment associated with each of the six rigid body modes (unit velocity translation in the x j direction, unit angular velocity rotation about x j axis, for j = 1, 2, 3) are calculated, the grand resistance matrix A can be formed [5], which by linearity of the Stokes flow equations relates the force F and moment M to the velocity U and angular velocity V for any rigid body motion; (1:13) For example, for a sphere of radius a centred at the origin, the matrix blocks are A FU = 6πμaI, A closely related problem is the two-step calculation of the flow field due to a prescribed boundary motion; starting with prescribed surface velocities u j (x x x x x[m]), first, the discrete force distribution F k [n] is found by inversion of the Nyström matrix system; the velocity field at any point in the fluidx x x x x can then be found through the summation, The mobility problem is formulated by prescribing the total force and moment on the body (yielding six scalar equations) and augmenting the system with unknown velocity U and angular velocity V, which adds six scalar unknowns, so that a (3N + 6) × (3N + 6) system is formed. At a given time, these unknowns can be related to the evolution of the body trajectories (in terms of position x x x x x 0 and two basis vectors b (1) and b (2) ), through a system of nine ordinary differential equations which can be solved using available packages such as MATLAB's ode45. Finally, the swimming problem further prescribes the motion of cilia or flagella with respect to a body frame (typically, a frame in which the cell body is stationary), and often assumes zero total force and moment (neglecting gravity and other forces such as charge), again resulting in a (3N + 6) × (3N + 6) system. The key numerical features and challenges of the method of regularized stokeslets are exhibited by the resistance and mobility problems, which will therefore be our primary focus. (see [12], contained case, equation (2.7)). Reducing the O(e) regularization error by reducing e therefore increases the O(e −1 h 2 ) stokeslet quadrature error, necessitating refinement of the discretization length h. To reduce e by a factor of R requires indicatively reducing h by a factor of ffiffiffi ffi R p , hence increasing the number of surface points and therefore degrees of freedom N by a factor of R. The cost of assembling the dense linear system then increases by a factor of R 2 , and the cost of a direct linear solver by a factor of R 3 . This calculation shows that, for example, improving from a 10% relative error to a 1% relative error may indicatively incur a cost increase of 1000 times. There are several approaches available already to address this issue, which involve a range of computational complexities: the fast multipole method [13], boundary element regularized stokeslet method [14] and the nearest-neighbour discretization [15], for example. In the next section, we will describe and analyse a very simple technique which alone, or potentially in combination with the above, improves the order of the regularization error, thereby enabling a coarser e and hence alleviating the quadrature error. We will then briefly review an alternative 'coarse' approach, the nearest-neighbour method, a benchmark with similar implementational simplicity. Numerical experiments will be shown in the Results ( §5), and we close with brief Discussion ( §6).

Richardson extrapolation in regularization error
Consider the approximation of a physical quantity (e.g. moment on a rotating body) which has exact value M Ã . The value of this quantity calculated with discretization of size h and regularization parameter e is denoted where E r (e) is the regularization error associated with the (undiscretized) integral equation, and E d (h; e) is the discretization error, which as indicated also has an indirect dependence on e via the quadrature.
Recall that and where E f (h) is the error associated with the force discretization and E q (h; e) is the quadrature error. The analysis below will focus on the situation in which the regularization parameter e is not excessively small, so that the quadrature error (h 2 /e) is subleading and hence the discretization error has minimal dependence on e, thus E d (h; e) ≈ E d (h; e 0 ) for some representative value e 0 . Writing we may then expand, Applying the matrix inverse, royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 8: 210108 Hence, the estimate e M(e 1 , e 2 , e 3 ; h) : provides an approximation to M Ã that has error This improvement in order of accuracy comes at a small multiplicative cost associated with solving the problem three times; however, as these are three independent calculations they are ideally placed to exploit parallel computing architecture, thus reducing the additional computational cost.

Comparison with the nearest-neighbour regularized stokeslet method
Before carrying out numerical experiments, we will briefly recap a different strategy to address the e-dependence of the linear system size which we have developed and described recently, in order to provide a benchmark with similar implementational simplicity. The nearest-neighbour version of the regularized stokeslet method [16] aims to remove the e-dependence of the linear system size. This change is achieved by separating the degrees of freedom for traction from the quadrature by using two discretizations: } for the traction and a finer set {X X X X X [1], . . . , X X X X X[Q]} for the quadrature. If these sets are identical, the method reduces to the familiar Nyström discretization. In general, choosing N < Q leverages the fact that the traction is more slowly varying than the near-field of the regularized stokeslet kernel. Discretizing the integral equation (1.11) on the fine set gives Based on the observation that the traction f k (X X X X X[q]) and associated weighting w[q]dS(X X X X X[q]) are slowly varying, the method employs degrees of freedom F k [n] in the neighbourhood of each point of the coarse discretization, so that where ν[q, n] is a sparse matrix defined so that ν[q, n] = 1 if the closest coarse point to X X X X X[q] is x x x x x[n], and ν[q, n] = 0 otherwise. A detail that was not addressed in our recent papers ( [15,17], for example) is that the closest coarse point to a given quadrature point may not be uniquely defined. Moreover, it is occasionally possible that, for sufficiently distorted discretizations, a coarse point may have no quadrature points associated with it at all, resulting in a singular matrix. In the former case, the weighting may be split between two or more coarse points, so that the sum of each row of ν[q, n] is still equal to 1. In the latter case, the coarse point may be removed from the problem, or (better) the quadrature discretization refined.
The approximation (4.2) leads to the linear system, The computational complexity of the system is given by the 3N × 3Q function evaluations required to assemble the stokeslet matrix, followed by the O(N 3 ) solution of the dense linear system (for direct methods). The nearest-neighbour method is subject to similar O(e) regularization error and O(h f ) discretization error (where h f is characteristic of the force point spacing) as the Nyström method. Analysis of the quadrature error associated with collocation [12] identifies two dominant contributions: (i) Contained case: Quadrature centred about a force point which is also contained in the quadrature set is subject to a dominant error term O(e À1 h 2 q ), where h q is the spacing of the quadrature points; the Nyström method described above is a special case of this, with h q = h; royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 8: 210108 (ii) Disjoint case: Quadrature centred about a force point which is not contained in the quadrature set is subject to a dominant error term O(h q /δ) 2 h q ), where δ > 0 is the minimum distance between the force point and quadrature points. This term does not appear in the Nyström method error analysis. The term is written in this form because δ is typically similar in size to h q for a given quadrature set, so with a little care, h q /δ behaves as a multiplicative constant.
For contained force and quadrature discretizations (i), the cost of quadrature is still an important consideration. Reducing e by a factor of R, necessitates reducing h 2 q by a factor of R, and hence increasing the number of quadrature points-and associated matrix assembly cost-by a factor of R. Therefore, any improvement to the order of convergence of the regularization error will result in a corresponding improvement in the reduction of quadrature error.
However, when disjoint force and quadrature discretizations (ii) are employed, the nearest-neighbour method is able to entirely decouple the strong dependence of the degrees of freedom (tied only to h f ) on the regularization parameter e and quadrature discretization h q . The nearest-neighbour method, therefore, provides a relatively efficient and accurate implementation of the regularized stokeslet method that, with minor care in the construction of the discretization sets, can be used as a benchmark. In the following section, we will assess the Richardson extrapolation approach against analytic solutions for two examples of the resistance problem, and against the nearest-neighbour method for an example mobility problem.

Results
We now turn our attention to the application of Richardson extrapolation to a series of model problems, comprising the calculation of: For each problem, we use the minimum distance between any two force points in the discretization as our comparative lengthscale h. For the [NyR] method, results are shown against the smallest value of the regularization parameter (e 1 ) used in the calculation. Simulations are performed with GPU acceleration (see [18]) using a Lenovo Thinkstation with an NVIDIA Quadro RTX 5000 GPU. Each of the test problems that we consider, however, are easily within the capabilities of more modest hardware. royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 8: 210108

The grand resistance matrix of a rigid sphere
The [Ny] method is found to achieve 1% relative error for a select number of parameter pairs (e, h). This is strongly dependent, however, on the 'dip' in error which appears as h is decreased for a given e (evident in figure 1e) and is a consequence of the balance between the opposite-signed regularization and quadrature errors; the small h plateau remains above 1% error for each choice of e. By contrast, the [NyR] method is able to significantly reduce the error in the plateau (figure 1f ), resulting in sub-1% errors for e as large as 0.2. Indeed with e = 0.2, the range of values of h capable of producing acceptably accurate results extends from h = 0.00077 to h = 0.0076. As a result of the reduction in regularization error, brought about by the [NyR] extrapolation, this method is able to achieve a minimum relative error of 0:05% compared with 0:6% for the [Ny] method, and moreover, accurate performance no longer depends on a precise interplay between h and e. In the simulations we performed, the [NyR] method was able to attain very accurate results (0:1% error) in 250 s of walltime.

The grand resistance matrix of a rigid prolate spheroid
To assess the performance on a system involving a modest disparity of length scales, the second model problem is the calculation of the grand resistance matrix for a prolate spheroid of major axis length 5 and minor axis length 1. Moreover, prolate spheroids are often used as models for both entire microscopic swimming cells, and for their propulsive cilia and flagella, and so provide an informative test geometry. The exact solution in the absence of other bodies is well known (e.g. [19]). Details of the discretization of the prolate spheroid are provided in appendix B.1. A sketch of the discretization and plot of sDOF as h is varied are shown in figure 2a,b.
Similarly to the case of the unit sphere, the [Ny] method is able to achieve a minimum error of 0:8% for the smallest e in this study and a specific choice of h within the error dip ( figure 2c,e ). For each choice

The motion of a torus sedimenting under gravity
As a final test case, we simulate the mobility problem of a torus sedimenting under the action of gravity (for detailed set-up and discretization, see appendix B.2). In the absence of an exact solution to this problem, we compare the distance travelled in the vertical direction after the system (equations (B 6)-(B 8)) are solved for t ∈ [0, 98.7]. We compare the results obtained with the [Ny] and [NyR] methods with those from a simulation using the nearest-neighbour method ([NEAREST]) with a refined force discretization, disjoint force and quadrature discretizations and e = 10 −6 . Figure 3a-    The results for the smallest choice of regularization parameter, e = 0.01, are not converged with h, consistent with our analysis in §3 focusing on moderate values of e for which the quadrature error is subleading.

Discussion
This article considered the implementation of the regularized stokeslet method, a widely used approach in biological fluid dynamics for computational solution of the Stokes flow equations. An inherent challenge is the strong dependence of the degrees of freedom on the regularization parameter e, which necessitates an inverse-cubic relationship between the linear solver cost and the regularization parameter.
Here, we have investigated a simple modification of the widely used Nyström method, by employing Richardson extrapolation; performing calculations with three, coarse values of e and extrapolating to significantly reduce the order of the regularization error. The method was compared with the original Nyström approach on three test problems: calculating the grand resistance matrices of the unit sphere and prolate spheroid, and simulating the motion of a torus sedimenting under gravity. Investigation of these model problems has highlighted two significant phenomena, the first of which is well known but is worth repeating: (i) obtaining an acceptable level of error using the Nyström method is strongly dependent on being within the region where the (opposite-signed) regularization and quadrature errors exhibit significant cancellation, a phenomenon which has sensitive dependence on the discretization h as e is varied. (ii) The improvement in the order of regularization error provided by Richardson extrapolation is able to significantly and robustly reduce errors for simulations with royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 8: 210108 (relatively) large choices of e, enabling highly accurate results with relatively modest computational resources. This advantage is (by design) only maintained for these coarse values of e, so that the regularization error is subleading. Another approach which improves the order of convergence of the (important) local regularization error is given by Nguyen & Cortez [8], although the resulting regularized stokeslets may not be exactly divergence-free. As discussed above, there are several existing approaches to improving the efficiency and accuracy of regularized stokeslet methods. The best approach in terms of strict computational complexity is the use of fast methods such as the kernel independent fast multipole method, which enables the approximation of the matrix-vector operation required for iterative solution of the linear problem [13,20], resulting in a O(N log N) method-although with somewhat greater implementational complexity. Another formulation is to borrow from the boundary element method developed for the standard singular stokeslet formulation [14], which has been applied to systems such as embryonic left-right symmetry breaking [21] and bacterial morphology [22]. The boundary element approach decouples the quadrature from the traction discretization and hence degrees of freedom of the system, enabling larger problems to be solved, although again at the expense of greater complexity through the need to construct a true surface mesh, with a mapping between elements and nodes. The nearestneighbour discretization [15] retains much of the simplicity of the Nyström method, while separating the quadrature discretization from the degrees of freedom. Provided that the discretizations do not overlap, we still find this method to be an optimal combination of simplicity and efficiency. The Richardson approach does not avoid the need for the regularization parameter to not exceed the length scales characterizing the physical problem, for example the distance between objects. In this respect, the nearest-neighbour approach is advantageous because of its ability to accommodate smaller values of the regularization parameter.
In this work, we have focused on demonstrating how a numerically simple modification to the, already easy-to-implement, Nyström method can provide excellent improvements by employing coarse values of the regularization parameter e. This approach can be considered complementary to the nearest-neighbour method in its coarse philosophy and style: both methods are figuratively coarse in their simplicity, and literally coarse in their approach of increasing numerical parameters. The Richardson approach allows increases in the regularization parameter; the nearest-neighbour approach allows increase the force discretization spacing h f . Either method enables more accurate results to be achieved with greater robustness and for lower computational cost. Moreover, both have the advantage of being formulated in terms of basic linear algebra operations, and therefore can be further improved through the use of GPU parallelization with minimal modifications [18]. The choice of which method to use is a matter of preference; the Richardson approach has the advantage of being immediately adoptable by any group with a working Nyström code, alongside the repeated calculations being embarrassingly parallel; the nearest-neighbour approach has the advantage of completely removing the dependence of the system size on e.
Accessible algorithmic improvements such as these provide the improved ability to solve a plethora of problems in very low Reynolds number hydrodynamics. Various potential application areas include microswimmers such as sperm [23,24], algae and bioconvection [25][26][27][28][29], mechanisms of flagellar mechanics [30,31], squirmers [32,33] and bio-inspired swimmers [34][35][36]. Stokeslet-based methods have been employed since the work of Gray and Hancock [6] in the 1950s; they continue to provide ease of implementation, efficiency and most importantly physical insight into biological systems. The location of points on the prolate spheroid, aligned with the x-axis, can be expressed in terms of the prolate spheroidal coordinates, as x ¼ a cosh m cos n, ( B 1 ) y ¼ a sinh m sin n cos f (B 2) and z ¼ a sinh m sin n sin f, where a and c are the major-and minor-axes lengths, respectively. We first discretize ν into n uniformly spaced points, providing a discretization in x which is slightly more dense in regions of higher curvature. For each choice of ν i (i ∈ [1, n]), we discretize ϕ into m i linearly spaced points, where the choice  royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 8: 210108 ensures that each ring is approximately evenly discretized with spacing h. Here, dÁe represents the ceiling function.

B.2. A torus sedimenting under gravity
The equations of motion for a torus sedimenting under gravity are given by and ðð @D e ikj X k f j (X) dS X ¼ 0, (B 8) where repeated indices are summed over i ∈ [1,2,3], U and V are the translational and rotational velocities of the torus, ∂D defines the surface of the torus, the central-and tube-radii of the torus are given by R and r respectively, and e ijk is the Levi-Civita symbol. The term on the right-hand side of equation (B 7) derives from the (dimensionless) effect of gravity. The motion of the torus can be expressed as a system of nine ordinary differential equations for the time derivatives of the torus position x 0 and basis vectors b b b b b (1) and (2) ). More details of how this 'mobility problem' is solved can be found in [15]. While this problem could be further constrained by enforcing that the angular velocity is zero (due to the symmetry of the torus), we focus on solving for the full rigid body motion. The mobility problem is solved using the [Ny], [NyR] and [NEAREST] methods, with results given in §5.3. Points on the torus surface can be written as x ¼ (R þ r cos u) cos f, ( B 9 ) y ¼ (R þ r cos u) sin f (B 10) and z ¼ r sin u, (B 11) for u, f [ [0, 2p). We discretize θ into n ¼ d2pr=he linearly spaced points, ensuring points on each ring are approximately evenly spaced with lengthscale h. For each θ i (i ∈ [1, n]), we discretize ϕ into m i linearly spaced points via resulting in an approximately evenly spaced discretization for the torus with lengthscale h. For simulations with the [NEAREST] method, a fine quadrature discretization is created following the same process with lengthscale h q = h/4. To ensure disjoint force and quadrature discretizations in this case, a filtering step is performed to remove any quadrature points which lie within a distance h q /10 from their nearest force point.