Sparse nonlinear models of chaotic electroconvection

Convection is a fundamental fluid transport phenomenon, where the large-scale motion of a fluid is driven, for example, by a thermal gradient or an electric potential. Modelling convection has given rise to the development of chaos theory and the reduced-order modelling of multiphysics systems; however, these models have been limited to relatively simple thermal convection phenomena. In this work, we develop a reduced-order model for chaotic electroconvection at high electric Rayleigh number. The chaos in this system is related to the standard Lorenz model obtained from Rayleigh–Benard convection, although our system is driven by a more complex three-way coupling between the fluid, the charge density, and the electric field. Coherent structures are extracted from temporally and spatially resolved charge density fields via proper orthogonal decomposition (POD). A nonlinear model is then developed for the chaotic time evolution of these coherent structures using the sparse identification of nonlinear dynamics (SINDy) algorithm, constrained to preserve the symmetries observed in the original system. The resulting model exhibits the dominant chaotic dynamics of the original high-dimensional system, capturing the essential nonlinear interactions with a simple reduced-order model.


Introduction
In convection, a body force acting on the fluid can lead to the formation of coherent flow structures, with examples including thermal Rayleigh-Bénard convection (RBC) [1][2][3][4], surface tension-driven Marangoni effects [5][6][7], solar magnetoconvection [8,9], magnetohydrodynamic convection [10,11], and electroconvection (EC). Quantitative analysis of thermal convection phenomena was first offered by Lord Rayleigh in the early twentieth century [2]. The initially structured convective patterns were subjected to symmetry breaking perturbations and developed into chaotic motion when a critical parameter was reached. Lorenz, Fetter, and Hamilton 1 [12] later discovered a simple reduced-order model for RBC, explaining that the transition to chaotic flow is driven by a thermal gradient, i.e. a two-way coupling between the body force and the fluid motion. This landmark result changed how researchers view the role of numerical simulations and reduced-order modelling in science, and it established the modern field of chaos theory. Similar approaches have been applied to rotating convection or Ferrofluid convection resulting in a three-mode second-order nonlinear model known as the Glukhovsky-Dolzhansky system [13][14][15][16]. However, this type of simplified dynamical system model has not yet been demonstrated for convective systems with more complex coupling mechanisms. Electroconvection is a suitable system to extend these analyses, as it involves a nontrivial three-way, multiphysics coupling between the fluid, the electric field, and the charge density. In this work, we demonstrate a modern data-driven approach to identify sparse nonlinear reduced-order models for electroconvection. Our approach, in a way, automates the methods employed by Lorenz, leveraging recent techniques in sparse regression and optimization (see table 1).
The earliest recorded electrohydrodynamic (EHD) experiment dates back to the seventeenth century [17], and the phenomenon of electroconvection in EHD was first reported by G. I. Taylor, while describing the cellular convection in a liquid droplet [18]. Melcher later extended EHD theory to include EC, in what is now known as the Taylor-Melcher (TM) model. Since then, EC has been observed in other systems that exhibit coupling between electrostatic force and fluid motion. The term 'electroconvection' has been used in several different contexts, including electro-kinetic instability (EKI) [19][20][21] and electro-hydrodynamic (EHD) convection [22][23][24]. EC patterns have also been observed in liquid crystals when induced by an alternating current [23,25]. In charge-neutral electrokinetic (EK) systems, electroconvection is triggered by the electro-osmotic slip of the electrolyte in the electric double layer at membrane surfaces [19,20,[26][27][28][29][30][31][32][33][34][35][36]. In non-equilibrium EHD systems royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 8: 202367 [18,[37][38][39][40][41][42][43][44], poorly conductive and leaky dielectric fluid acquires unipolar charge injection at the interface in response to the electric field. Chaotic flow in EHD convection was first observed and analysed by Atten, Lacroix, and Malraison [38,45]. They characterized two types of behaviours of the power spectra of the intensity fluctuations, identifying two scenarios: dominating viscous force or inertia. They later reported that chaotic charge mixing occurs even at low Reynolds number (Re), which in turn can be interpreted as the origin of EC chaos, leading to the onset of hydrodynamic turbulence [46]. The transition to chaos occurs in a variety of convective flow systems at high Rayleigh number (Ra), which is the ratio of body forces to viscous forces. Similar to chaotic RBC, chaotic EC is parameterized by a high electric Rayleigh number, also known as the Taylor number (T).
The stability of EC systems is now routinely studied with high-fidelity numerical simulations [39,41,42,44,[47][48][49][50][51][52], resulting in large volumes of spatiotemporal data. Despite the high-dimensional nature of these data, many phenomena in electroconvection and other systems may be characterized by the evolution of a few dominant coherent structures [53][54][55] or modes. High-dimensional simulations often obscure this simplicity, making optimization and control tasks prohibitively expensive [56]. Thus, there is a need for accurate and efficient reduced-order models that describe EC phenomena, similar to the Lorenz model for RBC [12]. The Lorenz model was obtained by Galerkin projection of the governing equations onto a dramatically reduced set of three modes, given by the first three Fourier modes proposed by Saltzman [57]. Since Lorenz, this approach has been applied extensively to RBC [58][59][60][61][62][63][64][65][66]. Instead of extracting these modes manually, modes are now typically extracted automatically, for example via proper orthogonal decomposition (POD) [53,[67][68][69][70]. 2 Galerkin projection of the governing partial differential equation onto POD modes is a more modern approach to obtain a reduced set of ordinary differential equations [53,[76][77][78]. Recently, it was shown that closely related nonlinear reduced-order models could be obtained for fluids purely from measurement data, and without recourse to the governing equations, by applying the sparse identification of nonlinear dynamics (SINDy) algorithm [79][80][81][82] to time-series data of POD mode amplitudes. Loiseau et al. [80,81] showed that it is also possible to incorporate partially known physics, such as energy conservation, as a constraint in the SINDy regression.
In this work, we demonstrate a data-driven framework to obtain reduced-order models of EC, shown in figure 1. In particular, we examine a non-equilibrium system with unipolar charge injection and an  Figure 1. Summary of nonlinear reduced-order modelling framework for electroconvection. Electroconvection data are taken from the high-fidelity TRT LBM simulation. POD is performed on the charge density field data to extract the dominant spatial coherent structures (modes) and time series (coefficients) for how the mode amplitudes evolve in time. A strong symmetry is observed in the POD coefficient time series. We enforce symmetries in the first three coefficients as constraints on the sparse identification of nonlinear dynamics (SINDy) algorithm. SINDy discovers a sparse nonlinear model to describe the evolution of the mode coefficients, and this model may be used to reconstruct the full charge density field as a linear combination of the corresponding POD modes.
external electric field. We first extract dominant coherent structures from DNS of the three-way coupled system, using POD. Strong symmetries are observed in the time evolution of the POD modes, and these symmetries are used as constraints to identify a sparse nonlinear model with SINDy. Surprisingly, we find that the resulting model captures the dominant dynamics using data from the charge density field alone.

Governing equations, dimensional analysis and simulations
The governing equations for EHD driven flow with unipolar charge injection include the Navier-Stokes equations (NSE) with the electric forcing term F e ¼ Àr c rf in the momentum equation, the charge transport equation and the Poisson equation for electric potential: and where the asterisk denotes dimensional variables: the two-dimensional velocity field u Ã ¼ ðu Ã x , u Ã y Þ, the fluid density ρ Ã , the static pressure P Ã , the charge density r Ã c , and the electric potential ϕ Ã . The parameters are given by the dynamic viscosity μ, the ion mobility μ b , the ion diffusivity D c , and the electric permittivity ε. The electric force acts as a source term in the momentum equation (2.1b) [83,84]. The system can be non-dimensionalized [50] by introducing four dimensionless parameters [43,44,49]: where H is the distance between the electrodes (two plates infinite in x), ρ 0 is the injected charge density at the anode, and ϕ 0 is the voltage difference between the electrodes. In turn, the time t Ã can be nondimensionalized by H 2 /(μ b ϕ 0 ), u Ã -by the ion drift velocity u drift = μ b ϕ 0 /H, P Ã -by ρ 0 (μ b ϕ 0 ) 2 /H 2 , ϕ Ã -by ϕ 0 , and the charge density r Ã c -by ρ 0 . The physical interpretation of these parameters is as follows: M is the ratio between hydrodynamic mobility and the ionic mobility, T is the ratio between electric force to the viscous force, C is the charge injection level and Fe is the reciprocal of the charge diffusivity coefficient [43,44].
The nondimensionalized form of the governing equations (2.1) is and where variables without asterisks are dimensionless.

Direct numerical simulations
Several numerical approaches have been developed to study EHD, including finite-difference [39], particle-in-cell [47], and finite-volume methods with the total variation diminishing scheme [41,42].
More recently, the lattice Boltzmann method (LBM) was used to predict the linear and finiteamplitude stability criteria of the subcritical bifurcation in the EC flow [44,48]. A segregated solver was proposed that combines a two-relaxation time (TRT) LBM modelling of the fluid and charge transport and a Fast Fourier Transform (FFT) Poisson solver for the electrical field [49,50]. Due to its computational efficiency, direct numerical simulations in this study are performed using the TRT LBM approach to solve the transport equations for fluid flow and charge density, coupled to a fast Poisson solver for the electric potential [49,50,85]. The numerical method is implemented in C++ using CUDA GPU computing. A spatial resolution of 122 × 100 is used, balancing accuracy with efficiency; the number of threads in the x-direction in each GPU block is equal to the grid resolution royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 8: 202367 in x, and the number of GPU blocks in the y-direction is equal to the grid resolution in y. FFT and IFFT operations are performed using the cuFFT library [86]. All variables are computed with double precision to reduce the truncation error. The numerical method was shown to be second-order accurate in both time and space.
The non-dimensional parameters used in this study are C = 10, M = 10, T = 312.5, and Fe = 4000. C = 10 corresponds to a strong charge injection; M = 10 and Fe = 4000 correspond to the typical values of mobility and charge diffusivity of a dielectric liquid [43,87]. T governs the viscosity of the flow, where T = 312.5 is near the onset of chaos. The macroscopic and mesoscopic boundary conditions are specified in table 2. The no-slip boundary conditions are applied at both electrodes for fluid flow. A constant charge density at the anode (lower plate) represents a unipolar injection; a zerodiffusive flux condition rr c ¼ 0 at the cathode (upper plate) represents an outflowing current. A constant electric potential is applied at the anode; the cathode is grounded (ϕ = 0). At mesoscale (LBM scale), the discrete distribution function of velocity f i (x, t) and charge density g i (x, t) are used. The details on the transformations between macro-variables (u, ρ c ) and meso-variables (f i , g i ) can be found in recent publications [49,50]. The LBM full-way bounce-back (FBB) scheme is used for the Dirichlet (no-slip) boundary conditions for the fluid flow and charge density at the lower plate [44,50]. The g i Neumann boundary condition is set as a current outlet boundary condition for charge density transport [44,49]. The abbreviations are listed in table 1. Figure 2a shows an illustration of the charge density from the electroconvection simulation, and figure 2b shows the unsteady nature of the ionic convection leading to electric field variation and consequently the unsteady flow patterns. This is apparent by examining the momentum equation, equation (2.3b), as the variance in the source term drives the instability in the flow if the viscosity term cannot dampen the flow fluctuations. For the chosen parameters C = 10, M = 10, T = 312.5, and Fe = 4000, the flow is driven by a strong charge injection with a high electric force to viscous force ratio (T). The irregular profiles of charge density represent states with an imbalance of electric force and viscous force.

Coherent structures and symmetries
We now extract dominant coherent structures and symmetries from the time-resolved DNS data from above. In particular, we use POD to extract an orthogonal set of spatial modes from the charge density field, along with time series for how these mode amplitudes evolve in time; these modes and amplitudes are shown in figure 2c,d. A spatially and temporally resolved data set is used with a time step of Δt = 0.005 from t = 0 to t = 1000. The zeroth POD mode (PC0) represents the charge density mean-field and does not vary with time. The first POD mode (PC1) is strongly periodic in both space and time, with periodic structures corresponding to the two up-drifting ion channels between charge void regions, as shown in figure 2a. During the simulation, the ion channels oscillate and can merge and separate, as shown in figure 2b. The second and third POD modes (PC2,3) are a phase-shifted mode pair that contribute to the oscillation of the ion channels. Higher modes are not obviously paired, although they do correspond to spatial and temporal harmonics. POD was also computed on the other spatial fields, and structures were qualitatively similar. Figure 3 shows the trajectories of the first three POD coefficients (a 1 , a 2 and a 3 ) and their projections onto two-dimensional planes. These trajectories exhibit several strong symmetries, so that the system may be viewed as invariant with respect to the following transformations: ½a 1 , a 2 , a 3 $ ½a 1 , À a 2 , a 3 $ ½a 1 , a 2 , À a 3 $ ½a 1 , À a 2 , À a 3 $ ½Àa 1 , a 3 , a 2 $ ½Àa 1 , À a 3 , À a 2 : ð3:1Þ These symmetries in the original system will be enforced in our sparse nonlinear models.
Note that POD and Galerkin projection have been widely applied to RBC, although analyses are limited for EC. Sirovich et al. applied POD to RBC to investigate the scaling of POD modes to

Sparse nonlinear reduced-order models
We now demonstrate the identification of a parsimonious nonlinear model for EC using the sparse identification of nonlinear dynamics (SINDy) [79] approach; results are summarized in figure 4. SINDy has been widely applied for model identification in a variety of applications, including chemical reaction dynamics [88], nonlinear optics [89], fluid dynamics [80][81][82]90,91] and turbulence modelling [92,93], plasma convection [94], numerical algorithms [95], and structural modelling [96], among others [97][98][99]. Of particular note are its uses in identifying Lorenz-like dynamics from a thermosyphon simulation by Loiseau [82] and to identify a model for a nonlinear magnetohydrodynamic plasma system by Kaptanoglu et al. [100]. It has also been extended to handle more complex modelling scenarios such as partial differential equations [101,102], systems with inputs or control [103], to enforce physical constraints [80], to identify models from corrupt or limited data [104,105] and ensembles of initial conditions [106], and extending the formulation to include integral terms [107,108], tensor representations [109,110], deep autoencoders [111], and stochastic forcing [112,113]; an open-source software package, PySINDy, has been developed to integrate a number of these innovations [114]. We will enforce the symmetries observed above as constraints, as in Loiseau and Brunton [80]. We apply SINDy to develop a sparse reduced-order model of electroconvection, in particular the evolution of the POD coefficients a 1 (t), a 2 (t) and a 3 (t). First, we construct a data matrix X whose columns are time-series of a 1 , a 2 , and a 3 , along with the corresponding matrix of time derivative, _ X:  royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 8: 202367 imagination and may be guided by terms present in the overarching partial differential equation. For fluid systems, polynomials have been quite effective: where the matrix X d denotes a matrix with column vectors given by all possible time series of dth degree polynomials in the state x = [a 1 a 2 a 3 ]. SINDy develops reduced-order models by selecting the fewest columns of QðXÞ that add up in a linear combination to describe the derivatives in _ X: where J is the sparse coefficient matrix that denotes which columns of Q are active, and hence, which terms are active in the dynamics. A parsimonious model will provide an accurate fit with as few terms as possible in J. Such a model can be developed using a convex regression: where J k denotes the kth column of J. There are several algorithms to obtain a sparse model J, and we use the sequential-thresholded least squares (STLS) algorithm [79]. Loiseau and Brunton [80] showed that it is also possible to incorporate known linear constraint equations in this regression: where j ¼ Jð : Þ is the vectorized form of the sparse matrix of coefficients, and Cξ = d are linear equality constraints, which can be used to enforce symmetries and other known constraints, such as energy conservation. We perform SINDy on the POD coefficients a i for i = 1, 2, 3 with the symmetries observed from the data shown in figure 3.

Enforcing symmetries in SINDy
As shown in figure 3, the trajectories remain nearly unchanged under the symmetry transformations in equation (3.1). These symmetries imply various constraints on the coefficients in J, resulting in several terms that are zero and several other terms that must be related across the dynamics; these constraints are summarized in table 3.
The symmetry constraints are determined as follows. First, we rewrite the unknown system as follows _ a 1 ¼ Qða 1 , a 2 , a 3 ÞJ 1 , ð4:6aÞ _ a 2 ¼ Qða 1 , a 2 , a 3 ÞJ 2 , ð4:6bÞ For the first row, equation (4.6a), the equation is invariant to switching the sign of a 2 and/or a 3 . Thus, the coefficients of the terms with odd powers of either a 2 or a 3 must vanish (ξ ij corresponding to a l 1 a m 2 a n 3 where at least one of m and n is odd). From the observed symmetry [a 1 , a 2 , a 3 ] ↔ [ − a 1 , a 3 , a 2 ], the coefficients ξ ij corresponding to a l 1 a m 2 a n 3 should be set to zero where l is even and non-constrained where l is odd. The rest of the ξ ij correspond to a l 1 a m 2 a n 3 where both m and n are even numbers, namely a 2 a 2 , a 3 a 3 , a 1 a 2 a 2 , a 1 a 3 a 3 . When a 1 is replaced by −a 1 , a 2 and a 3 must be swapped for the equation to remain unchanged, so the coefficients ξ ij for a 2 a 2 and a 3 a 3 should be equal and opposite and those for a 1 a 2 a 2 and a 1 a 3 a 3 should be identical.
The coefficients in the second and third rows, equations (4.6b) and (4.6c), should be considered together. From the symmetry transformations a 2 ↔ −a 2 and a 3 ↔ −a 3 , the coefficients ξ ij corresponding to a l 1 a m 2 a n 3 in the equation of _ a 2 and those corresponding to a l 1 a n 2 a m 3 in the equation of _ a 3 vanish when m is even or n is odd. The rest of the coefficients ξ ij in equation (4.6b) correspond to a l 1 a m 2 a n 3 where m is odd and n is even, namely a 2 , a 1 a 2 , a 1 a 1 a 2 , a 2 a 2 a 2 , a 2 a 3 a 3 . Similarly, the rest of the coefficients ξ ij in equation (4.6c) correspond to a l 1 a n 2 a m 3 where n is even and m is odd, namely a 3 , a 1 a 3 , a 1 a 1 a 3 , a 2 a 2 a 3 ,  a 3 a 3 a 3 . From the observed symmetry [a 1 , a 2 , a 3 ] ↔ [ − a 1 , a 3 , a 2 ], the coefficients ξij corresponding to a l 1 a m 2 a n 3 in equation (4.6b) and those corresponding to a l 1 a n 2 a m 3 in equation (4.6c) should be identical where l is even, and equal and opposite where l is odd.
royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 8: 202367 Table 3. Structure of the J matrix. We explicitly enforce these constraints in the SINDy regression. Each row of table 3 represents each row of the equation (4.3). The blank entries represent zeros, and each ξ i represents one value, i.e. the entries with the same ξ i should have the same values. To enforce the energy constraints as described by Loiseau [80], we set the coefficients with asterisk equal and opposite in table 3.

Reconstruction of the charge density field
The results from the SINDy model are depicted in figure 4. The reduced-order model accurately captures the qualitative behaviour of the system, characterized by four paths connecting two fixed points, denoted A and B in the figure. The system pseudo-randomly chooses each of the two paths starting at a given fixed point, resulting in the four orbits shown in figure 4c; this behaviour closely matches the behaviour of the full high-dimensional system. Further, the SINDy model may be used to recombine POD modes to approximate the high-dimensional dynamics around these fixed points and connecting paths. The SINDy reconstruction faithfully captures the truncated POD approximation of the true system; thus, a better model cannot be obtained without including more POD modes in the expansion.

Conclusion and discussion
In this work, we developed a data-driven reduced-order model for chaotic electroconvection. Our approach is reminiscent of that employed by Lorenz to model Rayleigh-Bénard convection, although dimensionality reduction and nonlinear model identification procedures are automated and do not require recourse to the governing equations. In particular, we investigate spatiotemporal data from direct numerical simulations of the three-way coupling between the fluid, charge density and electric field that give rise to chaotic electroconvection. We extract coherent structures from the charge density field using POD, and we develop a nonlinear model for how the POD mode amplitudes evolve in time using the SINDy modelling framework. Critically, we identify several symmetries in the timeseries data, which we later enforce as constraints in the SINDy algorithm, dramatically reducing the search space of candidate models. The resulting model faithfully captures the dominant behaviour of the leading POD modes, including chaotic switching in the time series that corresponds to chaotic convection in the full field. Because the nonlinear model is sparse and is formulated in terms of a few key spatial modes, it is interpretable by construction and may be less prone to overfitting compared to deep learning. Thus, we demonstrate the ability of automated model discovery techniques to uncover interpretable dynamical systems models for convective phenomena with more complex three-way coupling than the original Lorenz model of RBC.
There are a number of natural extensions and future directions suggested by this work. Similar to the Hopf-Rayleigh number of the Lorenz system [116], the electric Hopf-Rayleigh number determining the onset of electroconvection chaos can be identified by a linear stability analysis of the nonlinear system investigated here. Additionally, three-dimensional flows and experimental measurements would royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 8: 202367 provide challenging tests to the methodology and may point out limitations that require further development. Although the present work only considered data from the charge density, incorporating information from the three fluid, electric, and charge density fields is the subject of ongoing research. This will require developing a dimensionally consistent POD, as has been done for compressible flows [117] and magnetohydrodynamics [100]. This modelling framework is also amenable to including the effect of varying parameters, which could result in a parameterized nonlinear model that would enable more sophisticated design optimization and the prediction of bifurcation events; however, incorporating a parameter into the model increases the size of the library of candidate functions and requires more training data. Increasing the number of POD modes in the model will also likely increase the fidelity of the prediction, for example enabling the modelling of subtle secondary effects in the data. However, when additional POD modes are added, it will be necessary to automatically extract symmetries and determine what constraints these symmetries impose on the candidate library. This challenge fits into the broader literature of symmetry reduction [118]. Finally, using these models for design, optimization, and control will be a true test of the model.