Short- and long-term predictions of chaotic flows and extreme events: a physics-constrained reservoir computing approach

We propose a physics-constrained machine learning method—based on reservoir computing—to time-accurately predict extreme events and long-term velocity statistics in a model of chaotic flow. The method leverages the strengths of two different approaches: empirical modelling based on reservoir computing, which learns the chaotic dynamics from data only, and physical modelling based on conservation laws. This enables the reservoir computing framework to output physical predictions when training data are unavailable. We show that the combination of the two approaches is able to accurately reproduce the velocity statistics, and to predict the occurrence and amplitude of extreme events in a model of self-sustaining process in turbulence. In this flow, the extreme events are abrupt transitions from turbulent to quasi-laminar states, which are deterministic phenomena that cannot be traditionally predicted because of chaos. Furthermore, the physics-constrained machine learning method is shown to be robust with respect to noise. This work opens up new possibilities for synergistically enhancing data-driven methods with physical knowledge for the time-accurate prediction of chaotic flows.


Previous studies demonstrated the strong computational capability of such physical machine learning approaches. This approach worked particularly well when the physical system used as a reservoir had similarities with the system to be studied, for example, in flows [30].
On the other hand, another approach consists of including physical knowledge within traditional machine learning architectures [20,24,25,31,32], or modifying the target that the machine learning architecture has to learn. This automatically ensures physical conservation, such as by learning the Lagrangian of the system [33]. In [25,32], a physics-informed neural network was developed to approximate the solution of PDEs [25], or infer unmeasured variables from measurable quantities in flows [32]. This approach relied on a feedforward neural network combined with a physics-based loss to estimate the quantities to infer that are not present in the training data. This enhancement of the loss function with physical knowledge was also explored in [31], where the optimization process to identify the dynamics of a reduced order flow model was constrained with energy-preserving constraints in the Sparse Identification of Nonlinear Dynamics (SINDy) framework. The models obtained with those energy-preserving constraints compared favourably with reduced order models obtained with traditional Galerkin projection methods.
Further combinations of machine learning tools with physics-based methods were explored in the combination of computational fluid dynamics with machine learning [34,35]. For example, in [34], neural networks were used to discover improved spatial discretizations of PDEs on a coarse grid, which allowed for the accuracy to be equivalent to the accuracy of a finer grid, thereby enabling the accurate resolution of PDEs on a coarse mesh. Following similar ideas, in [35], convolutional neural networks were used within a direct numerical simulation solver to learn the correction required for a coarse simulation to reach the accuracy of a finer simulation allowing for a speed-up of the calculation while improving the accuracy drastically.
The discussion above highlights that additional research is necessary to enable purely data-driven machine learning approaches to time-accurately predict the evolution of chaotic systems during extreme events, which requires the integration of some physical knowledge. In this paper, we propose a method based on a physics-based modification of the loss function used during the training of the machine learning model.
Specifically, the objective of this paper is to propose a machine learning method that (i) produces physical solutions to time-accurately predict extreme events in a qualitative model of shear turbulence and (ii) reproduces the long-term statistics. To achieve this objective, we will start from a reservoir computing framework based on the ESN and show that, by embedding physical knowledge during the training, the proposed physics-informed machine learning framework can achieve short- and long-term time-accurate predictions. This paper studies for the first time the application of such a hybrid physics-informed reservoir computing framework to the prediction of extreme events, where the inclusion of physics-based knowledge can produce a marked improvement compared to purely data-driven approaches, given that extreme events are generally rare within any dataset. The flow configuration is presented in §2. The physics-informed echo state network (PI-ESN) is developed in §3. Results are shown in §4. The long-term statistical behaviour of the network is discussed in §4a. Short-term predictions of extreme events are analysed in §4b. The robustness of the overall architecture with respect to noise is presented in §4c. A final discussion with future directions concludes the paper.

Extreme events in a model of chaotic flow
We regard the chaotic flow as an autonomous dynamical system, ẏ = N(y), where ˙( ) is the temporal derivative and N is a deterministic nonlinear differential operator, which encapsulates the numerical discretization of the spatial derivatives and boundary conditions (if any). The flow is governed by momentum and mass conservation laws, i.e. the Navier-Stokes equations, which were reduced in form by Moehlis, Faisst and Eckhardt (MFE) [36]. This model, which was inspired by earlier works [37,38], provides the nonlinear operator N and is called the MFE model for brevity. The MFE model captures the qualitative features of the transition from turbulence to quasi-laminar states, such as the exponential distribution of turbulent lifetimes. The velocity field in the model is decomposed as u(x, t) = Σ_{i=1}^{9} a_i(t) v_i(x), where v_i(x) are spatial Fourier modes (or combinations of them) [36]. The Navier-Stokes equations are projected onto v_i(x) to yield nine ordinary differential equations for the modes' amplitudes, a_i, which are nonlinearly coupled. Consequently, the state vector is y = {a_i}_{i=1}^{9}. All the variables are non-dimensional [36]. Physically, v_1 is the laminar profile mode; v_2 is the streak mode; v_3 is the downstream vortex mode; v_4 and v_5 are the spanwise flow modes; v_6 and v_7 are the normal vortex modes; v_8 is the three-dimensional mode; and v_9 is the modification of the mean profile caused by turbulence. The flow has a fixed point a_1 = 1, a_2 = · · · = a_9 = 0, which is a laminar state. The electronic supplementary material reports the expressions for v_i and the equations for a_i [36]. A detailed analysis of the MFE model was performed by Kim & Moehlis [39] and Joglekar et al. [40].
The flow under investigation is incompressible. The domain is a cuboid of size L_x × L_y × L_z between two infinite parallel walls at y = 0 and y = L_y, and it is periodic in the x- and z-directions. The domain size is L_x = 1.75π, L_y = 2 and L_z = 1.2π. The Reynolds number is 600. A sinusoidal volume force is applied in the y-direction. The initial condition (electronic supplementary material) is such that the turbulent flow displays chaotic bursts between the fully turbulent and quasi-laminar states. These are the extreme turbulent events we wish to predict. The governing equations are integrated in time with a 4th-order Runge-Kutta scheme with a time step Δt = 0.25 to provide the evolution of the nine modes, a_i, from t = 0 to t = 80 000. The evolution of the kinetic energy, k = 0.5 Σ_{i=1}^{9} a_i², is shown in figure 1 for the first 30 000 time steps. The time is normalized by the largest Lyapunov exponent, λ_max ≈ 0.0244, which is calculated as the average logarithmic error growth rate between two nearby trajectories [7]. The Lyapunov time scale, λ_max^{−1}, provides an estimate of the predictability time, which is used to define the non-dimensional time t⁺ = λ_max t. The kinetic energy, k, has sudden large bursts that arise from a chaotic oscillation with a small amplitude. Each burst is a quasi-relaminarization event, which occurs in three phases (figure 1): (i) the originally laminar velocity profile becomes unstable and breaks down into vortices due to the shear imposed by the volume force (panels 5-7); (ii) the vortices align to become streaks (panels 8-9 and 1-2); and (iii) the streaks break down, leading to flow relaminarization (panels 3-5).
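The time marching described above can be sketched as follows. This is not the authors' solver: `toy_rhs` is a hypothetical stand-in for the MFE right-hand side (whose coefficients are given in the electronic supplementary material); only the RK4 stepper, the kinetic energy and the Lyapunov-normalized time follow the text.

```python
import numpy as np

def rk4_step(f, y, dt):
    """One 4th-order Runge-Kutta step for dy/dt = f(y)."""
    k1 = f(y)
    k2 = f(y + 0.5 * dt * k1)
    k3 = f(y + 0.5 * dt * k2)
    k4 = f(y + dt * k3)
    return y + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

def toy_rhs(y):
    """Placeholder nonlinear operator N(y), NOT the MFE equations:
    a damped system with a quadratic coupling, used only so this
    sketch runs end to end."""
    return -0.1 * y + 0.05 * np.roll(y, 1) * np.roll(y, -1)

dt = 0.25          # time step used in the paper
lam_max = 0.0244   # largest Lyapunov exponent (paper's estimate)

y = np.full(9, 0.1)  # nine modal amplitudes a_i (illustrative IC)
y[0] = 1.0           # a_1 near the laminar profile mode

traj = [y]
for _ in range(400):
    y = rk4_step(toy_rhs, y, dt)
    traj.append(y)
traj = np.array(traj)

k = 0.5 * np.sum(traj**2, axis=1)              # k = 0.5 * sum_i a_i^2
t_plus = lam_max * dt * np.arange(len(traj))   # t+ = lambda_max * t
```

In the paper, the same integration is run up to t = 80 000 with the actual MFE operator in place of `toy_rhs`.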

Physics-constrained reservoir computing
To learn the reduced-order dynamics of shear turbulence, we embed the physical knowledge of the governing equations into a data-driven reservoir computing method based on the ESN [16,17]: the PI-ESN [20]. A schematic of the network is shown in figure 2.
We have training data with an input time series u(n) ∈ R^{N_u} and a target time series y(n) ∈ R^{N_y}, where n = 0, 1, 2, . . . , N_t are the discrete time instants that span from 0 to T = N_t Δt. During prediction, the output at time n becomes the input at time n + 1, i.e. u(n + 1) = ŷ(n). The training of the PI-ESN is achieved by (i) minimizing the error between the prediction, ŷ(n), and the target data, y(n), when the PI-ESN is excited with the input, u(n), and (ii) minimizing the physical error (also known as the residual), which is zero for an exact solution, i.e. F(y) ≡ dy/dt − N(y) = 0. To estimate the physical error beyond the training data, after exciting the PI-ESN with the training data, the PI-ESN is looped back to its input (figure 2b) to obtain the predictions ŷ(N_t + p), p = 1, . . . , N_p, at the collocation points. The number of collocation points, N_p, is user-defined. The physical error F(ŷ(N_t + p)) is evaluated to train the PI-ESN such that the sum of (i) the data error between the prediction and the available data from t = 0 to t = T, E_d, and (ii) the physical error for t > T, E_p, is minimized. Mathematically, we wish to find W_out that minimizes

E_tot = E_d + E_p, with E_d = (1/N_t) Σ_{n=1}^{N_t} ||ŷ(n) − y(n)||² and E_p = (1/N_p) Σ_{p=1}^{N_p} ||F(ŷ(N_t + p))||².   (3.1)

Practically, the estimation of the time derivative in F(ŷ) is performed with finite differences and, for a specific time step k, the physical error is estimated as F(ŷ(k)) ≈ (ŷ(k + 1) − ŷ(k))/Δt − N(ŷ(k)). Therefore, this loss function penalizes solutions that do not fulfil the governing equations (to a numerical tolerance), similarly to physics-informed feedforward neural networks used to approximate the solution of PDEs [25] or infer unmeasured quantities [32]. Here, we apply this physics-based loss to an echo state network to obtain a time-accurate surrogate model capable of autonomously reproducing the dynamics of the system.
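As a rough sketch (not the authors' implementation), the loss E_tot = E_d + E_p with the finite-difference residual can be written as below; the exact normalizations (averages versus sums) are assumptions, and `rhs` stands for the nonlinear operator N.

```python
import numpy as np

def physical_residual(y_hat, rhs, dt):
    """Finite-difference estimate of F(y) = dy/dt - N(y) at
    consecutive predictions: (y[k+1] - y[k]) / dt - N(y[k])."""
    return (y_hat[1:] - y_hat[:-1]) / dt - np.apply_along_axis(rhs, 1, y_hat[:-1])

def pi_esn_loss(y_hat_train, y_train, y_hat_colloc, rhs, dt):
    """Total loss E_tot = E_d + E_p: mean-squared data error over the
    training window plus mean-squared physical residual over the
    collocation points beyond the training data."""
    e_d = np.mean(np.sum((y_hat_train - y_train) ** 2, axis=1))
    res = physical_residual(y_hat_colloc, rhs, dt)
    e_p = np.mean(np.sum(res ** 2, axis=1))
    return e_d + e_p
```

A gradient-based optimizer (L-BFGS in the paper, §3a) would then minimize this quantity with respect to the output matrix.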

(a) Network architecture
The architecture of the PI-ESN follows that of the ESN, which consists of an input matrix W_in ∈ R^{N_x×N_u}, which is a sparse matrix; a reservoir that contains N_x neurons, which are connected by the recurrent weight matrix W ∈ R^{N_x×N_x}, another sparse matrix; and the output matrix W_out ∈ R^{N_y×N_x}. The input time series, u(n), is connected to the reservoir through W_in to excite the states of the neurons, x, as x(n + 1) = tanh(W_in u(n + 1) + W x(n)), where tanh( · ) is the activation function. The output of the PI-ESN, ŷ(n), is computed by linear combination of the reservoir states as ŷ(n) = W_out x(n). The matrices W_in and W are randomly generated and fixed [41]. Only W_out is trained to minimize (3.1). Following [19], each row of W_in has only one non-zero element, which is drawn from a uniform distribution over [−σ_in, σ_in]; W has an average connectivity d, whose non-zero elements are drawn from a uniform distribution over the interval [−1, 1]; and W is scaled such that its largest eigenvalue is Λ ≤ 1, which ensures the echo state property [41]. The same quadratic transformation of the reservoir state before readout as in [19] is also applied. The training of the PI-ESN is achieved in two steps. First, the network is initialized with an output matrix, W_out, that minimizes a data-only cost functional E_tot^{NP} = E_d + γ Σ_i ||w_out,i||², where γ is a Tikhonov regularization factor and w_out,i denotes the ith row of W_out. This is the output matrix of the conventional ESN [18]. Second, the physical error (3.1) is minimized with the L-BFGS method [42], which is a quasi-Newton optimization algorithm. On the one hand, the proposed PI-ESN architecture has similarities with an Elman-RNN architecture, because the PI-ESN can be interpreted as a three-layer RNN trained with a gradient-based optimizer. On the other hand, the PI-ESN differs from an Elman-RNN because it employs a physics-based loss function, and the additional training is limited to W_out to reduce the computational cost.
In principle, other parts of the ESN, such as the input and recurrent weight matrices, W in and W, can be trained. This is not attempted here because the training of W out is sufficient to show a substantial improvement in the prediction accuracy ( §4).
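A minimal sketch of the ESN construction and of the ridge-regression initialization of W_out, under stated assumptions, might look as follows; the quadratic transformation of the reservoir state [19] and the subsequent L-BFGS stage are omitted, and all function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_esn(n_x, n_u, sigma_in=1.0, rho=0.9, d=3):
    """Random, fixed ESN matrices: W_in has one non-zero entry per
    row (uniform in [-sigma_in, sigma_in]); W has average
    connectivity d and is rescaled to spectral radius rho."""
    w_in = np.zeros((n_x, n_u))
    w_in[np.arange(n_x), rng.integers(0, n_u, n_x)] = rng.uniform(
        -sigma_in, sigma_in, n_x)
    w = rng.uniform(-1, 1, (n_x, n_x)) * (rng.random((n_x, n_x)) < d / n_x)
    w *= rho / np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w

def run_reservoir(w_in, w, u_seq):
    """x(n+1) = tanh(W_in u(n+1) + W x(n)), collected over the inputs."""
    x = np.zeros(w.shape[0])
    states = []
    for u in u_seq:
        x = np.tanh(w_in @ u + w @ x)
        states.append(x)
    return np.array(states)

def train_readout(states, targets, gamma=1e-6):
    """Ridge-regression initialization of W_out (the conventional
    ESN solution, used as the starting point for L-BFGS)."""
    xtx = states.T @ states + gamma * np.eye(states.shape[1])
    return np.linalg.solve(xtx, states.T @ targets).T
```

The closed-form ridge solution keeps the initialization cheap; only the subsequent minimization of the physical loss requires iterative optimization.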

Results
A grid search provides the hyperparameters Λ = 0.9, σ_in = 1.0, d = 3, γ = 10^{−6}, which enable accurate predictions in the range of N_x = [500, 3000] neurons (electronic supplementary material). These optimal hyperparameters are determined for a data-only ESN and are re-used for the PI-ESN, unless mentioned otherwise, to provide a base assessment of the improvement that can be obtained by introducing the physics-based loss. Only 2500 time units (equivalent to t⁺ ≈ 61) in the window t = [11 500, 14 000] (equivalent to t⁺ ≈ [280, 341], the grey box of figure 1) are used for training. The data beyond this time window are used for validation only. We use N_p = 5000 collocation points (equivalent to 1250 time units, or t⁺ ≈ 30.5), which provide a sufficient number of predictions beyond the training data at a relatively low computational cost.
(a) Long-term statistical behaviour

A long-term prediction of the modes' amplitudes from the trained ESN and PI-ESN is shown in figure 3 for a reservoir of 1500 units. This prediction is made for 10 000 time units, which corresponds to approximately 240 Lyapunov times. Out of this time series, the first 5000 time units are shown in figure 3. Both the ESN and the PI-ESN qualitatively reproduce the dynamics of the MFE model. In particular, both networks exhibit a transition towards a quasi-laminar state, with a large growth of the first mode, a_1, and an associated increase in the kinetic energy, k.
A statistical assessment of the performance of the PI-ESN is provided in terms of the probability density function (PDF) of k and a_1 (figure 4). The PDFs are computed from a time series of 240 Lyapunov times, which is longer than the portion shown in figure 3, to ensure statistical convergence. The PI-ESN captures the tails of the PDFs of k and a_1 more accurately than the data-only ESN. This means that constraining the physics reproduces more accurately the occurrence of extreme events (associated with large values of a_1 and k) from a statistical point of view.
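The PDF comparison above can be reproduced with a simple normalized histogram; this is a generic sketch, not the authors' post-processing code, and the bin count is an arbitrary choice.

```python
import numpy as np

def pdf_estimate(series, bins=100):
    """Estimate the PDF of a scalar time series (e.g. k or a_1) by a
    normalized histogram; the tails reveal the extreme events."""
    hist, edges = np.histogram(series, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, hist
```

Comparing the tails of such estimates for the true data, the ESN and the PI-ESN quantifies how well each model reproduces the frequency of extreme events.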
An additional quantitative comparison, with the statistics of the velocity field, is shown in figure 5 for ESNs and PI-ESNs of different reservoir sizes. The statistics, which are collected for 5000 time units, are computed by averaging the velocity along the two periodic directions, x and z, and time. Figure 5 shows that, for small reservoirs (500 units), neither the ESN nor the PI-ESN accurately reproduces the mean streamwise velocity profile and the variance. This is because there are not enough neurons in the reservoir to correctly reproduce the intricate dynamics of the MFE model and its long-term evolution. However, as the reservoir size increases to 1500 units, the PI-ESN accurately captures the mean velocity profile, the variance and the Reynolds stress. Larger reservoirs, with up to 3000 units, have an accuracy similar to that of the reservoir with 1500 units. Therefore, the resulting velocity profiles are shown only for 1500 units in figure 5. On the other hand, the data-only ESN solution is less accurate. A larger reservoir of 2500 units is required for the data-only ESN to accurately reproduce the statistics of the MFE model (result not shown). This highlights how the physical constraint augments the information used during the training of the PI-ESN, allowing it to better capture the long-term evolution than a data-only ESN. The PI-ESN approach needs a relatively small training dataset of 2500 time units to learn the long-term statistics of the system. Compared to previous studies with LSTM units, these training data are two to three orders of magnitude smaller [15].

(b) Short-term extreme events prediction
The accuracy of the short-term prediction is quantified by the normalized error, E, between the predicted and the true states, where the denominator of the error cannot be zero because the fixed point of the MFE model has ||y|| = 1 and the system has unsteady chaotic oscillations. Although the same training data are used for both the PI-ESN and the conventional ESN, the PI-ESN is significantly more capable of accurately predicting the true evolution autonomously than the conventional ESN.
To compare the performance, we define the predictability horizon as the time required for E to exceed the threshold 0.2 from the same initial condition. The predictability horizon of the PI-ESN is ∼2 Lyapunov times longer than that of the conventional ESN. This improvement is achieved by enforcing the prior physical knowledge of the flow, whose evolution must fulfil the momentum and mass conservation laws, as shown in figure 6. Figure 7a,b shows the normalized absolute error between the predicted velocity field and the exact velocity field. The discrepancy in the velocity field is mainly due to the error in the prediction of the downstream vortex mode, a_3 (figure 6). On the one hand, because no physical knowledge is constrained in the conventional ESN, the sign and amplitude of a_3 are incorrectly predicted, which means that the out-of-plane velocity evolves in the opposite direction to the exact solution. On the other hand, the PI-ESN is able to predict satisfactorily both the in-plane and the out-of-plane velocity during the extreme event.
To quantitatively assess the robustness of these results, we compute the average predictability horizon of the machines with no further training. We follow these steps: (i) by inspection of our dataset (a subset of which is shown in figure 1), we define events as extreme when their kinetic energy is k ≥ 0.1; (ii) we identify the times at which all the extreme events start in our dataset; and (iii) we compute the average predictability horizon over these events as a function of the reservoir size, N_x (figure 8). On the one hand, for small reservoirs (N_x ≲ 2000), the performances of the ESN and PI-ESN are comparable. This means that the performance is more sensitive to the data cost functional, E_d, than to the physical error, E_p. On the other hand, for larger reservoirs (N_x ≳ 2000), the physical knowledge is fully exploited by the PI-ESN. This means that the performance becomes markedly sensitive to the physical error, E_p. This results in an improvement in the average predictability of up to ≈1.5 Lyapunov times. Because an extreme event takes ≈3 Lyapunov times on average, the improved predictability time of the PI-ESN is key to the time-accurate prediction of the occurrence and amplitude of the extreme events. To further assess the accuracy of the PI-ESN with respect to the data-only ESN, we analyse data-only ESNs trained with different Tikhonov regularization factors (γ = 10^{−5} and γ = 10^{−7}) in figure 8. The PI-ESN outperforms the data-only ESN.
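The predictability horizon used in the steps above can be sketched as follows; the normalization of E is an assumption (the paper only states that its denominator cannot vanish), and the threshold and Lyapunov exponent follow the text.

```python
import numpy as np

def predictability_horizon(y_true, y_pred, dt, lam_max=0.0244, threshold=0.2):
    """Time, in Lyapunov units, until the normalized error E first
    exceeds the threshold. E is normalized here by the time-averaged
    norm of the true solution (an assumed normalization)."""
    err = np.linalg.norm(y_pred - y_true, axis=1)
    e = err / np.mean(np.linalg.norm(y_true, axis=1))
    exceeded = np.flatnonzero(e > threshold)
    n = exceeded[0] if exceeded.size else len(e)
    return n * dt * lam_max
```

Averaging this quantity over the start times of all identified extreme events, for each reservoir size, yields the curves of figure 8.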
For completeness, we re-tune the hyperparameters of the PI-ESN by performing a grid search, which provides the optimal parameters Λ = 0.7, σ_in = 0.9, d = 3. The extreme event prediction horizon for the PI-ESN with these new optimal hyperparameters is shown in figure 8 (in green). A higher prediction horizon can be achieved with this re-tuning.

(c) Robustness to noisy data
Because the accuracy of the ESN and PI-ESN (and, more generally, of any data-driven method) depends on the quality of the training data, the effect of additive noise is analysed. Noise is added to the training dataset presented in figure 1 with signal-to-noise ratios (SNR) of 30 dB and 40 dB, which are representative of noise in experiments [43]. The ESN and PI-ESN are trained on the noisy data, and the accuracy of the long- and short-term predictions is analysed. The dataset length and hyperparameters are the same as those of § §4a and 4b. Here, because the training dataset is small (2500 time units, which correspond to 60 Lyapunov times), noise has a marked detrimental impact on the training of the ESN for learning the physical dynamics. (By contrast, when the training dataset is large, i.e. the data contain sufficient physical dynamics for the machine to be trained, a small addition of noise can have a positive effect on the robustness of the ESN [41].) Figure 9 shows the evolution of three modes, the normalized error and the noisy solution. Similarly to the noise-free case (figure 6), the PI-ESN is quantitatively more predictive than the data-only ESN. For example, the PI-ESN predicts the growth of the a_1 mode, which is physically the quasi-laminar profile forming during the extreme event. Although after 4 Lyapunov times the PI-ESN diverges, owing to the high sensitivity of chaotic flows to noise, the overall statistics are well predicted, as discussed next. One possible strategy to improve the performance is to increase the number of collocation points in the training of the PI-ESN.
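For illustration, additive noise at a prescribed SNR can be generated as below; this is an assumed noise model (zero-mean Gaussian, scaled component-wise from the signal variance), since the paper does not specify its exact one.

```python
import numpy as np

def add_noise(signal, snr_db, seed=0):
    """Add zero-mean Gaussian noise at a prescribed signal-to-noise
    ratio in dB, defined component-wise from the signal variance.
    (Assumed noise model, for illustration only.)"""
    rng = np.random.default_rng(seed)
    snr = 10.0 ** (snr_db / 10.0)
    noise_std = np.sqrt(np.var(signal, axis=0) / snr)
    return signal + rng.normal(0.0, noise_std, signal.shape)
```

An SNR of 30 dB corresponds to a noise variance 1000 times smaller than the signal variance; 40 dB to a factor of 10 000.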
The statistical assessment is shown in figure 10. Compared to the noise-free case of §4b, the predictability time of extreme events becomes smaller as the noise level becomes larger. This loss in accuracy is due to the noise in the training data. Both the ESN and the PI-ESN minimize the error on the noisy data, E_d, which causes the machines to fit the noisy data, resulting in a reduction of predictive capability. This loss becomes more significant as the noise increases. If the number of neurons in the reservoir is further increased, the ESN and, to a lesser extent, the PI-ESN start overfitting the noise. The PI-ESN mitigates the loss of accuracy through the physical loss, E_p, which acts as a regularization term that filters out the noise. This allows the PI-ESN to maintain a higher accuracy in the prediction of the extreme event (figure 6). As a result, the PI-ESN provides a longer accurate prediction of the extreme events, by up to 0.4 Lyapunov times, compared to the data-only ESN. This effect becomes more significant as the number of neurons in the reservoir increases. In other words, the physical constraint in the PI-ESN acts as a denoiser. In the presence of noise, the PI-ESN does not attempt to fit the noisy data; rather, it balances the fit with the physics. The physical loss allows the PI-ESN to determine an appropriate physical prediction, which deviates from the noisy data to approach the physical dynamics. This denoising property of the PI-ESN is consistent with previous studies [44,45]. It overcomes the lack of robustness of the data-only ESN, which cannot discriminate the noise from the actual flow dynamics. Finally, figure 11 compares the statistics of the velocity obtained from a long-term prediction of 5000 time units for a reservoir size of 1500 units (similarly to the noise-free case of figure 5). Consistently with the short-term performance, the accuracy of the statistics decreases as the noise level in the training data increases.
For both noise levels, the PI-ESN outperforms the data-only ESN.

Final discussion and future directions
We propose a PI-ESN, which combines empirical modelling, based on reservoir computing, with physical modelling, based on conservation laws, to time-accurately predict extreme events (short-term dynamics) and the statistics (long-term dynamics) of a chaotic flow. We compare the performance of the PI-ESN with a fully data-driven ESN. The former is a physics-constrained machine, whereas the latter is a physics-unaware machine because it is trained with data only.
In the PI-ESN, the physical error from the conservation laws is minimized beyond the training data. This brings in crucial information, which is exploited in two ways: (i) with the same amount of available data, the PI-ESN solution is accurate for a longer time than the conventional ESN solution, i.e. the information from physical knowledge enhances the accuracy of the autonomous prediction of the PI-ESN, and (ii) fewer data are required to obtain the same accuracy as the conventional ESN. Here, we take advantage of property (i) for the prediction of extreme events in a model of turbulence, and of the statistics of velocity. Extreme events are generally rare, which makes it difficult to train physics-unaware data-driven methods. By contrast, constraining the physics enables the PI-ESN to predict chaotic dynamics that cannot be inferred from data only. Additionally, the PI-ESN captures the long-term flow statistics well with relatively small training data. Finally, the approach also shows robustness with respect to noise.
It should be noted that the framework proposed in this paper relies on knowing the governing equations of the system studied, which is an ideal case. Cases in which this knowledge is not perfect, for example when only approximate equations are available, will constitute the scope for future work. In addition, the possibility of applying the proposed method to higher-dimensional flow systems, such as the full Navier-Stokes equations, will also be explored. This can be achieved by using autoencoders to learn the spatial correlations, as shown in [46]. These avenues will be investigated in future research.