# Discovery of nonlinear dynamical systems using a Runge–Kutta inspired dictionary-based sparse regression approach

## Abstract

In this work, we blend machine learning and dictionary-based learning with numerical analysis tools to discover differential equations from noisy and sparsely sampled measurement data of time-dependent processes. We exploit the fact that, given a dictionary containing a large number of candidate nonlinear functions, dynamical models can often be described by a few appropriately chosen basis functions. As a result, we obtain parsimonious models that can be better interpreted by practitioners and potentially generalize better beyond the sampling regime than black-box modelling. In this work, we integrate a numerical integration framework with dictionary learning that yields differential equations without requiring or approximating derivative information at any stage. Hence, it is particularly effective for corrupted and sparsely sampled data. We discuss its extension to governing equations containing rational nonlinearities, which typically appear in biological networks. Moreover, we generalize the method to governing equations subject to parameter variations and externally controlled inputs. We demonstrate the efficiency of the method in discovering a number of diverse differential equations using noisy measurements, including a model describing neural dynamics, the chaotic Lorenz model, Michaelis–Menten kinetics and a parameterized Hopf normal form.

### 1. Introduction

Data-driven discovery of dynamical models has recently drawn significant attention as there have been revolutionary breakthroughs in data science and machine learning [1,2]. With the increasing ease of data availability and advances in machine learning, we can analyse data and identify patterns to uncover dynamical models that faithfully describe the underlying dynamical behaviour. Although inference of dynamical models has been intensively studied in the literature, drawing conclusions and interpretations from them still remains tedious. Moreover, the extrapolation and generalization capabilities of such models beyond the training regime remain limited.

The area of identifying models using data is often referred to as system identification. For linear systems, there is an extensive collection of approaches [3,4]. However, despite several decades of research on learning nonlinear systems [5–8], the field is still far from being as mature as that for linear systems. Inferring nonlinear systems often requires an *a priori* model hypothesis by practitioners. A compelling breakthrough towards discovering nonlinear governing equations appeared in [9,10], where an approach based on genetic programming or symbolic regression is developed to identify nonlinear models using measurement data. It provides parsimonious models that fulfil a long-standing desire of the engineering community. A parsimonious model is determined by examining the Pareto front that discloses a trade-off between the identified model's complexity and accuracy. In a similar spirit, there have been efforts to develop sparsity-promoting approaches to discover nonlinear dynamical systems [11–15]. It is often observed that the dynamics of physical processes can be given by collecting a few nonlinear feature candidates from a high-dimensional nonlinear function space, referred to as a feature dictionary. These sparsity-promoting methods are able to discover models that are parsimonious, which in some situations can lead to better interpretability than black-box models. For motivation, we take an example from [14], where, using data for the fluid flow behind a cylinder, it is shown that one can obtain a model describing the on-attractor and off-attractor dynamics and characterizing a slow parabolic manifold. Practitioners in fluid dynamics can readily interpret this model. Another example comes from biological modelling, where parsimonious models can describe how a species affects the dynamics of other species. Hence, sparse models discovered via dictionary learning lend themselves to such interpretations.

Significant progress in solving sparse regression problems [16–18] and in compressed sensing [19–22] supports the development of these approaches. Although all these methods have gained much popularity, their success largely depends on the feature candidates included in the dictionary and on the ability to accurately approximate derivative information from the measurement data. Approximating derivatives from sparsely sampled and noisy measurements poses a tough challenge, though there are approaches to deal with noise, e.g. [23]. We also highlight additional directions explored in the literature to discover nonlinear governing equations, which include discovery of models using time-series data [8], automated inference of dynamics [9,24,25] and equation-free modelling [13,26,27].

In this work, we re-conceptualize the problem of discovering nonlinear differential equations by blending sparse identification with a classical numerical integration tool. We focus on a widely known integration scheme, namely the classical fourth-order *Runge–Kutta* method [28], noting that any other explicit higher-order integration scheme could be employed instead, e.g. the $3/8$-rule fourth-order Runge–Kutta method, or the idea of neural ODEs proposed in [29], which incorporates an arbitrary numerical integrator. In contrast to previously studied sparse identification approaches, e.g. [9,11,14], our approach requires neither direct access to nor an approximation of temporal gradient information. Therefore, we do not commit errors due to a gradient approximation. The approach becomes an attractive choice when the collected measurement data are sparsely sampled and corrupted with noise.

We mention, however, that using numerical integration schemes in the course of learning dynamics has a relatively long history. The work goes back to [30,31], where the fourth-order Runge–Kutta scheme is coupled with neural networks to learn a function describing the underlying vector field. In recent times, combining numerical integration schemes with neural networks has again received attention and has been studied from the perspective of dynamical modelling, e.g. [32–34]. We particularly emphasize the work [34], which also uses a similar concept to learn dynamical systems from noisy measurements; precisely, it decouples the noise from the underlying truth by enforcing a time-stepping integration scheme. As a result, one may obtain a denoised signal together with the dynamical model describing the underlying vector field. Based on this, we have discussed an approach in [35] to learn dynamical models from noisy measurements using time-stepping schemes combined with neural networks; it can also handle missing data, which is not possible in the approach discussed in [34]. Recently, neural ODEs [29], which show how to fuse any numerical integrator efficiently into the learning of models, have gained popularity for learning dynamical systems. Despite all these aforementioned methods being very general, in the sense that they do not require any prior assumptions about the underlying system or the structure of the dynamical models, they yield black-box models; thus, their interpretability and generalization remain unclear.

In this work, we also discuss an essential class of dynamical models that typically explains the dynamics of biological networks. It has been shown that regulatory and metabolic networks are sparse in nature, i.e. not all components influence each other. Furthermore, such dynamical models are often given by rational nonlinear functions. Consequently, the classical dictionary-based sparse identification paradigm is not applicable, as building all possible rational feature candidates is infeasible. To deal with this, the authors in [36] have recast the problem as finding the sparsest vector in a given null space. However, computing a null space using corrupted measurement data is a non-trivial task, though there is some work in this direction [37]. Here, we instead characterize a rational function as a fraction of two functions, each of which is identified using dictionary learning. Hence, we inherently retain the primary principle of sparse identification in the course of discovering models. In addition, we discuss the case where a dictionary contains parameterized candidates, e.g. ${\mathrm{e}}^{\alpha x}$, where $x$ is the dependent variable and $\alpha$ is an unknown parameter. We extend our discussion to parametric and controlled dynamic processes. The organization of the paper is as follows. In §2, we briefly recap the classical fourth-order Runge–Kutta method for the integration of ordinary differential equations. After that, we propose a methodology to discover differential equations by synthesizing the integration scheme with sparse identification. Furthermore, since the method involves solving nonlinear and non-convex optimization problems that promote sparse solutions, §3 discusses algorithms inspired by the sparse-regression approach in [14,18]. In §4, we examine a number of extensions to other classes of models, e.g.
when the governing equations are given by a fraction of two functions and involve model parameters and external control inputs. In the subsequent section, we illustrate the efficiency of the proposed methods by discovering a broad variety of benchmark examples, namely the chaotic Lorenz model, the FitzHugh–Nagumo (FHN) model, Michaelis–Menten kinetics and the parameterized Hopf normal form. We extensively study the performance of the proposed approach even under noisy measurements and compare it to the approach proposed in [14]. We conclude the paper with a summary and high-priority research directions.

### 2. Discovering nonlinear governing equations using a Runge–Kutta inspired sparse identification

In this section, we describe our approach to discovering nonlinear governing equations using sparsely sampled measurement data, which may be corrupted by experimental and/or sensor noise. We establish approaches by combining a numerical integration method with dictionary-based learning. Thus, we develop methodologies that allow us to discover nonlinear differential equations without the explicit need for derivative information, unlike the approaches proposed in e.g. [11,14,25]. In this work, we use a widely employed scheme for integrating differential equations, namely the classical *fourth-order Runge–Kutta* (RK4) method.

#### (a) Fourth-order Runge–Kutta method

The

*a*. The local integration error due to the
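For reference, a single step of the classical RK4 scheme for $\dot{\mathbf{x}}=\mathbf{f}(\mathbf{x})$ can be sketched as follows; this is a minimal Python illustration, not the implementation accompanying the paper:

```python
import numpy as np

def rk4_step(f, x, dt):
    """One classical fourth-order Runge-Kutta step for dx/dt = f(x)."""
    k1 = f(x)
    k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2)
    k4 = f(x + dt * k3)
    return x + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

# Example: integrate dx/dt = -x, whose exact solution is exp(-t).
x, dt = 1.0, 0.1
for _ in range(10):
    x = rk4_step(lambda z: -z, x, dt)
# x approximates exp(-1), with the expected fourth-order accuracy
```

The scheme is explicit, so evaluating one step only requires four evaluations of the vector field, which is what makes it convenient to embed into a learning loop.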

#### (b) Discovering nonlinear dynamical systems

Next, we develop a

Having an extensive dictionary, one has many choices of candidates. However, our goal is to choose as few candidates as possible, describing the nonlinear function $\mathbf{f}$ in (2.1). Hence, we set up a sparsity-promoting optimization problem to pick a few candidate functions from the dictionary, e.g.
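For illustration, a polynomial feature dictionary of the kind considered here can be assembled as follows; this is a minimal sketch with our own function name and interface, and an actual dictionary may contain further candidates such as trigonometric terms:

```python
import numpy as np
from itertools import combinations_with_replacement

def poly_dictionary(X, degree=3):
    """Build a polynomial feature dictionary from state snapshots
    X (n_samples, n_states). Columns: 1, x_i, x_i*x_j, ... up to
    the given total degree."""
    n, d = X.shape
    cols = [np.ones(n)]
    for deg in range(1, degree + 1):
        for idx in combinations_with_replacement(range(d), deg):
            cols.append(np.prod(X[:, list(idx)], axis=1))
    return np.column_stack(cols)

# Example: two states, degree 2 -> columns [1, x, y, x^2, x*y, y^2]
X = np.array([[1.0, 2.0], [3.0, 4.0]])
Phi = poly_dictionary(X, degree=2)
```

Each row of the resulting matrix evaluates all candidate features at one snapshot; the sparse regression then selects a few of its columns.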

We take the opportunity to stress the imperative advantages of

When the data are corrupted with noise or do not follow

As discussed in §a, the
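The overall idea, namely fitting the dictionary coefficients so that an RK4 step through the dictionary model reproduces consecutive snapshots, can be sketched as follows (a minimal scalar illustration under our own naming; no derivative information is used):

```python
import numpy as np

def rk4_model_step(Xi, phi, x, dt):
    """Advance the dictionary model x' = phi(x) @ Xi by one RK4 step."""
    f = lambda z: phi(z) @ Xi
    k1 = f(x)
    k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2)
    k4 = f(x + dt * k3)
    return x + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

def rk4_residual(Xi, phi, X, dt):
    """Mean-squared mismatch between measured snapshots X[k+1] and one
    RK4 step of the dictionary model started from X[k]."""
    preds = np.array([rk4_model_step(Xi, phi, x, dt) for x in X[:-1]])
    return np.mean((preds - X[1:]) ** 2)

# Example: snapshots of dx/dt = -x; the true coefficients give a
# near-zero residual, wrong coefficients a much larger one.
phi = lambda z: np.array([1.0, z, z**2])   # dictionary [1, x, x^2]
t = np.linspace(0, 1, 21)
X = np.exp(-t)                              # scalar trajectory
Xi_true = np.array([0.0, -1.0, 0.0])        # selects the feature "x"
res = rk4_residual(Xi_true, phi, X, dt=t[1] - t[0])
```

Minimizing such a residual over the coefficients, combined with a sparsity-promoting penalty, is the nonlinear optimization problem discussed in the next section.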

### 3. Algorithms to solve nonlinear sparse regression problems

Several methodologies exist to solve linear optimization problems that yield a sparse solution, e.g. LASSO [16,18]. However, the optimization problem (2.10) is nonlinear and likely non-convex. There are some developments in solving sparsity-constrained nonlinear optimization problems, e.g. [40,41]. Though these methods enjoy many nice theoretical properties, they typically require *a priori* knowledge of the maximum number of non-zero elements in the solution, which is often unknown, and they are computationally demanding. Here, we propose two simple gradient-based sequential thresholding schemes, similar to the one discussed in [14] for linear problems. In these schemes, we first solve the nonlinear optimization problem (2.10) using a (stochastic) gradient descent method to obtain ${\mathit{\Xi}}_{1}$, followed by applying a thresholding to ${\mathit{\Xi}}_{1}$.

#### (a) Fixed cut-off thresholding

In the first approach, we define a cut-off value $\lambda$ and set all the coefficients smaller than $\lambda$ to zero. We then update the remaining non-zero coefficients by solving the optimization problem (2.10) again, followed by employing the thresholding. We repeat the procedure until all the non-zero coefficients are equal to or larger than $\lambda$. This procedure is efficient, as the current values of the non-zero coefficients can be used as an initial guess for the next iteration, and the optimal $\mathit{\Xi}$ can be found with little computational effort. Note that the cut-off parameter $\lambda$ is important to obtain a suitable sparse solution; it can be determined using cross-validation. We sketch the discussed procedure in algorithm 1.
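A minimal linear analogue of this scheme can be sketched as follows; for illustration we replace the nonlinear RK4-based objective (2.10) by an ordinary least-squares fit, but the sequential fit, threshold, refit structure is the same:

```python
import numpy as np

def sequential_threshold_lstsq(Phi, dXdt, cutoff=0.05, n_iter=10):
    """Sequentially thresholded least squares: fit, zero out small
    coefficients, then refit on the surviving columns."""
    Xi = np.linalg.lstsq(Phi, dXdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(Xi) < cutoff
        Xi[small] = 0.0
        for k in range(dXdt.shape[1]):      # refit each state equation
            big = ~small[:, k]
            if big.any():
                Xi[big, k] = np.linalg.lstsq(Phi[:, big], dXdt[:, k],
                                             rcond=None)[0]
    return Xi

# Example: recover dx/dt = -0.5*x from noisy data, dictionary [1, x, x^2]
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
Phi = np.column_stack([np.ones_like(x), x, x**2])
dxdt = -0.5 * x + 1e-3 * rng.standard_normal(200)
Xi = sequential_threshold_lstsq(Phi, dxdt[:, None], cutoff=0.05)
```

Warm-starting each refit from the thresholded coefficients is what keeps the per-iteration cost low in the nonlinear setting as well.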

An analysis of the iterative thresholding algorithm proposed for sparse regression in [14] is conducted in [42], showing rapid convergence of that algorithm. In contrast to the algorithm in [14], algorithm 1 is more complex, and the underlying optimization problem is non-convex; thus, a thorough study of its convergence is beyond the scope of this paper. However, we mention that rapid convergence of algorithm 1 is observed in our numerical experiments; its analysis will be an important topic for future research.

We also note that algorithm 1 always terminates, as either the number of indices set to zero increases (which terminates when the dictionary is exhausted), or the error criterion is satisfied. But the question remains as to whether the algorithm converges to the correct sparse solution. A remedy can be to use the ensembling rationale proposed in [43] to build an ensemble of sparse models. It can provide statistical quantities for the feature candidates in the dictionary. Based on these, we can construct a final sparse model using statistical tools such as $p$-values.

#### (b) Iterative cut-off thresholding

In the fixed cut-off thresholding approach, we need to pre-define the cut-off value for thresholding, and a suitable value has to be found by an iterative procedure. In our empirical observations, applying a fixed threshold at each iteration does not yield the sparsest solution in many instances. To circumvent this, we propose an iterative way of thresholding, as follows. In the first step, we solve the optimization problem (2.10) for $\mathit{\Xi}$. Then, we determine the smallest non-zero coefficient of $\mathit{\Xi}$ in magnitude and set it to zero. Like in the previous approach, we update the remaining non-zero coefficients by solving the optimization problem (2.10). We repeat the step of finding the smallest non-zero coefficient (in magnitude) of the updated $\mathit{\Xi}$ and setting it to zero. We iterate the procedure as long as the loss of data fidelity remains below a given tolerance. Visually, the stopping point can be anticipated from the curve of the data-fitting error against the number of non-zero elements in $\mathit{\Xi}$, which typically exhibits an *elbow*. This approach is close to the *backward stepwise selection* approach used in machine learning for feature selection, e.g. [17]. We sketch the steps of the procedure in algorithm 2. We shall see the use of this algorithm in our results section (see §5d).

We note that the successive iterations converge faster to the optimal value after the first thresholding as we choose the coefficients after applying thresholding as the initial guess. Moreover, in our experiments, we observe that this thresholding approach yields better results, particularly when data are corrupted with noise. However, it may be computationally more expensive than the fixed cut-off thresholding approach as it may need more iterations to converge. Therefore, an efficient approach combining fixed and iterative thresholding approaches is a worthy future research direction.
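The iterative variant can be sketched analogously, again with a least-squares surrogate for the objective (2.10) and our own naming: repeatedly drop the smallest-magnitude coefficient and refit, stopping before the data-fidelity loss exceeds a tolerance:

```python
import numpy as np

def iterative_threshold_lstsq(Phi, y, tol=1e-3):
    """Backward elimination: repeatedly zero the smallest-magnitude
    coefficient and refit, keeping the fit error below tol."""
    active = np.arange(Phi.shape[1])
    coef = np.linalg.lstsq(Phi, y, rcond=None)[0]
    while active.size > 1:
        trial = np.delete(active, np.argmin(np.abs(coef)))
        c = np.linalg.lstsq(Phi[:, trial], y, rcond=None)[0]
        err = np.mean((Phi[:, trial] @ c - y) ** 2)
        if err > tol:            # elbow reached: stop before losing fidelity
            break
        active, coef = trial, c
    Xi = np.zeros(Phi.shape[1])
    Xi[active] = coef
    return Xi

# Example: y depends only on column 1 of a three-column dictionary
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 100)
Phi = np.column_stack([np.ones_like(x), x, x**2])
Xi = iterative_threshold_lstsq(Phi, 2.0 * x)
```

In contrast to the fixed cut-off variant, no threshold value has to be chosen; only a tolerance on the data-fidelity loss is required.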

### 4. A number of possible extensions

In this section, we discuss several extensions of the methodology proposed in §2, generalizing it to a large class of problems. First, we discuss the discovery of governing differential equations given by a fraction of two functions. Next, we investigate the case in which a symbolic dictionary is parameterized. This is of particular interest when governing equations are expected to have candidate features such as ${\mathrm{e}}^{\alpha \mathbf{x}(t)}$, where $\alpha$ is unknown. We further extend our discussion to parameterized and externally controlled governing equations.

#### (a) Governing equations as a fraction of two functions

There are many instances where the governing equations are given as a fraction of two nonlinear functions. Such equations frequently appear in the modelling of biological networks. For simplicity, we here examine a scalar problem; however, the extension to multi-dimensional cases readily follows. Consider governing equations of the form

Furthermore, it is worthwhile to consider governing equations of the form

Then, we believe that the nonlinear functions in (4.6) can be given as a sparse linear combination of the dictionaries, i.e.

We note that learning a rational dynamical model with a small denominator may lead to numerical challenges. This can be related to fast transient behaviour, as the gradient can be significantly larger when the denominator is small. Therefore, such cases need to be handled appropriately, for example with proper data normalization and sampling; in our experiment identifying a Michaelis–Menten kinetics model from data (see §5d), however, we have not noticed any unexpected behaviour.
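Schematically, such a rational vector field can be parameterized by two coefficient vectors over a common dictionary, one for the numerator and one for the denominator; the following sketch uses hypothetical coefficients for illustration:

```python
import numpy as np

def rational_rhs(x, xi_g, xi_h, dictionary):
    """Vector field f(x) = g(x) / (1 + h(x)), where g and h are sparse
    linear combinations of the same feature dictionary."""
    phi = dictionary(x)
    return (phi @ xi_g) / (1.0 + phi @ xi_h)

# Example: dictionary [1, s]; hypothetical coefficients give
# f(s) = -1.5*s / (1 + 2*s), a Michaelis-Menten-like saturation term.
dict_fn = lambda s: np.array([1.0, s])
f = rational_rhs(1.0, np.array([0.0, -1.5]), np.array([0.0, 2.0]), dict_fn)
# f(1) = -1.5 / (1 + 2) = -0.5
```

Both coefficient vectors can then be fitted jointly through the RK4-constrained objective, with sparsity promoted in each of them.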

#### (b) Discovering parametric and externally controlled equations

The

#### (c) Parameterized dictionary

The success of sparse identification highly depends on the quality of the constructed feature dictionary. In other words, the dictionary should contain the right features such that the governing differential equations can be given as a linear combination of a few terms from the dictionary. However, this becomes a challenging task when one aims at including, for instance, trigonometric or exponential functions (e.g. $\sin(ax), {\mathrm{e}}^{bx}$), where $\{a,b\}$ are unknown. In an extreme case, one might think of including $\sin(\cdot)$ and $\exp(\cdot)$ for each possible value of $a$ and $b$. This would lead to a dictionary of infinite dimension, hence becoming intractable. To illustrate this, we consider the following governing equation:

We conventionally build a dictionary containing exponential functions using several possible coefficients as follows:
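Alternatively, the unknown coefficient of such a feature can itself be treated as a trainable parameter and optimized by gradient descent, rather than enumerated in the dictionary. A minimal sketch of this idea for a single parameterized feature ${\mathrm{e}}^{bx}$, with illustrative data and learning rate of our own choosing, is:

```python
import numpy as np

# Fit the parameter b in the feature e^(b*x) to data y = e^(0.5*x)
# by gradient descent on the squared loss (illustrative only).
x = np.linspace(0.0, 1.0, 50)
y = np.exp(0.5 * x)
b, lr = 0.0, 0.1
for _ in range(2000):
    r = np.exp(b * x) - y                       # residual
    grad = 2.0 * np.mean(r * x * np.exp(b * x)) # d(loss)/db
    b -= lr * grad
# b converges towards the true value 0.5
```

In the full method, such feature parameters would be updated jointly with the sparse coefficients during the gradient-based optimization.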

### 5. Results

Here, we demonstrate the success of RK4-SINDy in discovering dynamical models^{1} of different complexity. In the first subsection, we consider simple illustrative examples, namely linear and nonlinear damped oscillators. Using the linear damped oscillator, we perform a comprehensive study under various conditions, i.e. the robustness of the approach to sparsely sampled and highly corrupted data. We compare the performance of our approach to discover governing equations with that of [14]; we refer to it as SINDy^{2}. In the second example, we study the chaotic Lorenz example and show that

#### (a) Two-dimensional damped oscillators

As simple illustrative examples, we consider two-dimensional damped harmonic oscillators. These can be given by linear and nonlinear models. We begin by considering the linear one.

##### (i) Linear damped oscillator

Consider a two-dimensional linear damped oscillator whose dynamics is given by

The results are shown in figure 2 and table 1. We note that *d*.

Next, we study the performance of both methodologies under corrupted data. We corrupt the measurement data by adding zero-mean Gaussian white noise of different variances. We present the results in figure 3 and table 2 and note that *d*. Naturally,

##### (ii) Cubic damped oscillator

Next, we consider a cubic damped oscillator, governed by

#### (b) FitzHugh–Nagumo model

Here, we explore the discovery of the nonlinear FHN model that describes the activation and deactivation of neurons in a simplified manner [46]. The governing equations are

#### (c) Chaotic Lorenz system

As the next example, we consider the problem of discovering the nonlinear Lorenz model [47]. The dynamics of the chaotic system is governed by

#### (d) Michaelis–Menten kinetics

To illustrate

*a*. Typically, governing equations explaining biological processes involve rational functions. Therefore, we aim at discovering the enzyme kinetics model by assuming a rational form as shown in (4.1), i.e. the vector field of $\mathbf{s}(t)$ takes the form $\mathbf{\text{g}}(\mathbf{s}(t))/(1+\mathbf{\text{h}}(\mathbf{s}(t)))$.

Next, in order to identify $\mathbf{\text{g}}(\mathbf{s})$ and $\mathbf{\text{h}}(\mathbf{s})$, we construct polynomial dictionaries, containing terms up to degree $4$. After that, we employ

*c*. This allows us to build a Pareto front for the optimization problem and to choose the most parsimonious model that describes the dynamics present in the collected data. One of the most attractive features of learning parsimonious models is that they avoid over-fitting and generalize better in regions in which data are not collected. This is exactly what we observed as well. As shown in figure 9*e*, the learned model predicts the dynamics very accurately even in regions far away from the training one.

Next, we study the performance of the method under noisy measurements. For this, we corrupt the collected data using zero-mean Gaussian noise of variance $\sigma =2\times {10}^{-2}$. Then, we process the data by first employing a noise-reduction filter, namely Savitzky–Golay, followed by normalizing the data. In the third step, we focus on learning the most parsimonious model by picking appropriate candidates from the polynomial dictionary. Remarkably, the method allows us to find a model with the correct features from the dictionary and coefficients accurate to within $5\%$. Furthermore, the model faithfully generalizes to regimes outside the training one, even when learned from noisy measurements (figure 10).
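The denoising and normalization pre-processing described above can be sketched as follows; the filter window and polynomial order here are illustrative choices, not necessarily those used in the experiments:

```python
import numpy as np
from scipy.signal import savgol_filter

# Denoise a noisy trajectory with a Savitzky-Golay filter, then
# normalize it to unit maximum amplitude.
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 200)
clean = np.sin(t)
noisy = clean + np.sqrt(2e-2) * rng.standard_normal(t.size)
smooth = savgol_filter(noisy, window_length=31, polyorder=3)
normalized = smooth / np.max(np.abs(smooth))
```

The Savitzky–Golay filter fits a low-order polynomial in a sliding window, which suppresses high-frequency noise while preserving the smooth trend of the trajectory.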

#### (e) Hopf normal form

In our last example, we study the discovery of parameterized differential equations from noisy measurements. Many real-world dynamical processes have system parameters, and depending on them, the system may exhibit very distinctive dynamical behaviours. To illustrate the efficiency of

*a*. Next, we aim at constructing a symbolic polynomial dictionary $\Phi$ by including the parameter $\mu$ as a dependent variable. While building a polynomial dictionary, it is important to choose the degree of the polynomial as well. Moreover, it is known that the polynomial basis becomes numerically unstable as the degree increases; hence, solving the optimization problem (2.9) becomes challenging. By means of this example, we discuss an assessment test to choose the appropriate degree of the polynomial in the dictionary. Essentially, we inspect the data fidelity with respect to the degree of the polynomial in the dictionary. When the dictionary contains all essential polynomial features, a sharp drop in the error is expected. We observe in figure 11*b* a sharp drop in the error at degree 3, and the error remains almost the same even when higher-degree polynomial features are added. This indicates that polynomial degree 3 is sufficient to describe the dynamics. Using the dictionary containing polynomial features up to degree 3, we seek to identify the minimum number of features from the dictionary that explains the underlying dynamics. We achieve this by employing

*c*,

*d*. It exposes the strength of the parsimonious and interpretable discovered models.
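The degree-assessment test described above can be sketched as follows, using a minimal one-dimensional illustration with synthetic cubic data:

```python
import numpy as np

def fit_error(x, y, degree):
    """Least-squares residual of fitting y with a one-dimensional
    polynomial dictionary [1, x, ..., x^degree]."""
    Phi = np.column_stack([x**k for k in range(degree + 1)])
    coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return np.mean((Phi @ coef - y) ** 2)

# Data generated by a cubic vector field: the fit error drops sharply
# at degree 3 and stays flat for higher degrees.
x = np.linspace(-1, 1, 100)
y = x - 0.8 * x**3
errors = {d: fit_error(x, y, d) for d in range(1, 6)}
```

Scanning the error over the degree and locating the sharp drop gives a simple, data-driven criterion for the dictionary degree.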

method | discovered model |
---|---|
 | $\begin{array}{rl}\dot{\mathbf{x}}(t)&=1.001\mu \mathbf{x}(t)-1.001\mathbf{y}(t)-0.996\,\mathbf{x}(t)\left(\mathbf{x}(t)^{2}+\mathbf{y}(t)^{2}\right)\\ \dot{\mathbf{y}}(t)&=1.001\mathbf{x}(t)+1.010\mu \mathbf{y}(t)-1.006\,\mathbf{x}(t)^{2}\mathbf{y}(t)-1.004\,\mathbf{y}(t)^{3}\end{array}$ |
 | $\begin{array}{rl}\dot{\mathbf{x}}(t)&=-0.961\mathbf{y}(t)+0.719\mu \mathbf{x}(t)+0.822\mu \mathbf{y}(t)-0.735\,\mathbf{x}(t)^{3}-1.044\,\mathbf{x}(t)^{2}\mathbf{y}(t)\\ &\quad-0.686\,\mathbf{x}(t)\mathbf{y}(t)^{2}-0.846\,\mathbf{y}(t)^{3}\\ \dot{\mathbf{y}}(t)&=0.986\mathbf{x}(t)+0.899\mu \mathbf{y}(t)-0.882\,\mathbf{x}(t)^{2}\mathbf{y}(t)-0.904\,\mathbf{y}(t)^{3}\end{array}$ |

### 6. Discussion

This work has introduced a compelling approach (

This work opens many exciting doors for further research from both theoretical and practical perspectives. Since the approach aims at selecting the correct features from a dictionary containing a high-dimensional nonlinear feature basis, the construction of the feature basis in the dictionary plays a significant role in determining the success of the approach. There is no straightforward answer to this obstacle; however, there is some expectation that meaningful features may be constructed with the help of expert and empirical knowledge, or at least be realized in raw form. Furthermore, we have solved the optimization problem (2.9) using a gradient-based method. We have observed that if feature functions in the dictionary are similar for the given data, i.e. when the coherency between the feature functions is high, the convergence is slow, and sometimes it fails and gets stuck in a non-sparse local minimum. Hence, there is a need for a normalization step. In §§5c and 5d, we have employed a normalization step to reduce coherency. However, it is worth investigating better-suited strategies to normalize either the data or the feature space as a pre-processing step so that sparsity in the feature space remains intact. In addition, a thorough study of the performance of various noise-reduction methods (e.g. [35,52]) would provide deep insights into their appropriateness to

Methods discovering interpretable models that generalize well beyond the training regime are limited, and the proposed method

### Data accessibility

Our code and data can be found in the following link: https://github.com/mpimd-csc/RK4-SinDy.

### Authors' contributions

P.G.: conceptualization, data curation, investigation, methodology, software, writing—original draft, writing—review and editing; P.B.: conceptualization, data curation, investigation, methodology, writing—review and editing.

All authors gave final approval for publication and agreed to be held accountable for the work performed therein.

### Conflict of interest declaration

We declare we have no competing interest.

### Funding

Open access funding provided by the Max Planck Society.

Both authors received partial support from BiGmax (www.bigmax.mpg.de/), the Max Planck Society’s Research Network on Big-Data-Driven Materials Science. P.B. also obtained partial support from the German Research Foundation (DFG) Research Training Group 2297 ‘MathCoRe’, Magdeburg, Germany.

## Footnotes

1 Most of the examples are taken from [14].

2 We use the Python implementation of the method, the so-called

### References

- 1. Jordan MI, Mitchell TM. 2015 Machine learning: trends, perspectives, and prospects. *Science* **349**, 255-260. (doi:10.1126/science.aaa8415)
- 2. Marx V. 2013 The big challenges of big data. *Nature* **498**, 255-260. (doi:10.1038/498255a)
- 3. Ljung L. 1999 *System identification: theory for the user*. Englewood Cliffs, NJ: Prentice Hall.
- 4. Van Overschee P, de Moor B. 1996 *Subspace identification of linear systems: theory, implementation, applications*. Dordrecht (Hingham, MA): Kluwer Academic Publishers.
- 5. Kumpati SN, Kannan P. 1990 Identification and control of dynamical systems using neural networks. *IEEE Trans. Neural Netw.* **1**, 4-27. (doi:10.1109/72.80202)
- 6. Suykens JA, Vandewalle JP, de Moor BL. 1996 *Artificial neural networks for modelling and control of non-linear systems*. New York, NY: Springer.
- 7. Kantz H, Schreiber T. 2004 *Nonlinear time series analysis*. Cambridge, UK: Cambridge University Press.
- 8. Crutchfield JP, McNamara BS. 1987 Equations of motion from a data series. *Complex Syst.* **1**, 121.
- 9. Bongard J, Lipson H. 2007 Automated reverse engineering of nonlinear dynamical systems. *Proc. Natl Acad. Sci. USA* **104**, 9943-9948. (doi:10.1073/pnas.0609476104)
- 10. Schmidt M, Lipson H. 2009 Distilling free-form natural laws from experimental data. *Science* **324**, 81-85. (doi:10.1126/science.1165893)
- 11. Wang WX, Yang R, Lai YC, Kovanis V, Grebogi C. 2011 Predicting catastrophes in nonlinear dynamical systems by compressive sensing. *Phys. Rev. Lett.* **106**, 154101. (doi:10.1103/PhysRevLett.106.154101)
- 12. Ozoliņš V, Lai R, Caflisch R, Osher S. 2013 Compressed modes for variational problems in mathematics and physics. *Proc. Natl Acad. Sci. USA* **110**, 18368-18373. (doi:10.1073/pnas.1318679110)
- 13. Proctor JL, Brunton SL, Brunton BW, Kutz JN. 2014 Exploiting sparsity and equation-free architectures in complex systems. *Eur. Phys. J. Spec. Top.* **223**, 2665-2684. (doi:10.1140/epjst/e2014-02285-8)
- 14. Brunton SL, Proctor JL, Kutz JN. 2016 Discovering governing equations from data by sparse identification of nonlinear dynamical systems. *Proc. Natl Acad. Sci. USA* **113**, 3932-3937. (doi:10.1073/pnas.1517384113)
- 15. Brunton SL, Proctor JL, Kutz JN. 2016 Sparse identification of nonlinear dynamics with control (SINDYc). *IFAC-PapersOnLine* **49**, 710-715. (doi:10.1016/j.ifacol.2016.10.249)
- 16. Friedman J, Hastie T, Tibshirani R. 2001 *The elements of statistical learning*, vol. 1. New York, NY: Springer.
- 17. James G, Witten D, Hastie T, Tibshirani R. 2013 *An introduction to statistical learning*, vol. 112. New York, NY: Springer.
- 18. Tibshirani R. 1996 Regression shrinkage and selection via the lasso. *J. R. Stat. Soc. B (Methodological)* **58**, 267-288.
- 19. Donoho DL. 2006 Compressed sensing. *IEEE Trans. Inform. Theory* **52**, 1289-1306. (doi:10.1109/TIT.2006.871582)
- 20. Candès EJ, Romberg J, Tao T. 2006 Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. *IEEE Trans. Inform. Theory* **52**, 489-509. (doi:10.1109/TIT.2005.862083)
- 21. Candès EJ, Romberg JK, Tao T. 2006 Stable signal recovery from incomplete and inaccurate measurements. *Commun. Pure Appl. Math.* **59**, 1207-1223. (doi:10.1002/cpa.20124)
- 22. Tropp JA, Gilbert AC. 2007 Signal recovery from random measurements via orthogonal matching pursuit. *IEEE Trans. Inform. Theory* **53**, 4655-4666. (doi:10.1109/TIT.2007.909108)
- 23. Chartrand R. 2011 Numerical differentiation of noisy, nonsmooth data. *ISRN Appl. Math.* **2011**, 1-11. (doi:10.5402/2011/164564)
- 24. Schmidt MD, Vallabhajosyula RR, Jenkins JW, Hood JE, Soni AS, Wikswo JP, Lipson H. 2011 Automated refinement and inference of analytical models for metabolic networks. *Phys. Biol.* **8**, 055011. (doi:10.1088/1478-3975/8/5/055011)
- 25. Daniels BC, Nemenman I. 2015 Efficient inference of parsimonious phenomenological models of cellular dynamics using S-systems and alternating regression. *PLoS ONE* **10**, e0119821.
- 26. Kevrekidis IG, Gear CW, Hyman JM, Kevrekidis PG, Runborg O, Theodoropoulos C. 2003 Equation-free, coarse-grained multiscale computation: enabling microscopic simulators to perform system-level analysis. *Commun. Math. Sci.* **1**, 715-762. (doi:10.4310/CMS.2003.v1.n4.a5)
- 27. Ye H, Beamish RJ, Glaser SM, Grant SC, Hsieh C, Richards LJ, Schnute JT, Sugihara G. 2015 Equation-free mechanistic ecosystem forecasting using empirical dynamic modeling. *Proc. Natl Acad. Sci. USA* **112**, E1569-E1576.
- 28. Ascher UM, Petzold LR. 1998 *Computer methods for ordinary differential equations and differential-algebraic equations*, vol. 61. Philadelphia, PA: SIAM.
- 29. Chen RT, Rubanova Y, Bettencourt J, Duvenaud DK. 2018 Neural ordinary differential equations. In *Advances in Neural Information Processing Systems* (eds Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R), pp. 6571-6583. Red Hook, NY: Curran Associates.
- 30. Rico-Martinez R, Anderson J, Kevrekidis I. 1994 Continuous-time nonlinear signal processing: a neural network based approach for gray box identification. In *Proc. of IEEE Workshop on Neural Networks for Signal Processing*, pp. 596-605. IEEE.
- 31. Gonzalez-Garcia R, Rico-Martinez R, Kevrekidis I. 1998 Identification of distributed parameter systems: a neural net based approach. *Comput. Chem. Eng.* **22**, S965-S968. (doi:10.1016/S0098-1354(98)00191-4)
- 32. Raissi M, Perdikaris P, Karniadakis GE. 2018 Multistep neural networks for data-driven discovery of nonlinear dynamical systems. (http://arxiv.org/abs/1801.01236)
- 33. Raissi M, Perdikaris P, Karniadakis GE. 2019 Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. *J. Comput. Phys.* **378**, 686-707. (doi:10.1016/j.jcp.2018.10.045)
- 34.
Rudy SH, Kutz JN, Brunton SL . 2019 Deep learning of dynamics and signal-noise decomposition with time-stepping constraints.**J. Comput. Phys.**, 483-506. (doi:10.1016/j.jcp.2019.06.056) Crossref, ISI, Google Scholar**396** - 35.
Goyal P, Benner P . 2021 Learning dynamics from noisy measurements using deep learning with a Runge-Kutta constraint. In*Workshop paper at the Symbiosis of Deep Learning and Differential Equations – NeurIPS*(available under the link https://openreview.net/pdf?id=G5i2aj7v7i) Google Scholar - 36.
Mangan NM, Brunton SL, Proctor JL, Kutz JN . 2016 Inferring biological networks by sparse identification of nonlinear dynamics.**IEEE Trans. Mol., Biol. Multi-Scale Commun.**, 52-63. (doi:10.1109/TMBMC.2016.2633265) Crossref, Google Scholar**2** - 37.
Gavish M, Donoho DL . 2017 Optimal shrinkage of singular values.**IEEE Trans. Inform. Theory**, 2137-2152. (doi:10.1109/TIT.2017.2653801) Crossref, ISI, Google Scholar**63** - 38.
He K, Zhang X, Ren S, Sun J . 2016 Deep residual learning for image recognition. In*Proc. IEEE Conf. Comp. Vision Patt. Recog.*, pp. 770–778. IEEE. (doi:10.1109/CVPR.2016.90) Google Scholar - 39.
Huang G, Liu Z, Van D, Weinberger KQ . 2017 Densely connected convolutional networks. In*Proc. IEEE Conf. Comp. Vision Patt. Recog.*, pp. 4700–4708. IEEE. (doi:10.1109/CVPR.2017.243) Google Scholar - 40.
Beck A, Eldar YC . 2013 Sparsity constrained nonlinear optimization: Optimality conditions and algorithms.**SIAM J. Optim.**, 1480-1509. (doi:10.1137/120869778) Crossref, ISI, Google Scholar**23** - 41.
Yang Z, Wang Z, Liu H, Eldar Y, Zhang T . 2016 Sparse nonlinear regression: parameter estimation under nonconvexity. In*Proc. of the 33${\hspace{0.17em}}^{rd}$ Intern. Conf. on Machine Learning*, pp. 2472–2481. PMLR. Google Scholar - 42.
Zhang L, Schaeffer H . 2019 On the convergence of the SINDy algorithm.**Multiscale Model. Simul.**, 948-972. (doi:10.1137/18M1189828) Crossref, ISI, Google Scholar**17** - 43.
Fasel U, Kutz JN, Brunton BW, Brunton SL . 2021 Ensemble-SINDy: Robust sparse model discovery in the low-data, high-noise limit, with active learning and control. (http://arxiv.org/abs/2111.10992). Google Scholar - 44.
Champion K, Zheng P, Aravkin AY, Brunton SL, Kutz JN . 2020 A unified sparse optimization framework to learn parsimonious physics-informed models from data.**IEEE Access**, 169259-169271. (doi:10.1109/ACCESS.2020.3023625) Crossref, ISI, Google Scholar**8** - 45.
de Silva B, Champion K, Quade M, Loiseau JC, Kutz J, Brunton S . 2020 PySINDy: a Python package for the sparse identification of nonlinear dynamical systems from data.**J. Open Sourc. Softw.**, 2104. (doi:10.21105/joss.02104) Crossref, Google Scholar**5** - 46.
FitzHugh R . 1955 Mathematical models of threshold phenomena in the nerve membrane.**Bull. Math. Biophys.**, 257-278. (doi:10.1007/BF02477753) Crossref, Google Scholar**17** - 47.
Lorenz EN . 1963 Deterministic nonperiodic flow.**J. Atmos. Sci.**, 130-141. (doi:10.1175/1520-0469(1963)020<0130:DNF>2.0.CO;2) Crossref, ISI, Google Scholar**20** - 48.
Savitzky A, Golay MJ . 1964 Smoothing and differentiation of data by simplified least squares procedures..**Anal. Chem.**, 1627-1639. (doi:10.1021/ac60214a047) Crossref, ISI, Google Scholar**36** - 49.
- 50.
Johnson KA, Goody RS . 2011 The original Michaelis constant: translation of the 1913 Michaelis–Menten paper.**Biochemistry**, 8264-8269. (doi:10.1021/bi201284u) Crossref, PubMed, ISI, Google Scholar**50** - 51.
Briggs GE . 1925 A further note on the kinetics of enzyme action.**Biochem. J.**, 1037. (doi:10.1042/bj0191037) Crossref, PubMed, Google Scholar**19** - 52.
Rudy SH, Brunton SL, Proctor JL, Kutz JN . 2017 Data-driven discovery of partial differential equations.**Sci. Adv.**, e1602614. (doi:10.1126/sciadv.1602614) Crossref, PubMed, ISI, Google Scholar**3** - 53.
Willcox KE, Ghattas O, Heimbach P . 2021 The imperative of physics-based modeling and inverse theory in computational science.**Nat. Comput. Sci.**, 166-168. (doi:10.1038/s43588-021-00040-z) Crossref, Google Scholar**1** - 54.
Loiseau JC, Brunton SL . 2018 Constrained sparse Galerkin regression.**J. Fluid Mech.**, 42-67. (doi:10.1017/jfm.2017.823) Crossref, ISI, Google Scholar**838**