Structure Learning in Coupled Dynamical Systems and Dynamic Causal Modelling

Identifying a coupled dynamical system out of many plausible candidates, each of which could serve as the underlying generator of some observed measurements, is a profoundly ill-posed problem that commonly arises when modelling real-world phenomena. In this review, we detail a set of statistical procedures for inferring the structure of nonlinear coupled dynamical systems (structure learning), which has proved useful in neuroscience research. A key focus here is the comparison of competing models of (i.e., hypotheses about) network architectures and implicit coupling functions in terms of their Bayesian model evidence. These methods are collectively referred to as dynamic causal modelling (DCM). We focus on a relatively new approach that is proving remarkably useful; namely, Bayesian model reduction (BMR), which enables rapid evaluation and comparison of models that differ in their network architecture. We illustrate the usefulness of these techniques through modelling neurovascular coupling (cellular pathways linking neuronal and vascular systems), whose function is an active focus of research in neurobiology and the imaging of coupled neuronal systems.


Introduction
This paper sets out a general method for addressing the problem of structure learning; namely, identifying a coupled dynamical system that best accounts for empirical observations. In this context, a hypothesis about the structure of a system, for example the connectivity architecture of a neural network, is expressed formally as a model. The objective is to search over models (e.g., by pruning redundant connections) to arrive at a network architecture that optimally explains the data.
This approach rests upon dynamic causal modelling (DCM) (1), Bayesian model selection (BMS) (2) and Bayesian model reduction (BMR) (3), which are implemented in the freely available software called Statistical Parametric Mapping (SPM) (4) (see Table 1 for further description of the terminology used in this paper). These methods have been developed for the analysis of large-scale recordings of brain activity; however, they could readily be applied in other domains where mechanistic modelling based on empirical data is of interest.
The basic idea behind DCM is to convert data assimilation, identification and structure learning problems into a generic inference problem, and then use standard (variational Bayesian) techniques to infer all the unknowns; ranging from unobservable or latent states through to the structure or form of the dynamical system that best accounts for the data at hand. In other words, DCM enables qualitative and quantitative questions to be asked about the dynamical system generating (usually timeseries) data.

*Author for correspondence (a.jafarian@ucl.ac.uk). ORCID ID: 0000-0001-9547-7941. †Present address: The Wellcome Centre for Human Neuroimaging, Queen Square Institute of Neurology, University College London, WC1N 3AR, UK.
Formally, DCM is the Bayesian inversion of a biophysically informed dynamical (i.e., state space) model, given some timeseries data; usually, neuroimaging data (M/EEG, fMRI). The models that underwrite DCM are generally specified in terms of (ordinary, stochastic or delay) differential equations describing the coupling within and among dynamical systems.
In DCM, hypotheses about the architecture of (directed) coupling are tested by comparing the evidence for the data under different models (1). DCM infers the coupling (parameters) of a nonlinear dynamical system, and estimates how changes in experimental context are mediated by changes in that coupling. This ability to model context-sensitive changes in coupling distinguishes DCM from other data assimilation and identification methods (e.g., parameter tracking). The outcomes of model inversion using DCM are a posterior probability density over model parameters (that parameterise coupling and context-sensitive effects) and the relative probability of having observed the data under each model (model evidence). This model evidence or marginal likelihood can be used to draw conclusions by comparing the evidence for different models, known as Bayesian model comparison and selection (BMS).
In the past two decades, DCM for functional magnetic resonance imaging (fMRI) has been applied in many studies in the field of cognitive neuroscience (e.g., aging (5), memory (6)) as well as psychiatric disorders (7)(8)(9). In parallel, using the same conceptual principles, DCM has also been applied to magneto/electroencephalography (M/EEG) data, to disambiguate the neuronal causes of electromagnetic responses such as induced responses (10), phase coupling (11) and event related potentials (12,13), and to provide insights into the underlying generators of neurological disorders (14)(15)(16). More recently, DCM has motivated and contributed to the development of research in theoretical neuroscience, such as predictive coding (17), active inference (18) and, interestingly, the Bayesian brain hypothesis, which aims to establish the mathematical foundations of how the brain interacts with, and understands, its environment (19).
In this paper, we review and illustrate recent developments in Bayesian inference that enable an efficient procedure for learning the structure of coupled dynamical systems. First, we present the theoretical foundations of structure learning, using DCM, in a general form that may have wider application in engineering, physics and mathematical biology. We then present an example that highlights the usefulness of structure learning in the field of neuroscience. All software developments relating to the results in this paper are freely available through the academic SPM software (https://www.fil.ion.ucl.ac.uk/spm/). A glossary of technical terms used in this paper is provided in Table 1.

Dynamic Causal Modelling and structure learning
The pipeline for studying the underlying generators of neuroimaging data using DCM is shown in Figure 1. The procedure begins by designing and conducting an experiment to study some particular function of the brain. Data features are then selected from the measured data, and one or more hypotheses are formally expressed as (biophysically informed) coupled dynamical systems. A Bayesian (variational Laplace) scheme is then used to infer the settings of the parameters for each model (e.g., coupling strengths) and to quantify each model's evidence. Structure learning is then performed to compare the alternative model architectures, using BMS and BMR, in order to identify the best explanation for the underlying generators of the data. In the following sections, we describe each of these steps in turn, before turning to a worked example.

Figure 1: a) The pipeline begins by designing an experiment to study a functional aspect of the brain, b) followed by recording brain activity using neuroimaging modalities, such as MEG or fMRI. c) Feature selection is performed on the neuroimaging data; for example, by calculating evoked responses by averaging over trials, or transforming the data to the frequency domain. d) Having prepared the data, the experimenter postulates several hypotheses (specified as models) about the underlying generation of the neuroimaging data. These can be expressed in terms of between- and within-region connections (effective connectivity). e) These models are fitted to the data by finding the setting of the parameters that optimises the variational free energy (F). f) The evidence associated with each model is compared using Bayesian model selection and/or reduction, to identify the most likely structure that accounts for the data. Image credits: MRI scanner by Grant Fisher, TN; screen in (a) and (e) by Dinosoft Labs, PK; all from the Noun Project, CC BY 3.0.

Experimental design
A DCM study begins by carefully articulating hypotheses and designing an experiment to test them. Typically, to maximise efficiency, there will be two or more independent experimental manipulations at the within-subject level, forming a factorial design. There may be additional experimental factors at the between-subject level, for instance to investigate differences between patients and controls, which will inform the strategy for sampling participants from the wider population. The hypotheses determine the choice of neuroimaging modality (e.g., M/EEG, fMRI), as well as the data features that will be selected and the type of dynamic causal model that will be used.

Feature selection
The next step is to select features in the collected data that are important (i.e., informative) from a modelling standpoint. This is known as feature extraction. For example, averaging time series over trials in response to stimulation gives event related potentials (ERPs) (20), or neuroimaging data can be represented in the frequency (21,22) or time-frequency domain (10).
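As a simple sketch of this kind of feature extraction (with an invented evoked waveform and noise model), trial averaging improves the signal-to-noise ratio in proportion to the square root of the number of trials:

```python
import numpy as np

# Hypothetical illustration of feature selection: averaging single-trial
# epochs to obtain an event-related potential (ERP). The evoked waveform
# and noise level are invented for demonstration.
rng = np.random.default_rng(0)
n_trials, n_samples = 200, 500
t = np.linspace(0.0, 0.5, n_samples)          # 0-500 ms peristimulus time

# True evoked response, identical on every trial (a damped oscillation)
evoked = np.exp(-t / 0.1) * np.sin(2 * np.pi * 10 * t)

# Single trials = evoked response + independent background noise
trials = evoked + rng.normal(scale=1.0, size=(n_trials, n_samples))

# The ERP is the average over trials; noise shrinks as 1/sqrt(n_trials)
erp = trials.mean(axis=0)
residual_sd = (erp - evoked).std()
print(f"residual std after averaging: {residual_sd:.3f}")
```

With 200 trials, the residual noise standard deviation is roughly 1/√200 ≈ 0.07 of the single-trial noise level.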

Model specification
The hypotheses are then formally expressed in terms of biologically informed (state space) model architectures m_1 to m_N, each describing possible interactions between experimental inputs and neuronal dynamics. In effect, a model in DCM can be understood as a dynamical system distributed on a graph, where the neuroimaging data capture the activity of each node (either directly, as in fMRI, or indirectly via some mapping to sensor space, as in EEG). Depending on the scale and fidelity of the neuroimaging measurement, each node could, in principle, correspond to a compartment of a neuron, to an individual neuron, or (more typically) to the average activity of millions of neurons constituting a neuronal population or brain region. At any given scale, connections between nodes in this graph are referred to as effective connectivity; namely, the effect of one node on another.
In general, partially observed discrete-continuous dynamical systems (which commonly arise in many mathematical and engineering applications) are well suited for modelling neural interactions (23). The generative (state space) model in DCM can be written as follows (24):

dx/dt = f(x, u, θ)
y = g(x, θ) + Xβ + ε     (1)

The first line in equation 1 governs the dynamics of interactions within and between nodes, where x are (usually unobservable or hidden biological) states with a flow f that depends upon parameters θ and exogenous or experimental input u. When the exogenous input is a random fluctuation or innovation, equation 1 becomes a stochastic differential equation. The choice of coupling function f is commonly motivated by biological principles e.g., (25), (26). The second line in equation 1 is known as the observer function, and links the (usually observable neuroimaging) data y to the hidden or latent variables e.g., (27,28). In the second line of equation 1, the function g denotes the contributions of hidden states (depending upon parameters θ) to the data. The second and third terms, Xβ and ε, model signal components of no interest (e.g., drift) and measurement noise, respectively (24,29).
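As a toy illustration of equation 1 (with invented coupling values, not a calibrated DCM), a two-node linear system driven by a boxcar input can be simulated with a simple Euler scheme:

```python
import numpy as np

# Toy simulation of the generative model in equation 1:
#   dx/dt = f(x, u, theta)             (hidden states)
#   y     = g(x, theta) + noise        (observer function)
# All coupling values below are illustrative.

def f(x, u, A, C):
    # Linear flow: intrinsic/extrinsic coupling A, input coupling C
    return A @ x + C * u

def g(x):
    # Observer: a simple read-out of the first node's state
    return x[0]

A = np.array([[-1.0, 0.2],
              [0.4, -1.0]])       # negative self-connections => stability
C = np.array([1.0, 0.0])          # experimental input drives node 1 only

dt, T = 0.01, 10.0
n = int(T / dt)
x = np.zeros(2)
y = np.zeros(n)
rng = np.random.default_rng(1)

for k in range(n):
    u = 1.0 if k * dt < 1.0 else 0.0      # boxcar experimental input
    x = x + dt * f(x, u, A, C)            # Euler integration step
    y[k] = g(x) + 0.01 * rng.normal()     # observation noise

print(f"peak response: {y.max():.3f}")
```

Inverting such a model in DCM would amount to recovering A, C and the noise parameters from y alone.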

Model identification
Given a prior probability density over the unknown parameters, p(θ), initial states, x(0), and neuroimaging data y, DCM is used to infer the posterior probability of the parameters of each model, using a gradient ascent on variational free energy as its objective function (please see supplementary material for further information). The free energy, also known as the evidence lower bound in machine learning, is a lower bound on the log model evidence or marginal likelihood p(y|m), where m denotes the model. It can be written as the difference between the expected log likelihood, ln p(y|θ, m), and the KL divergence between the posterior and prior over parameters, KL[q(θ|y, m) ‖ p(θ|m)]. These terms are the accuracy and complexity of the model, respectively:

F = E_q[ln p(y|θ, m)] − KL[q(θ|y, m) ‖ p(θ|m)]     (2)

The free energy scores the goodness of a hypothesis (i.e., model), which can be employed for model comparison. To identify the setting of the parameters that maximises free energy, DCM uses an estimation scheme known as variational Bayes under the Laplace approximation (i.e., the prior and posterior densities of latent variables have Gaussian distributions). Intuitively, a gradient ascent on free energy offers a form of regularisation, because the KL divergence term (complexity) in equation 2 acts as a penalty. Therefore, as the number of parameters increases, or the parameters deviate far from their priors, the free energy decreases. This reduces the risk of overfitting or implausible parameter estimates.
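To make the decomposition of free energy into accuracy and complexity concrete, the sketch below evaluates both terms for a toy conjugate Gaussian model, where the Gaussian (Laplace) assumptions hold exactly and the free energy equals the log evidence (all values invented):

```python
import numpy as np

# Free energy F = accuracy - complexity (equation 2) for a toy model:
# n noisy observations of a scalar parameter theta,
#   y_i ~ N(theta, sigma2),  prior theta ~ N(0, tau2).
# Everything is Gaussian, so the variational posterior is exact and
# F coincides with the log evidence ln p(y|m).

rng = np.random.default_rng(2)
sigma2, tau2, n = 1.0, 4.0, 20
y = rng.normal(loc=0.5, scale=np.sqrt(sigma2), size=n)

# Exact Gaussian posterior q(theta) = N(mu_q, s2_q)
prec_q = 1.0 / tau2 + n / sigma2
s2_q = 1.0 / prec_q
mu_q = s2_q * y.sum() / sigma2

# Accuracy: expected log likelihood under the posterior
accuracy = (-0.5 * n * np.log(2 * np.pi * sigma2)
            - ((y - mu_q) ** 2).sum() / (2 * sigma2)
            - n * s2_q / (2 * sigma2))

# Complexity: KL divergence from the prior to the posterior
complexity = 0.5 * (s2_q / tau2 + mu_q ** 2 / tau2 - 1.0
                    + np.log(tau2 / s2_q))

F = accuracy - complexity

# Exact log evidence, marginalising theta analytically, for comparison
log_evidence = (-0.5 * n * np.log(2 * np.pi * sigma2)
                - (y ** 2).sum() / (2 * sigma2)
                + (y.sum() / sigma2) ** 2 / (2 * prec_q)
                - 0.5 * np.log(1 + n * tau2 / sigma2))

print(f"F = {F:.4f}, exact log evidence = {log_evidence:.4f}")
```

In a nonlinear DCM the bound is not exactly tight, but the same accuracy/complexity trade-off governs the estimate.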

Structure learning
Having inverted a model to obtain its evidence and parameters, the next step is to ask whether the structure of the model could be simplified to further optimise variational free energy. Recall that the free energy quantifies the trade-off between the accuracy and complexity of the model; so, if a change to the network structure increases free energy, the model has become more accurate (better fitting the data) and/or less complex (simpler in terms of its parameterisation). This process of selecting between network architectures (a.k.a. structure learning) depends on BMS; namely, selection among different models based on their evidence. This process can be performed automatically and rapidly over potentially thousands of alternative models, using an approach called Bayesian model reduction.
The hypotheses or models m_1, m_2, …, m_N, with free energies F_1, F_2, …, F_N, can be compared using Bayesian model comparison. For any two inverted models, m_i and m_j, with free energies F_i and F_j respectively, the log Bayes factor, ln BF_ij, is defined as follows (2,30):

ln BF_ij = F_i − F_j     (3)

Conventionally, a log Bayes factor above three indicates that there is 'strong evidence' for model m_i over model m_j (2,30).
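As a minimal numerical sketch (the free energies below are invented), the log Bayes factor and the corresponding posterior model probability (a sigmoid function of the log Bayes factor, under equal priors) can be computed as:

```python
import numpy as np

# Log Bayes factor from free energies; values are illustrative.
F_i, F_j = -120.4, -124.1
log_bf = F_i - F_j                            # ln BF_ij = F_i - F_j
posterior_i = 1.0 / (1.0 + np.exp(-log_bf))   # sigmoid, equal model priors

print(f"log BF = {log_bf:.1f} -> p(m_i|y) = {posterior_i:.3f}")
```

Here the log Bayes factor of 3.7 exceeds the conventional threshold of three, indicating 'strong evidence' for m_i over m_j.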
Bayesian model comparison therefore allows pairwise model comparison, which in turn can be used to identify which of two models best accounts for the data. The posterior probability for each model can then be computed by application of Bayes' rule. Under equal priors for each model, this simplifies to a logistic (sigmoid) function of the log Bayes factor:

p(m_i|y) = 1 / (1 + exp(−(F_i − F_j)))     (4)

This process is easily generalised to comparisons with more than two models, by computing the log Bayes factor of each model relative to any one of the models in the comparison set. In some studies, rather than one model corresponding to one hypothesis, it can make more sense for a set or family of models to represent a particular hypothesis. This generalisation of BMS, where the model space is grouped into several classes or families, is referred to as family-wise model comparison (2). To explain this, let the model space be partitioned into K families. In this case, the prior probability of each family is 1/K. Consequently, the prior probability of a model belonging to the k-th family (with N_k members) is p(m) = 1/(K N_k). By applying Bayes' rule over the space of all models, one can calculate the posterior probability of a model m, with evidence p(y|m), as follows:

p(m|y) = p(y|m) p(m) / Σ_j p(y|m_j) p(m_j)

By definition, the posterior probability for each family (i.e., class) is the sum of the posterior probabilities of its constituent models. The posterior probability of each family can then be compared using Bayesian model comparison. One ubiquitous application of family comparison is to compare models with and without a common feature or property; for example, with or without a particular parameter. In this case, BMS can be performed on a model space comprising two groups of models, where the property of interest appears in only one group. This can be used to establish whether such a property is necessary to explain the data at hand.

Bayesian model reduction extends these ideas to the parameters of a given model. The posterior density over the parameters is given by Bayes' rule (we have dropped the dependency on the model for clarity):

p(θ|y) = p(y|θ) p(θ) / p(y)     (5)

where the model evidence is p(y) = ∫ p(y|θ) p(θ) dθ, the log of which is approximated by the variational free energy.
Crucially, the priors determine which parameters (e.g., connections) are informed by the data. Having estimated the parameters and free energy of the model, which we will refer to as the 'full' model, BMR provides an analytic and rapid technique for evaluating the relative evidence for an alternative model, which differs only in terms of its priors, p̃(θ). Typically, this alternative set of priors will fix certain parameters to their prior mean, thereby reducing or pruning the model structure. For this reduced model, the approximate posterior p̃(θ|y) under the reduced priors is again given by Bayes' rule:

p̃(θ|y) = p(y|θ) p̃(θ) / p̃(y)     (6)

The likelihood function p(y|θ) for the reduced and full models is the same, which enables equations 5 and 6 to be linked as follows:

p̃(θ|y) = p(θ|y) [p̃(θ) / p(θ)] [p(y) / p̃(y)]     (7)

Next, to find the evidence of the reduced model, we integrate both sides of equation 7 over the parameter space. Using the fact that ∫ p̃(θ|y) dθ = 1, the model evidence for the reduced model is as follows:

p̃(y) = p(y) ∫ p(θ|y) [p̃(θ) / p(θ)] dθ     (8)

Equations 7 and 8 have analytic solutions given Gaussian forms for the prior and posterior densities, meaning that the coupling parameters and evidence for reduced models can be derived from those of the full model in milliseconds, on a typical desktop computer (32). This speed is leveraged in DCM for automatically discovering an optimal coupling structure for a network. In this setting, a greedy search is used, which iteratively generates candidate reduced models with different priors.
Their evidence is evaluated using BMR, and Bayesian model comparison is used to assess whether they should be retained or discarded. The ensuing coupling parameters from the best candidate models are averaged (using Bayesian model averaging based upon the evidence for each model) and returned as the optimal network structure for those data.
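Under Gaussian priors and posteriors, the reduction in equations 7 and 8 has a closed form. The sketch below (with invented prior and posterior values for two coupling parameters) prunes one parameter by shrinking its prior variance, and returns the change in log evidence:

```python
import numpy as np

# Sketch of Bayesian model reduction for Gaussian densities: the reduced
# model's evidence and posterior follow analytically from the full model.
# All numerical values are illustrative.

def bmr(mu0, S0, muq, Sq, mu0_r, S0_r):
    """Return (delta_F, reduced posterior mean, reduced posterior cov).

    (mu0, S0): full prior; (muq, Sq): full posterior;
    (mu0_r, S0_r): reduced prior. delta_F = ln p~(y) - ln p(y).
    """
    P0, Pq, P0r = map(np.linalg.inv, (S0, Sq, S0_r))
    Ps = Pq + P0r - P0                    # reduced posterior precision
    Ss = np.linalg.inv(Ps)
    mus = Ss @ (Pq @ muq + P0r @ mu0_r - P0 @ mu0)
    _, ldq = np.linalg.slogdet(Pq)
    _, ld0r = np.linalg.slogdet(P0r)
    _, ld0 = np.linalg.slogdet(P0)
    _, lds = np.linalg.slogdet(Ps)
    dF = 0.5 * (ldq + ld0r - ld0 - lds) + 0.5 * (
        mus @ Ps @ mus - muq @ Pq @ muq
        - mu0_r @ P0r @ mu0_r + mu0 @ P0 @ mu0)
    return dF, mus, Ss

# Full model: two coupling parameters with vague priors
mu0 = np.zeros(2); S0 = np.eye(2) * 4.0
muq = np.array([0.8, 0.05]); Sq = np.eye(2) * 0.1   # fitted posterior

# Reduced model: prune parameter 2 by shrinking its prior variance
mu0_r = np.zeros(2)
S0_r = np.diag([4.0, 1e-6])

dF, mus, _ = bmr(mu0, S0, muq, Sq, mu0_r, S0_r)
print(f"delta F = {dF:.2f}; reduced posterior mean = {np.round(mus, 3)}")
```

Because the second parameter's posterior is close to zero, pruning it reduces complexity more than it costs in accuracy, so the change in log evidence is positive; a greedy search simply repeats this evaluation over many candidate prunings and keeps those that increase the evidence.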

Drawing conclusions
Together, the procedures described above constitute all the necessary tools for learning the optimal structure of a coupled dynamical system, as evinced by some data. We started by defining the free energy, a scalar functional that quantifies the trade-off between the accuracy and complexity of a model. This quantity is estimated in the context of a 'full' model with all coupling parameters of interest. Then, using BMR, the free energies of reduced models are derived analytically, either for a few specified models or by performing an automatic greedy search over potentially thousands of reduced models. Bayesian model comparison is used in both cases to evaluate which model(s) should be favoured as explanations for the underlying generators of the data. This concludes our overview of the basic approach to coupled dynamical systems.
In what follows, we provide worked examples to illustrate the sorts of inference and structure learning that are afforded.
Although these examples reflect our own interests (i.e., physiological coupling in the brain), the analysis can, in principle, be applied to any coupled dynamical system that can be articulated in terms of (stochastic, ordinary or delay) differential equations.

Worked Examples
To illustrate the theory reviewed in this paper, we consider modelling the neurovascular coupling system, which ensures that brain cells (neurons) are adequately perfused with oxygenated blood in an activity-dependent fashion. This illustrates dynamical coupling at two levels: first, the coupling between two different physiological systems (neuronal and haemodynamic); second, the coupling among remote neuronal populations that underwrites distributed neuronal processing and representations. Understanding the functional architecture of neurovascular coupling facilitates our understanding of ischaemic brain injury (i.e., stroke) and, in research, establishes the origin of the fMRI time series used in brain mapping. In short, neurovascular coupling determines how blood dynamics are altered by neuronal demands for energy (29,33). The challenge in this field is that it is not possible to measure the activity of this system noninvasively in the human brain; therefore, modelling-based approaches have been widely applied.
In the following example, a coupled (biophysically informed) dynamical system modelling the behaviour of neuronal responses and the vascular system was employed. This is illustrated below for a single brain region (a node of the coupled system).

Worked Example 1: Bayesian Model Reduction for structure learning
The aim of the first example is to showcase an application of BMR to modelling the neurovascular system using fMRI time series (34). This analysis used an fMRI dataset from a previously conducted experiment, investigating the neural response to attention to visual motion (35). Three brain regions (i.e., nodes) were identified that showed significant experimental effects (visual areas V1, V5 and the frontal eye fields, FEF) and representative timeseries were extracted from each region (i.e., feature selection). To explain the underlying generators of these timeseries, a DCM was then specified comprising three canonical microcircuits (modelling within-region connectivity) that were coupled by between-region connections, creating a hierarchy of distributed processing nodes. As shown in Figure 2, in this model, pre-synaptic inputs to each neuronal population had collaterals projecting to the astrocytes. These were grouped into three parameters per neuronal population: collaterals from excitatory (positive) within-region connections, collaterals from inhibitory (negative) within-region connections, and collaterals from extrinsic inputs from other regions. A weighted sum of these formed the input into the vascular system. The aim of this analysis was to illustrate how BMR can be used to identify the optimal reduced coupling structure (the minimal set of neurovascular parameters) that would best explain the data.

(Figure caption, fragment: … inhibitory interneuron; DP: deep pyramidal neuron. Bayesian model reduction (BMR) was performed using the posterior estimates of the parameters to identify a subset of parameters that best accounted for the data. The ensuing model reduction shows that inhibitory signals to each population were the best explanations for the fMRI data.)
The parameters of this model were estimated using the fMRI data (model identification), and those relating to neurovascular coupling are shown in Figure 3 (left). Next, BMR was applied, using an automatic greedy search over reduced models, to ask whether there was any sub-structure in the neurovascular model parameter space that could better account for the data, relative to the full model. The optimal reduced model is shown in Figure 3 (right), where inhibitory signals to three of the neuronal populations played a predominant role. From these data, one would conclude that collaterals from inhibitory inputs into superficial pyramidal cells, deep pyramidal cells and spiny stellate cells are sufficient to explain the BOLD signal. It should be noted that this analysis is only a proof of concept, based on a single subject. However, it demonstrates that DCM and BMR enable the investigation of architectural questions about the underlying generators of fMRI time series using noninvasive recordings. In particular, the use of BMR with an automatic search allowed a fast and efficient search over a large model space.

Worked Example 2: Bayesian fusion
This second study focussed on Bayesian fusion across neuroimaging modalities: MEG and fMRI. Data were collected using each modality, under the same cognitive task (an auditory oddball paradigm), to inform the neuronal and neurovascular / haemodynamic parts of the model, respectively (36). In the previous example, the neuronal part of the model had various parameters fixed (e.g., synaptic time constants), to estimate the neurovascular parameters using fMRI data. In this second study, the objective was to capitalize on the high temporal resolution of MEG to infer the parameters of the neuronal part of the model, before using the fMRI data to infer the neurovascular / haemodynamic part. Bayesian model selection was then used to infer the optimal reduced structure of the combined MEG-fMRI-informed model.
To enable flexible coupling of different neural and neurovascular models, this study introduced an interface between them called neuronal drive functions (Figure 4). These play a crucial role in the following analysis. First, active brain regions are identified from the fMRI data, using standard methods (i.e., statistical parametric mapping, SPM). The coordinates of these regions then serve as spatial priors for source localisation in the DCM analysis of the MEG data. This study also provides an opportunity to illustrate the use of pre-defined model spaces. Whereas in the first example an automatic search was conducted over reduced models, here a set of carefully specified models was compared to test four experimental questions or factors. The ensuing set of models (the model space) characterised neurovascular coupling in terms of four factors or questions. In brief, the factors were: (i) whether there were presynaptic (37) or postsynaptic (49) contributions to the neurovascular signal; (ii) whether neuronal responses from distal regions excite neurovascular coupling (38); (iii) whether the neurovascular coupling function was the same across all regions or should be defined in a region-specific way (39); and finally (iv) whether a first or second order differential equation should be used for the dynamics of the neurovascular system, to determine whether there was any delay between neuronal activity and the ensuing BOLD response (40,41). A total of 16 candidate models, with different combinations of these four factors, were evaluated using Bayesian model comparison. For each of the four experimental questions, the 16 models were grouped into families (e.g., all models with presynaptic vs. postsynaptic input) and the probability of each family computed.
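The grouping of a factorial model space into families, and the computation of family posteriors, can be sketched as follows (free energies invented for illustration):

```python
import numpy as np

# Sketch of family-wise comparison over a factorial model space: four
# binary factors give 2**4 = 16 models. The free energies below are
# invented; in a real study they come from inverting each model.
rng = np.random.default_rng(4)

# Row m encodes the level (0 or 1) of each factor for model m
factors = np.array([[(m >> f) & 1 for f in range(4)] for m in range(16)])
F = rng.normal(size=16)            # illustrative log evidences
F += 3.0 * factors[:, 0]           # pretend level 1 of factor 1 fits better

# Posterior model probabilities under equal priors (softmax of F)
p = np.exp(F - F.max())
p /= p.sum()

# Family posterior for each factor = sum over models in each level
for f in range(4):
    p_level1 = p[factors[:, f] == 1].sum()
    print(f"factor {f + 1}: p(level 1 | y) = {p_level1:.3f}")
```

Because each family here contains the same number of models (eight), equal model priors are equivalent to equal family priors; with unequal family sizes, the model priors would be adjusted as described in the family-wise comparison section.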
In this single-subject study, the results of family-wise BMS on each group of models identified strong evidence (with nearly 100% posterior confidence) for the following:
(i) The BOLD effect is caused by presynaptic signals. This is in line with (37), which found that mean neuronal firing rates (presynaptic signals) induce BOLD signals.
(ii) Regional neuronal drives to haemodynamic responses induce vascular perfusion. This is consistent with the general opinion that local neuronal activity induces the BOLD contrast.
(iii) The strength of neurovascular coupling is region specific. This is in agreement with invasive animal studies suggesting that neurovascular coupling varies from brain region to region (39).
(iv) The BOLD response to neuronal activity was instantaneous, rather than delayed.
This example illustrates some of the intricacies of structure learning using DCM and BMS. First, hypotheses about biophysical processes can be expressed formally as models, and in particular as a factorial design (i.e., a model space that can be partitioned among different factors or attributes). Second, data from different neuroimaging modalities can be combined using Bayesian fusion, where different parts of the model are informed by different modalities. For flexibility and efficiency of model inversion, neuronal drive functions were introduced to act as a bridge between neural and haemodynamic models. Finally, each experimental question can be addressed through family-wise Bayesian model comparison. In summary, these two examples illustrate the application of DCM, BMR and BMS to structure learning based on multimodal neuroimaging data.

Discussion
In this paper, we have reviewed methods for structure learning in the context of coupled dynamical systems. In particular, we showcased applications in neuroscience using dynamic causal modelling (the Bayesian inversion of biophysically informed coupled dynamical systems) and Bayesian model selection and reduction (for assessing the evidence for different models or hypotheses). To date, these tools have mainly been applied in cognitive and clinical neuroscience, to unravel the most likely functional architectures explaining neuroimaging data. DCM offers an efficient way to estimate the parameters of large-scale dynamical systems, based on a gradient ascent on a variational free energy functional. This functional inherently scores different candidate architectures and coupling functions in terms of a trade-off between accuracy and model complexity.
In a general setting, DCM, BMS and BMR offer efficient pipelines for modellers to identify coupled dynamic systems in an evidence-based fashion. DCM has been applied to a wide range of problems including parameter estimation for deterministic (37) and stochastic (38,39) dynamical systems using time, frequency or time-frequency domain information.
In addition, DCM has been found useful for studying large networks, based on the centre manifold theorem (40), and for parameter estimation of dynamical systems on a manifold (41). These examples demonstrate that structure learning based on DCM, BMR and BMS provides a general and efficient method that can be applied to a wide range of modelling problems involving real-world physical systems.
One might ask whether dynamic causal modelling has real-world clinical applications, beyond research. For instance, would it be possible to use DCM as part of a biological control system, to suppress or prevent unwanted activity in a diseased brain (e.g., epileptic seizures, or the symptoms of Parkinson's disease)? This question has been addressed in the setting of Parkinson's disease (42); however, there is a long road ahead before these models are sufficiently well developed and validated to be used in the treatment of neurological disorders. In a research context, an avenue receiving much attention is how to validate theories of brain function based on predictive coding and, more generally, the Bayesian brain hypothesis; and in particular, what rules govern the transfer of information between layers of cortex (18,19,43). Here, tools such as the CMC model, variational Laplace and structure learning using BMS and BMR are likely to prove useful.

Table 1: Glossary of terms used in the paper and their descriptions.

Term General Description
Bayesian belief updating
An example of Bayesian belief updating is found in the context of inverting nonstationary EEG time series that undergo transitions into and out of paroxysmal activity (15). In this case, the EEG data are first divided into several locally (quasi) stationary segments. Then, DCM is performed on the first segment to furnish posterior parameter estimates, which become the priors for the subsequent segment. This is known as Bayesian belief updating. Subsequently, a trajectory of parameters over segments can be constructed that best explains the dynamics of the nonstationary data.
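A minimal sketch of this updating, for a scalar Gaussian belief over invented segment data, is:

```python
import numpy as np

# Sketch of Bayesian belief updating over data segments: the posterior
# from segment k becomes the prior for segment k+1 (scalar Gaussian
# parameter; data and noise level invented for illustration).
rng = np.random.default_rng(3)
sigma2 = 1.0                       # known observation noise variance
mu, s2 = 0.0, 10.0                 # initial (vague) prior on the parameter

true_theta = 1.5
trajectory = []
for segment in range(5):
    y = rng.normal(true_theta, np.sqrt(sigma2), size=50)
    # Conjugate Gaussian update of the belief using this segment
    prec = 1.0 / s2 + len(y) / sigma2
    mu = (mu / s2 + y.sum() / sigma2) / prec
    s2 = 1.0 / prec
    trajectory.append(mu)          # posterior becomes next segment's prior

print("posterior means per segment:", np.round(trajectory, 3))
```

In the EEG application the parameter would itself drift between segments, so the trajectory of posterior means tracks the nonstationary dynamics rather than converging to a single value as it does here.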

Bayesian fusion and multimodal data integration
Bayesian fusion aims to combine Bayesian inferences about a (physical) system that is observed in various ways, in order to better model the behaviour of the system. For instance, multimodal fMRI and M/EEG imaging can leverage different aspects of brain dynamics. Here, the spatial specificity of fMRI is employed to localise active neuronal sources. This spatial information is used as a prior for source localisation in DCM for M/EEG data, to estimate neuronal parameters. The posterior estimates of these parameters are then employed as priors for the inversion of haemodynamic responses from fMRI time series. These modalities therefore provide complementary information and, through Bayesian fusion, lead to a better understanding of the underlying function of the brain.

Bayesian model comparison
Quantification of the relative evidence for different models of the same data. Typically this is expressed as a Bayes factor, which is the ratio of the model evidence (marginal likelihood) for each model relative to a selected comparison model.

Bayesian model reduction (BMR)
A statistical device for computing the posterior probability density over parameters and the log model evidence for a reduced model, given those of a full model. Here, reduced and full models differ only in their priors. Under Gaussian assumptions, this has an analytic form.

Bayesian model selection (BMS)

The selection of one or more models with the highest evidence from a set of candidate models, following Bayesian model comparison.

Canonical micro circuit (CMC)
The CMC is a biologically informed microcircuit model of interlaminar and intralaminar connectivity in the cortical column (17,34). It comprises four neuronal populations whose activity can replicate realistic patterns of M/EEG signals. Each population generates postsynaptic potentials (modelled by second order differential equations) induced by presynaptic firing rates from external sources (interregional or distal populations, and/or exogenous inputs). These postsynaptic potentials in turn generate presynaptic firing rates (via a sigmoid transformation), which excite or inhibit other populations.
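The message passing described above can be sketched for a single population: a second order synaptic kernel converts presynaptic firing into a postsynaptic potential, and a sigmoid converts the potential back into a firing rate. Parameter values here are illustrative assumptions, and this is one population rather than the full four-population CMC:

```python
import numpy as np

def sigmoid(v, r=0.56, v0=6.0):
    """Presynaptic firing rate as a sigmoid function of membrane potential."""
    return 1.0 / (1.0 + np.exp(-r * (v - v0)))

def simulate_population(u, dt=1e-3, H=3.25, tau=0.01):
    """One neural mass: the postsynaptic potential v obeys a second order
    ODE driven by presynaptic input firing u(t) (Euler integration).
    H scales the synaptic gain and tau is the synaptic time constant."""
    v, dv = 0.0, 0.0
    trace = []
    for u_t in u:
        # second order dynamics: v'' = (H/tau) u - (2/tau) v' - v / tau^2
        ddv = (H / tau) * u_t - (2.0 / tau) * dv - v / tau**2
        dv += dt * ddv
        v += dt * dv
        trace.append(v)
    return np.array(trace)

# Drive the population with a brief input burst and read out the firing
# rate it would transmit to other populations.
u = np.zeros(500); u[50:100] = 1.0
v = simulate_population(u)
rates = sigmoid(v)
```

The potential rises during the input burst and decays back to rest afterwards; the sigmoid output is what other populations would receive as presynaptic drive.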

Cortical column
The human cortex can be approximately divided into cylinders of diameter 500 μm, where the neurons within each cylindrical column activate in response to a particular stimulus.
Data assimilation (44) & model identification (45)
Data assimilation is a term that was coined by meteorologists in the mid-20th century. It uses (and combines) a wide range of mathematical methods, such as nonparametric statistical models (autoregressive models), nonlinear Kalman filters, statistical interpolation, nonlinear time series analysis and nonlinear system identification, to establish models useful for weather prediction (46). Model identification is a more general term for constructing models that describe a phenomenon, encompassing both model construction and parameter estimation.

Electroencephalography (EEG) and Magnetoencephalography (MEG)
M/EEG are non-invasive neuroimaging techniques that capture the dynamics of neuronal activity with millisecond temporal resolution (on the same order as the temporal dynamics of synaptic activity). EEG captures ensemble neuronal membrane potentials using grids of electrodes placed on the scalp, whereas MEG measures the accompanying fluctuations in magnetic fields using arrays of magnetometers known as superconducting quantum interference devices. The MEG signal is subject to less distortion by the skull and scalp than the EEG signal.
Functional magnetic resonance imaging (fMRI)
fMRI is a non-invasive neuroimaging technique that measures changes in blood flow and oxygenation, with a fine spatial resolution (effectively up to 0.5 mm), caused by neuronal activation and the neurons' subsequent consumption of oxygen. fMRI measures the blood-oxygen-level dependent (BOLD) response to brain activity. The changes in the measurement at each location (voxel) form a time series, which is analysed using the analytic techniques reviewed here.

Haemodynamic response
This describes the process by which neuronal activation causes changes in blood flow, blood vessel volume and the dynamics of deoxyhaemoglobin. It can be modelled by a dynamical system known as the extended Balloon model (27).
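A minimal Euler integration of the extended Balloon model can make these dynamics concrete. The parameter values below are illustrative defaults (not fitted estimates): neuronal activity z(t) drives a vasodilatory signal s, blood inflow f, venous volume v and deoxyhaemoglobin content q:

```python
import numpy as np

def balloon_model(z, dt=0.01, kappa=0.64, gamma=0.32, tau=2.0,
                  alpha=0.32, E0=0.4):
    """Euler integration of the extended Balloon model: neuronal
    activity z(t) drives vasodilatory signal s, inflow f, venous
    volume v and deoxyhaemoglobin content q (illustrative parameters)."""
    s, f, v, q = 0.0, 1.0, 1.0, 1.0
    states = []
    for z_t in z:
        ds = z_t - kappa * s - gamma * (f - 1.0)
        df = s
        dv = (f - v ** (1.0 / alpha)) / tau
        dq = (f * (1.0 - (1.0 - E0) ** (1.0 / f)) / E0
              - v ** (1.0 / alpha) * q / v) / tau
        s += dt * ds; f += dt * df; v += dt * dv; q += dt * dq
        states.append((s, f, v, q))
    return np.array(states)

# A two-second burst of neuronal activity produces the characteristic
# haemodynamic response: inflow rises, volume follows, deoxyhaemoglobin dips.
z = np.zeros(3000); z[100:300] = 1.0   # dt = 0.01 s -> 30 s simulation
states = balloon_model(z)
f_trace, q_trace = states[:, 1], states[:, 3]
```

The dip in deoxyhaemoglobin during activation is what gives rise to the positive BOLD signal measured by fMRI.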
Inhibitory, excitatory and pyramidal neuronal populations (25)
Intuitively, postsynaptic potentials generated by inhibitory interneurons reduce the depolarisation and subsequent activity of target populations. Conversely, excitatory interneurons increase the activity of target populations. Pyramidal cells are excitatory and can be found in both superficial and deep layers of the cortical column. Crucially, they project beyond the cortical column in which they are found.

Kullback-Leibler (KL) divergence
This is a measure of the difference between two distributions. However, it is not a metric, since it is not symmetric (43). In the context of model estimation in this paper, it is used to score the complexity of a model (the divergence between the posterior and prior densities over model parameters).
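For univariate Gaussians the KL divergence is available in closed form, which makes its asymmetry easy to demonstrate (the distributions below are arbitrary examples):

```python
import math

def kl_gaussian(mu_p, var_p, mu_q, var_q):
    """KL divergence KL[p || q] between univariate Gaussians
    p = N(mu_p, var_p) and q = N(mu_q, var_q)."""
    return 0.5 * (math.log(var_q / var_p)
                  + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

# The divergence is not symmetric: KL[p||q] != KL[q||p] in general.
kl_pq = kl_gaussian(0.0, 1.0, 1.0, 2.0)   # ~ 0.347
kl_qp = kl_gaussian(1.0, 2.0, 0.0, 1.0)   # ~ 0.653
```

In variational schemes, this is the quantity used to penalise models whose posteriors stray far from their priors.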

Log likelihood
In the context of parameter estimation in this paper, the log likelihood scores the accuracy of a model in terms of the probability of producing the observed data features. It is the (log) probability of the observed data given the model, under a particular set of parameters (19,24). The integrated or marginal likelihood is the probability of the data having marginalised over the unknown parameters, and is also called the model evidence.
Neurovascular coupling
Neurovascular coupling refers to the physiological pathways that enable communication between neurons and blood vessels (34).

Statistical Parametric Mapping (SPM) (4)
SPM is freely available software for analysing brain imaging data such as fMRI, MEG and EEG. It includes statistical routines (e.g., the general linear model, random field theory, variational Bayes, voxel based morphometry, statistical hypothesis testing and statistical signal processing, to name but a few). SPM also refers to the method of producing statistical parametric maps, that is, maps of statistics testing for distributed neural responses over the brain. The SPM software package also includes the dynamic causal modelling (DCM) toolbox, which enables modelling of the underlying biological generators of neuroimaging data.
Tracking based parameter identification in dynamical systems (47)
Constant parameters in a dynamical system have zero temporal evolution. They can therefore be considered as state variables with trivial dynamics. These trivial dynamics can be absorbed into a state space model by augmenting the original equations of the system (yielding what is termed an augmented dynamical system). A filtering method (e.g., a nonlinear Kalman filter) can then be applied to reconstruct (estimate) the dynamics of the augmented system, including any slowly fluctuating parameters. The mean and variance of the resulting estimates can be taken as posterior estimates of the constant parameters of the original dynamical system.
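A minimal linear example of this augmentation trick: a constant drift parameter is treated as an extra state with trivial dynamics and estimated alongside the state by a standard Kalman filter. The system, noise levels and priors below are assumed for illustration; nonlinear systems would require, e.g., an extended Kalman filter:

```python
import numpy as np

# State: position x with unknown constant drift theta.
# Augmented state (x, theta): theta_{k+1} = theta_k (trivial dynamics).
dt, n = 0.1, 400
true_theta = 0.7
rng = np.random.default_rng(1)

F = np.array([[1.0, dt], [0.0, 1.0]])   # augmented transition matrix
H = np.array([[1.0, 0.0]])              # we observe x only
Q = np.diag([1e-4, 0.0])                # no process noise on theta
R = np.array([[0.04]])                  # observation noise variance

# Simulate noisy observations of the drifting state.
x, ys = 0.0, []
for _ in range(n):
    x += true_theta * dt + rng.normal(0, 1e-2)
    ys.append(x + rng.normal(0, 0.2))

# Kalman filter on the augmented system.
m = np.array([0.0, 0.0])                # initial mean for (x, theta)
P = np.diag([1.0, 1.0])
for y in ys:
    m = F @ m; P = F @ P @ F.T + Q                  # predict
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    m = m + K @ (np.array([y]) - H @ m)             # update
    P = (np.eye(2) - K @ H) @ P

theta_hat, theta_var = m[1], P[1, 1]
```

The filtered mean and variance of the augmented component serve as posterior estimates of the constant parameter, as described above.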

Structure Learning in Coupled Dynamical Systems and Dynamic Causal Modelling Jafarian et al.
This brief note explains the parameter estimation routine in DCM. Detailed explanations can be found in (1,2). A generic form of a nonlinear model with unknown parameters θ is as follows:

y = f(θ) + ε

where ε is an additive error term with Gaussian distribution ε ~ N(0, Σ(λ)), with covariance matrix Σ(λ). The inverse of the error covariance matrix is the precision matrix Π(λ), which is decomposed into known precision basis functions as follows:

Π(λ) = Σ(λ)⁻¹ = Σᵢ λᵢ Qᵢ

where the scalar λᵢ is called a hyperparameter (since it defines the probability density function of the error term) and Qᵢ is the corresponding precision basis function. An iterative scheme is used that searches for settings of the parameters that maximise the log model evidence ln p(y | m). In practice this quantity cannot be computed exactly, and an approximation called the negative variational free energy (or evidence lower bound) F is returned by DCM; this is also used as the basis for comparing the evidence for different models:

ln p(y | m) = F + KL[q(θ, λ) || p(θ, λ | y, m)]

The left hand side of this equation, the log model evidence ln p(y | m), is a fixed value. This means that maximising the free energy F will minimise the KL divergence between the approximate and true posterior densities.
Estimation of these densities in DCM rests upon a mean field approximation; that is, the densities over the parameters and hyperparameters are assumed to be independent and can therefore be factorised. In effect, the posterior q(θ, λ) can be factorised as follows:

q(θ, λ) = q(θ) q(λ)
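The decomposition of the free energy into accuracy minus complexity can be checked numerically in a conjugate Gaussian model, where the exact log evidence is available in closed form and the approximate posterior can be made exact. This is an illustrative sketch under those assumptions, not DCM's variational Laplace scheme:

```python
import numpy as np

rng = np.random.default_rng(2)
s0, s, n = 4.0, 1.0, 20                   # prior var, noise var, data size
y = rng.normal(1.5, np.sqrt(s), size=n)   # data (true parameter assumed 1.5)

# Exact posterior q(theta) = N(m, v) for this conjugate Gaussian model:
# prior theta ~ N(0, s0), likelihood y_i | theta ~ N(theta, s).
v = 1.0 / (1.0 / s0 + n / s)
m = v * y.sum() / s

# Free energy F = accuracy - complexity:
#   accuracy   = expected log likelihood under q
#   complexity = KL divergence between q and the prior
accuracy = (-0.5 * n * np.log(2 * np.pi * s)
            - (np.sum((y - m) ** 2) + n * v) / (2 * s))
complexity = 0.5 * (np.log(s0 / v) + (v + m ** 2) / s0 - 1.0)
F = accuracy - complexity

# Exact log evidence: y is jointly Gaussian with covariance s*I + s0*ones.
C = s * np.eye(n) + s0 * np.ones((n, n))
_, logdet = np.linalg.slogdet(C)
log_evidence = -0.5 * (n * np.log(2 * np.pi) + logdet
                       + y @ np.linalg.solve(C, y))
```

Because q here is the exact posterior, the KL term between approximate and true posteriors vanishes and F equals the log evidence; with an approximate q, F would fall below it, which is what licenses its use as an evidence bound for model comparison.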