Spiking neural networks for nonlinear regression

Spiking neural networks (SNN), also often referred to as the third generation of neural networks, carry the potential for a massive reduction in memory and energy consumption over traditional, second-generation neural networks. Inspired by the undisputed efficiency of the human brain, they introduce temporal and neuronal sparsity, which can be exploited by next-generation neuromorphic hardware. Energy efficiency plays a crucial role in many engineering applications, for instance, in structural health monitoring. Machine learning in engineering contexts, especially in data-driven mechanics, focuses on regression. While regression with SNN has already been discussed in a variety of publications, in this contribution, we provide a novel formulation for its accuracy and energy efficiency. In particular, a network topology for decoding binary spike trains to real numbers is introduced, using the membrane potential of spiking neurons. Several different spiking neural architectures, ranging from simple spiking feed-forward to complex spiking long short-term memory neural networks, are derived. Since the proposed architectures do not contain any dense layers, they exploit the full potential of SNN in terms of energy efficiency. At the same time, the accuracy of the proposed SNN architectures is demonstrated by numerical examples, namely different material models. Linear and nonlinear, as well as history-dependent material models, are examined. While this contribution focuses on mechanical examples, the interested reader may regress any custom function by adapting the published source code.


Spiking Neural Networks for Nonlinear Regression
Alexander Henkes, Jason K. Eshraghian, Member, IEEE, Henning Wessels Abstract-Spiking neural networks, also often referred to as the third generation of neural networks, carry the potential for a massive reduction in memory and energy consumption over traditional, second-generation neural networks.Inspired by the undisputed efficiency of the human brain, they introduce temporal and neuronal sparsity, which can be exploited by nextgeneration neuromorphic hardware.To broaden the pathway toward engineering applications, where regression tasks are omnipresent, we introduce this exciting technology in the context of continuum mechanics.However, the nature of spiking neural networks poses a challenge for regression problems, which frequently arise in the modeling of engineering sciences.To overcome this problem, a framework for regression using spiking neural networks is proposed.In particular, a network topology for decoding binary spike trains to real numbers is introduced, utilizing the membrane potential of spiking neurons.Several different spiking neural architectures, ranging from simple spiking feed-forward to complex spiking long short-term memory neural networks, are derived.Numerical experiments directed towards regression of linear and nonlinear, history-dependent material models are carried out.As SNNs exhibit memory-dependent dynamics, they are a natural fit for modelling history-dependent materials which are prevalent through all of engineering sciences.For example, we show that SNNs can accurately model materials that are stressed beyond reversibility, which is a challenging type of non-linearity.A direct comparison with counterparts of traditional neural networks shows that the proposed framework is much more efficient while retaining precision and generalizability.All code has been made publicly available in the interest of reproducibility and to promote continued enhancement in this new domain.
Index Terms-artificial neural networks, spiking neural networks, regression, continuum mechanics, neuromorphic hardware
Despite the success of ANNs, several problems arise alongside their utilization, such as the need for high-frequency memory access, which leads to high computational power demand [47], [48].This results in huge costs for training and Manuscript received October 26, 2022 often makes it preferable to run inference in remote servers during deployment.In general, ANNs are most often trained on GPUs, whose energy consumption is problematic in embedded systems (e.g., sensor devices) as is required in automotive and aerospace applications [49].Furthermore, high latency during prediction time can arise where acceleration or parallelization is not available.
Originally motivated by the human brain, today's traditional ANN architectures are an oversimplification of biology, relying on dense matrix multiplication.From a numerical and computational hardware point of view, dense matrix multiplication is often suboptimal.Sparsity is thought to be favorable as it reduces dependence on memory access and data communication [50].In contrast, the human brain is much more efficient, where neurons are considered to be sparsely activated [51].This stems from the fact that the brain uses sparse electronic signals for information transmission instead of dense activations.This leads to remarkable capabilities by using only about 10-20 watts of energy.One attempt to overcome these drawbacks of ANNs is to introduce the information transmission mechanism of biological neurons into network architectures.These networks are called spiking neural networks (SNN) due to the electronic impulses or spikes used for communication between neurons [52].This leads to sparse activations, which can be efficiently exploited by neuromorphic hardware, such as Loihi [53], SpinNaker [54], and TrueNorth [55].It has been shown that these specialized hardware chips are able to reduce the energy consumption of neural networkbased processes by factors of up to ×1000 [53], [56]- [59].
What was classically in the domain of neuroscientists recently has been investigated in the context of deep learning, e.g., the adoption of SNNs to supervised learning as popularised with traditional ANNs in frameworks such as Tensor-Flow [60] and PyTorch [61], resulting in similar frameworks for spiking deep learning like snnTorch [62].Some applications of spiking deep learning includes image processing using a spiking ResNet [63] and temporal data processing using spiking LSTM variants [64], [65].A combination of spiking convolutional neural networks and LSTMs was proposed in [66].SNNs have been used for image segmentation [67] and localization [68], [69].
To the best of the author's knowledge, the scope of regression modeling using SNNs remains limited.In [70], an architecture using inter-spike interval temporal encoding has been proposed, where learned functions were limited to piecewise constant functions.In [71], a SNN was used for the regression of angular velocities of a rotating event camera.Building on these results, [72] proposed a SNN for depth reconstruction.In [73], a DeepONet [74] using SNN was proposed, which used a floating point decoding scheme to regress on simple onedimensional functions.In [75], gradient descent was applied to learn spike times, and in [76] classification problems were recast as regression tasks in the context of memristor based hardware.The focus of the present work lays on neuromorphic hardware, which is specifically designed for SNNs.
As regression problems are omnipresent in engineering sciences, a flexible and broadly applicable framework would enable SNNs to be utilized in a variety of engineering applications and further unfold the potential of neuromorphic hardware.To this end, the present study aims towards the following key contributions: The present work intends to introduce this important novel technique to the community of computational mechanics and applied mathematics.To concentrate on the novelties and keep the presentation concise, we restrict ourselves to onedimensional, history-dependent regression problems.However, the framework is not restricted to single-variable regression and is easily applicable to a multivariable regression.Furthermore, we explicitly do not consider advanced modeling concepts that ensure the thermodynamical consistency of the material models at hand.Nevertheless, our framework can be easily extended towards these important constraints by utilizing works from, e.g., [77]- [79].The latter are translatable from classical ANNs to SNNs.The remainder of this paper is structured as follows.In Section II, the basic notations of SNNs are derived from traditional ANNs.A simple spiking counterpart to the classical densely connected feed-forward neural network is introduced.After that, our regression SNN topology is proposed.First applications toward linear elasticity point out the problems arising in SNN regression.This basic architecture is extended towards recurrent feedback loops in Section III.The ability of these recurrent SNNs is showcased on a nonlinear material model.To process history-dependent regression tasks with dependencies over a large number of time steps, a spiking LSTM is introduced in Section IV.An application to a historydependent plasticity model shows that SNNs can achieve similar accuracies as their traditional counterparts while being much more efficient.The paper closes with a conclusion and an outlook toward future research directions in Section V.For the code accompanying this manuscript, see the data availability section at the end of this manuscript.

I I . S N N F O R R E G R E S S I O N
SNNs are considered to be the third generation of neural networks.While the first generation was restricted to shallow networks, the second generation is characterized by deep architectures.A broad use of 2 nd generation neural networks has been enabled by the availability of automatic differentiation and software frameworks such as Tensorflow [60].To introduce spiking neural networks, we compare them with their wellknown 2 nd generation counterparts.Our notation follows [62].Standard works in theoretical neuroscience include [80], [81] and [82].Several overviews of SNNs with respect to deep learning can be found in [83]- [85].First, the standard feedforward densely connected ANN is introduced.After that, a basic SNN is derived from this.
An ANN is a parametrized, nonlinear function composition.The universal function approximation theorem [1] states that arbitrary Borel measurable functions can be approximated with ANNs.There are several different architectures for ANNs, e.g., feed-forward, recurrent, or convolutional networks, which can be found in standard references such as [86]- [90].Following [91], most ANN formulations can be unified.An ANN N , more precisely, a densely connected feed-forward neural network, is a function from an input space R dx to an output space R dy , defined by a composition of nonlinear functions h (l) , such that N : R dx → R dy (1) Here, x denotes an input vector of dimension d x and y an output vector of dimension d y .The nonlinear functions h (l) are called layers and define an l-fold composition, mapping input vectors to output vectors.Consequently, the first layer h (0) is defined as the input layer and the last layer h (n L ) as the output layer, such that The layers h (l) between the input and output layer, called hidden layers, are defined as 1) , where h η is the η-th neural unit of the l-th layer h (l) , n u denotes the total number of neural units per layer, W (l) η is the weight vector of the η-th neural unit in the l-th layer h (l) and h (l−1) is the output of the preceding layer, where bias terms are absorbed [88].Furthermore, φ (l) : R → R is a nonlinear activation function.All weight vectors W (l) η of all layers h (l) can be gathered in a single expression, such that where θ inherits all parameters of the ANN N (x) from Eq. (1).Consequently, the notation N (x; θ) emphasizes the dependency of the outcome of an ANN on the input on the one hand and the current realization of the weights on the other hand.The specific combination of layers h (l) from Eq. ( 3), neural units h η and activation functions φ (l) from Eq. ( 3) is called topology of the ANN N (x; θ).The weights θ from Eq. ( 4) are typically found by gradient-based optimization with respect to a task-specific loss function [87].An illustration of a densely connected feed-forward ANN is shown in Figure 1.It can be seen that the ANN described in Eq. ( 1) takes an input x and produces an output y, one at a time.If historydependent input and output data x t ∈ R dt×dx and y t ∈ R dt×dy is considered, the formulation of the hidden layers reads where the time component is discrete.This can be understood as processing each discrete-time slice of the input vector of the preceding layer h sequentially, where the weights W (l) η are shared over all time steps.At this stage, the formulation in Eq. ( 5) is purely notationally, as there is no connection of the weights through different time steps.Now, a SNN can be seen as a history-dependent ANN, which introduces memory effects by means of biologically inspired processes.To this end, the activation function φ (l) in Eq. ( 5) can be formulated as with where η,t is the membrane potential of the η-th neural unit at time t, U is the standard ANN weight multiplied with the preceding layer of the current time step, respectively, see Eq. ( 5).Basically, the SNN activation restricts the neural unit to output discrete pulses (φ spk = 1) if the membrane threshold is reached by the timeevolving membrane potential, or to remain silent (φ spk = 0).These pulses are called spikes.The last summand in Eq. ( 6), −φ thr,η , is called the reset mechanism and resets the membrane potential by the threshold potential once a spike is emitted.The membrane threshold and membrane potential decay rate can be optimized during training, such that the optimization parameters of a SNN are The SNN formulation in Eq. ( 7) is called the leaky integrate and fire (LIF) neuron model, and is one of the most widely used models in spike-based deep learning.It can be seen as the baseline SNN and plays a similar role as densely connected feed-forward ANN in classical deep learning.
The formulation in Eq. ( 7) can be seen as the explicit forward Euler solution of an ordinary differential equation, describing the time variation of the membrane potential, see [62] for details.spk trigger changes in the membrane potential U 1 , which when sufficiently excited beyond a threshold U thr causes the neuron to emit an output spike φ j,1 spk .
The main difference between SNNs and classical ANNs lies in the way information is processed and propagated through the network from neuron to neuron.In standard ANNs, inputs, hidden layers, and output vectors are handled via dense matrices.In spiking neural networks, sparsity is introduced by utilizing spikes, which are single events expressed via a Dirac delta function or a discrete pulse in continuous or discrete settings, respectively.A group of spikes over time is called a spike train i = [i t , t = 0, ..., n t ].To this end, a spiking neuron is subjected to a spike train over a time interval, consisting of spikes (1) or zero input (0).The membrane potential U (l) η,t is modulated with incoming spikes i t .In the absence of input spikes, the membrane voltage decays over time due to the membrane decay rate β (l) η .The absence of spikes introduces sparsity because in every time step, the neural unit output is constrained to either zero or one.This fact can be exploited on neuromorphic hardware, where memory and synaptic weights need only be accessed if a spike is apparent.Otherwise, no information is transmitted.In contrast, conventional ANNs do not leverage sparsely activated neurons, and most deep learning accelerators, such as GPUs or TPUs, are correspondingly not optimized for it.
Unfortunately, the spiking activation φ (l) spk,t in Eq. ( 6) is non-differentiable.To use the backpropagation algorithm from standard ANNs, the activation is replaced using a surrogate gradient during the backward pass.Several different formulations have been proposed, see, e.g., [50], [92], [93].In this work, the arcus tangent surrogate activation from [94] is used: for some input x.The surrogate φ surr (x) is continuously differentiable and preserves the gradient dynamics of the network.Thus, for training using backpropagation and its variants, φ surr is employed.Illustrations can be found in Figure 3 and Figure 4.

A. Network topology
The key question for using SNN in regression is how to transform real input values into binary spikes and binary spike information at the output layer back to real numbers.The former task is called spike encoding, whereas the latter is called spike decoding.In this work, a constant current injection is chosen for the encoding part, whereas a novel population voting on membrane potential approach is chosen for the decoding part.Other forms of encoding include rate encoding, latency encoding and delta modulation, among others.Similarly, different decoding strategies exist, such as rate decoding and latency decoding.An illustration of various encoding and decoding strategies is shown in Figure 5. See [62] for an overview and detailed description.All network topologies used in the upcoming numerical examples follow a general scheme, which is flexible and suited for regression tasks.First, the real input x t is provided as a t , for all time steps t, such that h Then, several SNN layers h t follow, where the exact formulation is arbitrary, and will be given for every numerical example.The output of the last spiking layer h (n L ) t is transformed into a decoding layer h dec t , which takes the membrane potential of every time step as input and outputs real numbers which is essentially the formulation of Eq. ( 7), where no spikes and reset mechanisms are used.The transformed values are then transferred to the 'population voting layer', where the output of all neurons of the decoding layer are averaged to give real numbers.This results in where n o denotes the number of neurons in the population voting layer and again, no spikes or reset mechanisms are used.
The final spiking regression topology network S can be written as S : R dt×dx → R dt×dy (13) To summarize, information flows in the form of constant current (real numbers) into the input layer h const t , is then transformed into binary spikes in the spiking layers h (l) t and transformed back into real numbers in the translation layer h dec t .The output is formed in the population layer h pop t .A graphical interpretation is given in Figure 6.
Fig. 6.Topology of the spiking regression network introduced in Section II-A.
For all the following numerical examples, the AdamW optimizer from [95] is used.The parameter are set as follows: learning rate α = 1 × 10 −3 , exponential decay rates for the first and second moment estimates β 1 = 0.9 and β 2 = 0.999, respectively, weight decay λ = 0.01.The training was carried out on a Nvidia GeForce RTX 3090 GPU using snnTorch [62] and PyTorch [61].In this work, the mean relative error E is used, which is defined as for some input • and baseline •.If the error is reported for all time steps, • is a vector containing the values of all time steps.If the error is reported for the last time step, • is equal to the last component of the corresponding history-dependent vector.

B. Numerical experiment: Linear elasticity
The first study investigates the effect of the number of time steps on the performance of the proposed LIF topology in a simple linear regression problem.To this end, the general model described in Section II-A with LIF defined in Eq. ( 7) is used, resulting in the following network topology To begin with, a simple linear elastic material model with strains in the range of ε = [0, 0.001] and fixed Young's modulus E = 2.1 × 10 5 MPa is considered, such that the resulting stress σ is The training data consists of strain input, uniformly sampled in the interval ε = [0, 0.001], and stress output calculated according to Eq. ( 16).Three datasets are generated, namely a training set, a validation and a test set consisting each of n train = n val = n test = 1024 samples.All three datasets are standardized using the mean and standard deviation from the training set.The batch size is chosen as n batch = 1024.The number of neurons n u is chosen as n u = 128 and is kept constant over all layers.The training is carried out for 2 × 10 3 epochs.The model performing best on the validation set is chosen for subsequent evaluations.The mean relative error accumulated over all time steps and the mean relative error of the last time step with respect to the test set are reported.
The results are illustrated in Figure 7.It can be seen, that the mean relative error is lowest for d t = 5 time steps.For d t = 2, the error is larger.This could be caused by a lack of a sufficient number of time steps for the neuron dynamics to effectively be calculated.It can be understood as a failure due to too large time steps in the explicit stepping scheme in Eq. (7).Clearly, the highest error can be observed for d t = 100 time steps.In contrast, as depicted in Figure 8, the error at the last time step is lowest for d t = 100 time steps.To illustrate the cause, the prediction of the network for two different samples, one for d t = 5 and one for d t = 100 time steps are shown in Figure 9 and Figure 10, respectively.While good agreement on the endpoints is apparent, fluctuation during the rest of the time steps causes the rise in the error.Seemingly, the LIF has difficulties regressing a large number of time steps.This could be caused by the lack of recurrent connections in the LIF formulation from Eq. ( 15), where history dependency is only weakly included in the form of the membrane potential.To counter this problem, recurrent LIFs will be introduced in Section III.

Remark:
The seemingly simple linear regression task provides a challenge for SNN, as effectively an ordinary differential equation has to be fitted to a linear function while relying on binary information transmission and inexact gradients.

I I I . N O N L I N E A R R E G R E S S I O N U S I N G R L I F
In order to counter the problems of vanishing information for a large number of time steps encountered in the preceding section, a recurrent SNN architecture is proposed (Section III-A).Its performance is demonstrated by means of a numerical example in Section III-B.

A. Recurrent Leaky Integrate and Fire (RLIF)
The standard LIF is a feed-forward neuron, such that information is flowing unidirectionally in the form of spikes.By adding a feedback loop, a recurrent LIF (RLIF) can be formulated, which builds on the standard recurrent neural network (RNN) formulation.This enables the network to use relationships along several time steps for the prediction of the current time step.It was shown in [96], that recurrent loops can retain information for a relatively longer number of time steps when compared to their non-recurrent counterparts.
Here, the formulation of the hidden layer in Eq. ( 5) includes additional recurrent weights In this RNN, the influence of the preceding time step is explicitly included by means of additional recurrent weights V (l) η .The resulting set of trainable parameters reads The RNN formulation can be included in the LIF formulation from Eq. ( 7) to obtain an RLIF, such that η,t is again the membrane potential of the η-th neural unit at time t, U is the standard ANN weight multiplied with the preceding layer at the current time step, respectively.Additionally, V (l) η denotes the recurrent weights from Eq. ( 17).This leads to the following set of trainable parameters thr,η .

B. Numerical experiment: Ramberg-Osgood
The performance of the RLIF is investigated toward nonlinear function regression.The well-known nonlinear Ramberg-Osgood power law for modeling history-independent plasticity is chosen.The formulation of the stress σ with respect to the strain ε reads where ε is the infinitesimal, one-dimensional elastic strain, σ denotes the one-dimensional Cauchy stress, E is Young's modulus, α and n are constants describing the hardening behavior of plastic deformation and σ Y is the yield strength of the material.In Figure 11, different stress-strain curves are depicted for different yield strength values, obtained with a classical Newton-Raphson method.Note that this plasticity model is only suited for a single loading direction and does not incorporate accumulation of plastic strain.It is only used as a prototypical nonlinear model to show the ability of the RLIF to regress over a moderate number of time steps.
To this end, the general model described in Section II-A using RLIF defined in Eq. ( 19) is used, resulting in the following architecture where the yield strength σ Y is provided as a constant current, that is a constant spike train, for each time step d t .The training data consists of yield strength σ Y as input for fixed strains in the interval ε = [0, 0.01] for d t = 20 time steps.The yield strength is uniformly sampled in the interval σ Y = [100, 500] MPa, and the stress output is calculated according to Eq. ( 21).The Young's modulus is chosen as E = 2.1 × 10 5 MPa and n = 10.Three datasets are generated, namely a training set, a validation and test set with n train = n val = n test = 1024 samples, respectively.All three sets are standardized using the mean and standard deviation from the training set.The batch is chosen as n batch = 1024.The number of neurons n u is chosen as n u = 128 and is kept constant over all layers.The training is carried out for 5 × 10 3 epochs.The model performing best on the validation set is chosen for subsequent evaluations.The mean relative error and the mean relative error of the last time step with respect to the test set are reported.
The results of five different samples, randomly chosen from the n test = 1024 test samples, can be seen in Figure 13.For the test set, a mean relative error for all time steps of 8.7934×10 −2 and a mean relative error for the last time step of 8.0200 × 10 −2 is obtained.The predictions on these five samples are more accurate than would be suspected from the mean relative error.The cause can be found in Figure 12, where the mean relative error for all time steps is plotted for every sample of the test set.It can be observed, that a small number of samples has a much higher error than the rest, which impacts the error measure.This is caused by the purely data-driven nature of the experiment and can be tackled with approaches introduced in, e.g., [77]- [79].Nevertheless, the RLIF is able to regress on the varying yield strength σ Y and can predict the resulting nonlinear stress-strain behavior, as can be seen in the predictions Figure 13.Deviations can be observed around the yield point as well as the endpoints of the curves.To be able to take into account long-term history dependent behavior, the RLIF formulation will be expanded towards the incorporation of explicit long-term memory in the next section, where a more complex plasticity model is investigated.A. Spiking long short-term memory network (SLSTM) A SLSTM is the spiking version of the standard LSTM [97], where the latter is defined as , where f t denotes the forget gate with sigmoid activation φ sigmoid or tangent hyperbolicus activation φ tanh and corresponding weights f , V f with absorbed biases.The same nomenclature holds for the input gate i t , the output gate o t , the cell input ct and the cell state c t with their respective activations and weights.The new cell state c t and the output of the LSTM h t are formed using the Hadamard or point-wise product .The parameters of the LSTM are its weights, such that For detailed derivations and explanations of standard LSTM, see, e.g., [87], [88].The SLSTM can be obtained from the LSTM by using spike activations within the LSTM formulation from Eq. ( 23), such that where the output h In other words, the output of h η,t can be interpreted as the membrane potential of the SLSTM, such that h A decay parameter β is not used in this formulation.Rather than using decay to remove information from the cell state c (l) η,t , this is achieved by carefully regulated gates.
The corresponding optimization parameters of the SLSTM are thr,η .
Basically, the cell state c (l) η,t acts as long-term memory, just like in the standard LSTM formulation.The communication between layers is handled via spike trains that depend on the membrane potential h η,t in Eq. ( 25) and the activation function φ (l) spk,t from Eq. ( 26).

B. Numerical experiment: Isotropic hardening using SLSTM
The following numerical experiments aim to investigate the performance of the proposed SLSTM on nonlinear, historydependent problems.Therefore, a one-dimensional plasticity model with isotropic hardening is investigated.Following [98], the model is defined by where 1) is the additive elasto-plastic split of the small-strain tensor ε into a purely elastic part ε el and a purely plastic part ε el .2) denotes the elastic stress-strain relationship for the Cauchy stress tensor σ and elastic modulus E. 3) describes the flow rule and isotropic hardening law with consistency parameter γ and equivalent plastic strain α. 4) gives the yield condition f (σ, α) with hardening modulus K. 5) denotes the Kuhn-Tucker complementarity conditions and 6) describes the consistency condition.In Figure 14, different stress-strain paths are shown for varying strains.Especially long-time dependencies are of interest.To this end, the predictive capabilities of the SNN are investigated for inference over d t = 100 time steps, where the elasto-plastic model is evaluated using a classical explicit return-mapping algorithm, see [98].The training data consists of strain as input, uniformly sampled in the interval ε = [0, 0.01], and stress as output calculated according to Eq. ( 28).The yield stress is chosen as σ Y = 300 MPa, the elastic modulus E = 2.1 × 10 5 MPa and the hardening modulus as 2.1 × 10 4 MPa.Three datasets are generated, namely a training set with n train = 10240 samples and a validation and test set with n val = n test = 1024 samples, respectively.All three sets are standardized using the mean and standard deviation from the training set.The batch size is chosen as n batch = 1024.The training is carried out for 500 epochs.The model performing best on the validation set is chosen for subsequent evaluations.The mean relative error accumulated over all time steps and the mean relative error of the last time step with respect to the test set are reported.The last time step is of special importance, as in the case of numerical simulations, only the resulting stress from the last time step is used for subsequent calculations.
The first study investigates the prediction accuracy as a function of (1) the number of output neurons, which participate in the population regression outlined in Section II-A and (2) different capacities of the SLSTM in the sense of layer width.To this end, the SLSTM defined in Eq. ( 26) is used, resulting in the following architecture: Multiple simulations with output neurons and hidden layers drawn from the grid n u × n o = [16,32,64,128,256] × [16,32,64,128,256] are carried out.The resulting mean relative error for all time steps with respect to the test set is shown in Figure 15, whereas the resulting mean relative error of the last time step with respect to the test set is depicted in Figure 16.A clear convergence behavior can be observed for the number of hidden neurons n u , where larger numbers of neurons lead to lower errors.For the number of output neurons n o , a tendency can be observed upon convergence with respect to n u .For the largest number of hidden neurons n u = 256, the mean relative error over all time steps and the mean relative error of the last time step get larger for n o = [128, 256] output neurons, whereas for n o = [16,32,64] the errors are almost the same.The lowest mean relative error for all time steps is 5.2445 × 10 −2 for n u = 256 hidden neurons per layer and n o = 64 output neurons.The lowest mean relative error for the last time steps is 2.8729 × 10 −3 n u = 256 hidden neurons per layer and n o = 32 output neurons.Again, the seemingly high errors are caused by outliers polluting the average, as described in Section III-B.The same counter-measures can be applied to prohibit outliers, e.g., by enforcing thermodynamic consistency.
For the second experiment, the SLSTM using n o = 64 output neurons and n u = 256 hidden neurons per layer are compared to a standard LSTM with an equal number of optimization parameters.The aim of this study is the comparison of the prediction accuracy, but also the difference in memory and energy consumption on neuromorphic hardware.For both ANN variants to be comparable, the same topology is chosen for the LSTM as for the SLSTM, such that  where the last two layers are replaced by densely connected conventional feed-forward neural networks.Again, the training was carried out for 5 × 10 3 epochs and the same datasets from the previous experiments are used.The standard LSTM from Eq. (30) reached a mean relative error of 4.8611 × 10 −2 over all time steps and a mean relative error of 4.7569 × 10 −3 for the last time step.The SLSTM from Eq. ( 29) reached a mean relative error of 9.3832 × 10 −2 over all time steps and a mean relative error of 4.0497 × 10 −3 for the last time step.The resulting prediction for one strain path is illustrated in Figure 17.Clearly, both networks are able to accurately predict the history-dependent, nonlinear stress-strain behavior.
Some deviations from the SLSTM can be seen in the beginning of the curve.The dynamics of the spiking formulation results in a higher mean relative error over all time steps with respect to the LSTM.However, the endpoint has a better fit than the LSTM.This is seen in the lower error at the last time step.Whether this is just an effect due to our experimental setting or a general feature of the method has to be investigated in a larger statistical analysis in upcoming studies.
To assess the potential of interfacing our model in embedded, resource-constrained sensors in the wild, we performed a series of power profiling experiments for our SNNs (both using LIF neurons and SLSTMs) when processed on the Loihi neuromorphic chip [53].These results are compared against their non-spiking equivalents on an NVIDIA V100 GPU.Data were extracted using the energy profiler in KerasSpiking v0.3.0.
The first difference in energy usage is that the spiking implementation is measured in an 'event-based' manner, where processing only occurs when a neuron emits a spike.In contrast, a non-spiking network processed on a GPU continuously computes with all activations.Note that the cost of overhead did not need to be accounted for (i.e., transferring data between devices) because all models fit on a single device.The second difference is that SNNs require multiple time steps of a forward pass, whereas their non-spiking counterparts do not (unless the input to the network varies over time).
Each network has been broken up into its constituent layers to measure how much they contribute to energy usage on each device.The total energy consumption per forward pass of the non-spiking network on the V100 is 512 nJ, whereas the equivalent SNN is 4.25 nJ.This represents a 120x reduction in energy consumption.The non-spiking LSTM network consumed 5.7 µJ while the proposed spiking-LSTM architecture required 24 nJ, a 238× reduction.Detailed results are summarized in Table I.In the present study a framework for regression using SNNs was proposed based on a membrane potential spiking decoder and a population voting layer.Several numerical examples using different spiking neural architectures investigated the performance of the introduced topology towards linear, nonlinear, and history-dependent regression problems.First, a simple feed-forward SNN, the LIF, was derived from the classical densely connected feed-forward ANN.It was shown, that the SNN can be seen as a special kind of activation function, which produces binary outputs, so-called spikes.These spikes are used to propagate information through a possibly deep spiking neural network.The spikes occur due to the dynamic behavior of the membrane potential inside the neuron, which rises when spikes appear at the input and decays over time if no spikes appear.If a certain threshold value is reached, the membrane potential is reset and the neuron emits a spike itself.This formulation introduces more hyperparameters, which fortunately can be learned during training.The spikes introduce sparsity in the network, which can be effectively exploited by neuromorphic hardware to improve latency, power, and memory efficiency.The non-differentiability of the binary spikes is circumvented by surrogate gradients during backpropagation.
Next, a network topology was proposed, which decodes binary spikes into real numbers, which is essential for all kinds of regression problems.A decoding layer takes the membrane potentials of all neurons in the last spiking layer and propagates them to a population voting layer, which provides its mean potential resulting in a real number.The proposed topology can be used for arbitrary temporal input and output dimensions.A simple experiment on a linear elastic material model using LIFs showed, that the proposed topology is able to regress the problem.It was shown, that errors are introduced for a large number of time steps.This problem was overcome by introducing RLIF, which extends the LIF by recurrent feedback loops.An experiment using a nonlinear Ramberg-Osgood plasticity model showed that the proposed topology using RLIF is able to regress varying yield limits accurately.The final extension was concerned with the introduction of explicit longterm memories inspired by the classical LSTM formulation, resulting in a spiking LSTM.The performance of this SLSTM was investigated on a history-dependent isotropic hardening model, where different load paths were accurately regressed.During prediction, the SLSTM was able to generalize even better than the LSTM for the final load step.Furthermore, the convergence of the proposed method was shown.
Power profiling and memory analysis were conducted on the LIF and SLSTM networks to compare efficiency on neuromorphic hardware as against a GPU.The Loihi neuromorphic processor was able to achieve a 120× reduction in energy consumption when processing the dense LIF network, and the SLSTM offered a 238× reduction in energy during inference.
The range of possible future application scenarios enabled by regression with Spiking Neural Networks are manifold.For instance, today's sensing systems cannot capture all quantities that are relevant for structural health-monitoring.In the context of mechanics, displacement and strain are quiet easy to assess, but the mechanical stress, which reflects the actual response of structures and materials to deformation, remains a so-called hidden-quantity.Physics-informed machine learning offers the potential to reconstruct hidden quantities from data by leveraging information from physical models, given in the form of partial differential equations.It is expected that the developments in the field of neuromorphic hardware will foster the development of a new generation of embedded systems, which will ultimately enable control of structures and processes based on partial differential equations.

Fig. 2 .
Fig. 2. Spiking Neuron Dynamics.Input spikes φ i,0spk trigger changes in the membrane potential U 1 , which when sufficiently excited beyond a threshold U thr causes the neuron to emit an output spike φ j,1 spk .

Fig. 5 .
Fig. 5.A sample of spike-based encoding and decoding strategies.Left: Realvalued inputs are encoded into spikes by means of different strategies [62], e.g., (1) high intensities or large values result in a large number of spikes (top left), (2) high intensities or large values result in early spike firing (center left), (3) delta modulation where spikes are produced for positive gradients of the input function (bottom left).Right: In classification, the predicted class is determined via (1) the number of spikes (top right) or (2) the first occurence of spikes (bottom right).Regression is introduced in section Section II-B.
t mean rel.error all time steps

Fig. 7 .
Fig.7.Elasticity -error of all timesteps: Mean relative error for all time steps with respect to the different total number of time steps.The error is rising for a larger number of time steps.

Fig. 8 .d t = 5 Fig. 9 .
Fig.8.Elasticity -error at last time step: Mean relative error for the last time step with respect to the different total number of time steps.The error is converging for a larger number of time steps.

Fig. 10 .
Fig.10.Elasticity -prediction in 100 time steps: Prediction of the LIF from Eq. (15) for dt = 100.Fluctuations around the true solution can be observed.

Fig. 11 .
Fig. 11.Ramberg-Osgood -reference solutions.Stress-strain curves of the Ramberg-Osgood material model for five different values of the yield stress σ Y obtained with Newton-Raphson algorithm.

Fig. 12 .Fig. 13 .
Fig.12.Ramberg-Osgood -RLIF test error.The mean relative error over all time steps for 1024 samples from the test set for the numerical experiment described in Section III-B.It can be seen, that some outliers have large error values, resulting in a mean error over all samples of 8.7934 × 10 −2 .Most samples have a significantly lower error.

Fig. 14 .
Fig.14.Isotropic hardening -reference solutions.Five stress-strain curves sampled from the isotropic hardening material model for different maximum strains obtained from Eq. (28).

Fig. 16 .
Fig.16.Isotropic hardening -error versus width.The mean relative error of all time steps versus the number of hidden neurons per layer is shown for different numbers of output neurons in the isotropic hardening experiment from Section IV using the SLSTM from Eq. (29).

Fig. 17 .
Fig.17.Isotropic hardening -LSTM versus SLSTM.Prediction of a single load path using the return-mapping algorithm as a reference, the standard LSTM and the spiking LSTM formulation.

•
Introduction of spiking neural networks: Concise introduction of this emerging technique.Open source benchmark code for the research community.• History dependent regression framework: SNNs are naturally suited for classification.Engineering problems often involve regression tasks.We present a flexible framework to use SNNs in complex regression tasks, namely history-dependent material behavior in the case of isotropic hardening plasticity.As such, we demonstrate that SNNs can model systems that exhibit hysteresis.
• Efficiency, sparsity and latency: We benchmark our SNN on neuromorphic hardware in terms of energy consumption as compared to non-spiking equivalent networks, demonstrating that they are much more efficient with respect to memory and power consumption, making neural networks more sustainable.Their deployment on neuromorphic hardware allows highly efficient usage in embedded environments.We present a detailed comparison with standard ANN for memory consumption and power consumption.
Isotropic hardening -error versus width.The mean relative error of the last time step versus the number of hidden neurons per layer is shown for different numbers of output neurons in the isotropic hardening experiment from Section IV using the SLSTM from Eq. (29).