Energy scaling and reduction in controlling complex networks

Recent works revealed that the energy required to control a complex network depends on the number of driving signals and the energy distribution follows an algebraic scaling law. If one implements control using a small number of drivers, e.g. as determined by the structural controllability theory, there is a high probability that the energy will diverge. We develop a physical theory to explain the scaling behaviour through identification of the fundamental structural elements, the longest control chains (LCCs), that dominate the control energy. Based on the LCCs, we articulate a strategy to drastically reduce the control energy (e.g. in a large number of real-world networks). Owing to their structural nature, the LCCs may shed light on energy issues associated with control of nonlinear dynamical networks.


Introduction
The past 15 years have witnessed tremendous advances in our understanding of complex networked structures in various natural, social and technological systems, as well as the dynamical processes taking place on them [1][2][3][4][5][6]. Progress has also been made in the area of controlling complex networks, where the ultimate goal is to control nonlinear dynamical processes on complex networks. The interplay between nonlinear dynamics and complex networks makes formulating a general control framework too difficult to be addressed at the present. A reasonable compromise is to study linear controllability [7][8][9][10][11][12][13][14][15][16][17][18][19][20][21][22][23] while retaining the complex network topology in hope to gain insights into the fundamental control issues that can be useful for controlling nonlinear dynamical networks. Some representative results are the following. A key issue was to determine the minimum number of driver nodes required to steer the system from an arbitrarily initial state to an arbitrarily final 2016 The Authors. Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.

Control energy distribution
We use the standard setting of network controllability [7,10,16]: where x = [x 1 (t), . . . , x N (t)] T is the state variable of the whole system, vector u = [u 1 (t), . . . , u M (t)] T is the control input or the set of control signals, A is the N × N adjacency matrix of the network, and B is the N × N D control matrix specifying the set of 'driver' nodes [10]. The goal of the structural and exact controllability theories is to determine the minimum number of driver nodes, N D , for networks of various topologies. With the input driving signal u t , the control energy is defined as In the standard linear systems theory [29], optimal control can be achieved to minimize the energy in the functional space of the control input signal u t . The optimal control signal is where W ≡ t f t 0 e Aτ B · B T · e A T τ dτ is the positive-definite and symmetric Gramian matrix. We use the Erdős and Rényi (ER) type of directed random networks [30,31] and the Barabási-Albert (BA) type of directed scale-free networks [2] with a parameter P b . Specifically, for a pair of nodes i and j with a link, the probability that it points from the smaller degree to the larger degree nodes is P b , and 1 − P b is the probability that the link points in the opposite direction (if both nodes have the same degree, the link direction is chosen randomly). We implement the maximum matching algorithm [10] to obtain the control matrix B and calculate the minimum energy for any given control time t f . For an ensemble of randomly realized network configurations with identical structural properties, the control energy E can be regarded as a random variable. We find that, for a vast majority of the networks in the ensemble,   the required control energy is enormous and tends to diverge. For the cases where the energy can be reasonably computed, it follows an algebraic distribution with fat tails, as shown in figure 1 with the scaling exponent of about 1.5, regardless of the network type and size.

Structural determinants of control energy and energy distribution
We develop a physical understanding of the large control energy required and also the algebraic scaling behaviour in the energy distribution. To gain insights, we first consider a simple model: a unidirectional, one-dimensional string network, for which an analytic estimate of the control energy can be obtained [23] as where l is the chain length (the number of nodes on the string) and λ H l is the smallest eigenvalue of the underlying H-matrix, denoted by H l , which is related to the Gramian matrix by H ≡ e −At f We −A T t f . Numerical verification of equation (2.4) is presented in figure 2a. Although equation (2.4) is obtained for a simple one-dimensional chain network, we find numerically and analytically that it also holds for random and scale-free network topologies. A brief derivation of equation (2.4) is as follows. The control energy required by the system can be expressed as [13] E where x 0 is the initial state of the system. The matrix H is positive definite and symmetric with its inverse satisfying H −1 = QΛQ T , where Λ and Q are the corresponding eigenvalue and eigenvector matrices, respectively. Thus, λ −1 H is the largest eigenvalue of H −1 with λ H denoting the smallest eigenvalue of H. Numerically, we find that λ −1 H is dramatically larger than any other eigenvalue. We have where q i is the ith column of Q. If the initial state x 0 is chosen to satisfy q T 1 · x 0 = 1, we obtain E(t f ) ≈ λ −1 H . We find numerically that a random choice of x 0 does not change E(t f )'s order of magnitude. A detailed derivation can be found in [23].
We see from figure 2a, that the energy required for control tends to increase exponentially with the chain length, indicating that even for a simple one-dimensional chain network of limited length, such as l = 7, the required control energy can be unbearably large. The exponential behaviour holds for complex networks as well, as shown in the inset of figure 2b for ER random and BA scale-free networks. In fact, figure 2b indicates a strong positive correlation between the average control energy for the network and E L , the energy required to control the LCC (to be described below) embedded in the network.

Concepts of control signal paths and longest control chains
In a networked system, control signal and energy originated from the driver nodes travel through onedimensional-string-like paths towards each of the non-driver nodes. As discussed, identifying maximum matching so that the network is deemed structurally controllable does not guarantee convergent control energy. When maximum matching is found, one can divide the whole network into N D CSPs, namely, N D stems [10,18], each being a unidirectional one-dimensional string led by a driver node that passes the control signal onto every node along the path, illustrated as the vertical paths in figure 3. CSPs thus provide a picture indicating how the signals from the N D external control inputs reach every node in the network to ensure full control (in the sense of structural controllability).
How does the control energy flow through the network? To address this question, we distinguish two types of links: one along and another between the CSPs, as shown in figure 3. It may seem that the latter links are less important as the control signal propagates along the former set of links. However, owing to coupling, a node's dynamical state will affect all its nearest neighbours' states which, in turn, will affect the states of their neighbours, and so on. In principle, any driver node is connected with nodes both along and outside its CSP. Correspondingly, an arbitrary node in the network is influenced by every driver node, directly through the CSP to which it belongs, or indirectly through the CSPs that it does not sit on. Intuitively, the ability of a driver node to influence a node becomes weaker as the distance between them is increased. In order to control a distant node, exponentially increased energy from the driver is needed. The chain starting from a driver node and ending at a non-driver node along their shortest path is effectively a control chain. Intuitively, control energy flows through the control chains, while the control signal is propagated along the CSPs. We define the length of the longest control chain (LCC), D C , as the control diameter of the network, as shown in figure 3.
To find the LCCs in a network, we first use the maximum matching algorithm to find all the driver nodes. We then identify the shortest paths from each of the driver nodes to each of the non-driver nodes. Finally, we pick out the longest such shortest paths as LCCs. The computational complexity of finding the LCCs are that associated with the maximum matching algorithm plus searching for the longest shortest paths, which is feasible for large networks. There can be multiple LCCs. The node at the end of an LCC is most difficult to be controlled in the sense that the largest amount of control energy is required. The number m of such end nodes dictates the degeneracy (multiplicity) of LCCs. An example is shown in figure 3, where we see that, although there are multiple LCCs, their ends converge to only three nodes, leading to m = 3. Since the energy required to control a one-dimensional chain grows exponentially with its length in such a way that even one unit of increase in the length can amplify the energy by orders of magnitude (figure 2a), the energy associated with any chain shorter than the LCC can typically be several orders of magnitude smaller than that with the LCC. Thus, the total energy is dominated by the LCCs.  In this example, the length of the LCCs is 4. Typically, a control chain may contain nodes belonging to multiple CSPs. Two LCCs sharing no common nodes are marked by the red nodes and the solid red arrows. Links belonging to other LCCs are marked by red dashed arrows. Each node is specified using its CSP label and its position along the CSP sequentially from top to bottom. Eight LCCs in the network converge to only three end-nodes, e6, f 5 and f 6 (marked by red dashed circles), leading to LCC degeneracy m = 3. Note that the stems [10] (or CSPs) are originated from the driver nodes and are obtained through maximal matching.
Owing to the typically low value of m, a single LCC essentially dictates the energy required to control the whole chain system. This is true especially for networks with long LCCs. Intuitively, the probability to form long LCCs is small. Accordingly, a longer LCC tends to have smaller value of degeneracy m. As a result, the longest LCCs have almost no degeneracy (m = 1) so that they effectively rule the control energy of the whole network.
In the structural controllability theory, a network is deemed more structurally controllable if N D is smaller [10]. However, as the number of driver nodes is reduced, the length of the chain of nodes that each controller drives on average must increase, leading to an increase in the LLC length and accordingly an exponential growth in the control energy. In the 'optimal' case of structural controllability of N D = 1, the LLC length will be maximized, leading to unrealistically large control energy that prevents us from achieving actual control of the system.
Taken together, with respect to the previously defined concept of stem [10,18], we emphasize that a stem is a path that propagates control signal from the input in the absence of a feedback loop (or a circle), and each such path is determined by maximum matching, which allows a node to control at most one of its immediate neighbours. Thus, a stem is in fact a CSP, which is quite different from the concept of an LCC. Particularly, for a driver node and a non-driver node, a control chain is the shortest path between them. For a set of driver nodes and all non-driver nodes in the entire network, an LCC is simply the longest control chain. While control signals propagate through the CSPs, the energy may not flow along the same paths due to the interactions among the nearest neighbours via the physical connections. For example, the state change of a node can lead to state changes of all its nearest neighbours through the energy exchange between this node and all its neighbours. That is, control energy flows through the LCCs, not CSPs.

Control energy reduction scheme
Our finding of the LCC skeleton suggests a method to significantly reduce the control energy. A straightforward solution is to break the LCCs with redundant controllers beyond those obtained via maximum matching along the LCCs. Adding a redundant control input at the ith node of a unidirectional one-dimensional chain of length l breaks it into two shorter sub-chains of length i − 1 and l − i + 1. Roughly, the control energy is the sum of energies required to control the two shorter components, which is dominated by energy associated with the longer one owing to the exponential dependence of the energy on the chain length. For i ≈ l/2, the length of the longer part is minimized. In simulations,  . For the light green bars, the reduced control energy E mid is several orders of magnitude smaller than E . The bars with more gray portion than the light green potions are for the networks with relatively low values of E and D C , for which energy reduction is not necessary.
applying a single redundant control input presents an extremely efficient energy reduction strategy for one-dimensional chains: several orders of magnitude of reduction in the energy can be achieved. As a realistic physical example, we considered a bidirectional cascading parallel R-C circuit of seven units, which is effectively a one-dimensional chain of seven nodes with a self-loop at each node. For this system, the redundant control input can be realized by inducing external current input into a capacitor. A reduction in the joule energy of nearly 10 orders of magnitude is achieved [23]. For a complex network, the strategy of adding a redundant control at the middle of each LCC dramatically outperforms the strategy of randomly adding an identical number of control inputs. Treating the amount of the normalized energy reduction E/E as a random variable across the ensemble of networks, we obtain a monotonously increasing distribution function P( E/E), reflecting the effect of energy reduction. We find that high (low) E/E values are more (less) probable under LCC-breaking strategy when compared with the random strategy. For networks with longer LCCs, the strategy works more effectively. For example, for networks with D C = 5, nearly 40% of the networks reach E/E ≈ 1, but the same reduction can be achieved only for 2% of the networks via the random strategy.
For a large number of real-world networks, the mathematical structural controllability theory stipulates that some of them are controllable with a few driving signals [10]. We find that most of these networks (15 out of 17) require unrealistically high energies due mainly to their long LCCs. Strikingly, even with unlimited energy supply, the number of driver nodes as determined by the maximum matching algorithm is generally insufficient to fully control the whole system, where there exists a number M of nodes that never converge to their target states. These observations lead to the speculation that, in order to fully control a realistic network, significantly more driver nodes are needed than those identified by maximum matching. For example, N D = N D + M driver nodes should be deployed to gain full control of the system, as shown in figure 4a for n D and n D = N D /N for the 17 real-world networks. Our energy reduction strategy is remarkably effective on the real-world networks with large LCCs, as shown in figure 4b. The quantity E mid is the control energy of a real-world network when a redundant control input is applied at the middle of each LCC, while E R-mid denotes the energy when the same number of redundant control inputs are randomly applied to the network. For each of the realworld networks with unrealistically large energy requirement, the reduced control energy E mid is several orders of magnitude lower than the original energy E -the control energy with M augmented driver nodes without any LCC-breaking control input. The energy reduction due to random control signal augmentation is again much less effective in most cases, giving further evidence that the control energy of real-world networks is generally determined by their LCCs.
To the best of our knowledge, the failure to converge to the target state with limitless energy supply has not been observed for any modelled networks: it only occurs in empirical (real-world) networks. It may be caused by some atypical structural properties. For example, for some networks, the uncontrollable nodes form self-loops but without any incoming links. For some other networks with an unusually high average degree, an ill-conditioned Gramian matrix can arise, prohibiting the system from converging to the target state. However, such uncontrollable cases (even with infinite supply of energy) are not expected to be generic.

Discussion
We fill a significant knowledge gap in the extremely active field of controlling complex networks by developing a physical understanding of the recently discovered phenomena of algebraic distribution and divergence of the required control energy. This is achieved through identification of LCCs, the fundamental structure embedded in the network that dominantly determines the energy. The understanding leads naturally to an optimization strategy to significantly reduce the control energy in real-world networks: increasing the number of controllers slightly by placing redundant control signals (beyond the number determined by the structural-controllability theory) along the LCCs. At present, the issue of control energy can be addressed only in the linear controllability framework, but we believe that the LCCs, due to their structural origin, can find counterparts in formulating theories of controllability and observability for nonlinear dynamical networks [9,21,32,33], a largely open problem deserving much attention and great research effort.
Our results that the LCCs essentially determine the network control energy are consistent with the finding in [34]. In particular, for a one-dimensional unidirectional chain with exactly one control input, its control energy has the standard definition that leads to the mathematical conclusion of energy divergence for long chains, while the finding that anisotropic networks can be readily controlled was not obtained under exactly the same setting. The dependence of a network's control energy on its LCCs can be further demonstrated via our energy reduction schemes, effective for both modelled and real-world complex networks.

Longest control chain and control energy distribution
The construction illustrated in figure 3 provides a structural profile to estimate the control energy. In particular, a network can be viewed as consisting of a set of control chains, and the total energy E required has two components: E (1) , the sum of energies associated with all control chains, and E (2) , the interaction energies among the chains, defined as E − E (1) . The control chains are approximately independent of each other so that E (2) is negligible as compared with E (1) . Each control chain is effectively a one-dimensional string. Owing to the exponential increase in the energy with the chain length, E (1) can be regarded as the sum of control energies associated with the set of LCCs. The required energy to control the full network can thus be approximated as where E D C denotes the energy required to control an LCC of length D C , and m denotes its degeneracy (multiplicity), as shown in figure 3. Reasoning from an alternative standpoint, an arbitrary combination of D C and m, which we call an LCC skeleton, effectively represents a network, and the entire network ensemble is equivalent to all the possible LCC skeletons according to a joint probability function P(D C , m). The energy distribution for the original network can be characterized by the distribution of the energy required to control the LCC skeleton in the ensemble. Through simulations, we find that the marginal distribution of D C , P D C (D C ), decays approximately exponentially: where a and b are positive constants. Using we obtain where A and B are positive constants, and consequently the distribution of E D C as The marginal distribution of m for networks with D C > 2 also exhibits an exponential decay: where c and g are positive constants. Note that D C and m are uncorrelated, since a positive correlation would imply that the number of LCCs increases with their length, which is unphysical, and a negative correlation would lead to an exponential divergence of either P D C (D C ) or P m (m). We thus have and As a result, we obtain the following cumulative energy distribution: where Γ (b/B) and Γ (b/B, gE) are the Gamma and incomplete Gamma functions, respectively. The distribution of E can then be expressed as where −e −gE /E ≈ 0 due to the typically large value of E. Since numerically the difference between the two Gamma functions is approximately constant: providing a theoretical explanation for the algebraic energy distribution, where is a positive constant. Numerically, we have B ≈ 2 and b ≈ 1. Accordingly, the theoretical estimate of the scaling exponent is 1 + b/B ≈ 1.5, which is consistent with the numerical value. The nearly identical exponent in the distribution of E D C indicates that the role of the LCC degeneracy m is insignificant in the control energy and its distribution. It is the combination of the exponential decay in the distribution of the control diameter and the exponential increase in the energy required to control LCCs with their length which gives rise to the power-law energy distribution of the LCCs and ultimately leads to the algebraic distribution in the actual energy required to control the original network. We remark that, since the probability P m (m) has an exponential dependence on m and does not explicitly contain the energy E D C , the summation in m from 1 to E/E D C is a finite geometric sum, which differs from the integral in only a constant coefficient and does not affect the algebraic scaling exponent 1 + b/B. In our analysis, the reason to treat m as a real valued number is that the relation E ≈ mE D C holds only in the order-of-magnitude sense, implying that E/E D C is typically not an integer. The limits m → 0 and E D C → ∞ correspond to the rare case where all nodes in the network belong to one single LCC. If we treat m (1 ≤ m ≤ E/E D C ) as an integer and accordingly set the integral upper bound of E D C as E, the cumulative energy distribution function will have a similar form to equation (4.9) with only a small difference in the constant coefficient. This would not have any significant effect on the probability density function P E (E) in equation (4.12). Numerically, E is typically a large number in the order of at least 10 10 . As a result, the numerical difference caused by the integral upper bound is negligible.

Double-chain interaction
Our analysis of the LCC-skeleton predicts power-law distribution of the required energy for practically controllable networks, which agrees qualitatively with numerics. However, interaction energy E (2) among the coexisting chains is ignored. In a physical system, interactions among the basic components can play an important role in determining the system's properties. To obtain a more accurate estimate of the behaviours of the control energy, we need to include the interactions among the chains. The necessity is further justified as there are discrepancies between the actual control energy and that from the LCC-skeleton, as exemplified in figure 2b. Thus, in order to reproduce the numerically obtained energy distributions, we must incorporate the interactions among the LCCs into the model. However, including the interactions makes analysis difficult, as there are typically a large number of interacting pairs of chains. To gain insight into the role played by the interactions, it is useful to focus on the relatively simple case of two interacting chains.
Our double-chain interaction model is constructed, as follows. Consider two identical unidirectional chains, denoted by C1 and C2, each of length D C . Every node in C1 connects with every node in C2 with probability p, all links between the two chains are unidirectional. A link points to C2 from C1 with probability p 1→2 and the probability for a link in the opposite direction is p 2→1 = 1 − p 1→2 . By changing the connection rate p and the directional bias p 1→2 , we can simulate and characterize various interaction patterns between the two chains. To be concrete, we generate an ensemble of 10 000 interacting doublechain networks, each with 2 · D C nodes and multiple randomized inter-chain links as determined by the parameters p and p 1→2 . As shown in figure 5a, the distribution of the control energy displays a remarkable similarity to that for random networks, in that a power-law scaling behaviour emerges with the exponent about 1.5. The power-law distribution holds robustly with respect to variations in the parameters p and p 1→2 , and the change in the magnitude of the energy due to small variations in the parameter values is insignificant when compared with that caused by a change in the length of the LCC. In addition, to reveal the role of the interaction between an LCC and a non-LCC chain in the control energy, we randomly pick their lengths from [3,6] with equal probability, where the longer chain acts as an LCC. Again, we observe a strong similarity between the energy distributions from random networks and from this model, as shown in figure 5b, suggesting a universal pattern followed by pair interactions, regardless of the length of the chains. In particular, interactions between two chains, LCC or not, have a similar effect on the control-energy distribution. These results indicate that the double-chain interaction model captures the essential physical ingredients of the energy distribution in controlling complex networks.
Authors' contributions. Y.C.L. devised the research project. Y.Z.C. and L.Z.W. performed numerical simulations. Y.Z.C., L.Z.W., W.X.W. and Y.C.L. analysed the results. Y.Z.C. and Y.C.L. wrote the paper. All authors gave final approval for publication.