The diminishing role of hubs in dynamical processes on complex networks

It is notoriously difficult to predict the behaviour of a complex self-organizing system, where the interactions among dynamical units form a heterogeneous topology. Even if the dynamics of each microscopic unit is known, a real understanding of their contributions to the macroscopic system behaviour is still lacking. Here, we develop information-theoretical methods to distinguish the contribution of each individual unit to the collective out-of-equilibrium dynamics. We show that for a system of units connected by a network of interaction potentials with an arbitrary degree distribution, highly connected units have less impact on the system dynamics when compared with intermediately connected units. In an equilibrium setting, the hubs are often found to dictate the long-term behaviour. However, we find both analytically and experimentally that the instantaneous states of these units have a short-lasting effect on the state trajectory of the entire system. We present qualitative evidence of this phenomenon from empirical findings about a social network of product recommendations, a protein–protein interaction network and a neural network, suggesting that it might indeed be a widespread property in nature.


Introduction
Many non-equilibrium systems consist of dynamical units that interact through a network to produce complex behaviour as a whole. In a wide variety of such systems, each unit has a state that quasi-equilibrates to the distribution of states of the units it interacts with, or 'interaction potential', which results in the new state of the unit. This assumption is also known as the local thermodynamic equilibrium (LTE), originally formulated to describe radiative transfer inside stars [1,2]. Examples of systems of coupled units that have been described in this manner include brain networks [3][4][5][6], cellular regulatory networks [7][8][9][10][11], immune networks [12,13], social interaction networks [14][15][16][17][18][19][20] and financial trading markets [15,21,22]. A state change of one unit may subsequently cause a neighbour unit to change its state, which may, in turn, cause other units to change, and so on. The core problem of understanding the system's behaviour is that the topology of interactions mixes cause and effect of units in a complex manner, making it hard to tell which units drive the system dynamics.
The main goal of complex systems research is to understand how the dynamics of individual units combine to produce the behaviour of the system as a whole. A common method to dissect the collective behaviour into its individual components is to remove a unit and observe the effect [23-32]. In this manner, it has been shown, for instance, that highly connected units or hubs are crucial for the structural integrity of many real-world systems [28], i.e. removing only a few hubs disconnects the system into subnetworks which can no longer interact. On the other hand, Tanaka et al. [32] find that sparsely connected units are crucial for the dynamical integrity of systems where the remaining (active) units must compensate for the removed (failed) units. Less attention has been paid to studying the interplay of the unit dynamics and network topology, from which the system's behaviour emerges, in a non-perturbative and unified manner.
We introduce an information-theoretical approach to quantify to what extent the system's state is actually a representation of an instantaneous state of an individual unit. The minimum number of yes/no questions that is required to determine a unique instance of a system's state is called its entropy, measured in the unit bits [33]. If a system state $S^t$ can be in state $i$ with probability $p_i$, then its Shannon entropy is
$$H(S^t) = -\sum_i p_i \log_2 p_i. \tag{1.1}$$
For example, to determine a unique outcome of $N$ fair coin flips requires $N$ bits of information, that is, a reduction of entropy by $N$ bits. The more bits of a system's state $S^t$ are determined by a prior state $s_i^{t_0}$ of a unit $s_i$ at time $t_0$, the more the system state depends on that unit's state. This quantity can be measured using the mutual information between $s_i^{t_0}$ and $S^t$, defined as
$$I(S^t; s_i^{t_0}) = H(S^t) - H(S^t \mid s_i^{t_0}), \tag{1.2}$$
where $H(X \mid Y)$ is the conditional variant of $H(X)$. As time passes ($t \to \infty$), $S^t$ becomes more and more independent of $s_i^{t_0}$ until eventually the unit's state provides zero information about $S^t$. This mutual information integrated over time $t$ is a generic measure of the extent that the system state trajectory is dictated by a unit. We consider large static networks of identical units whose dynamics can be described by the Gibbs measure. The Gibbs measure describes how a unit changes its state subject to the combined potential of its interacting neighbours, in case the LTE is appropriate and using the maximum-entropy principle [34,35] to avoid assuming any additional structure. In fact, in our LTE description, each unit may even be a subsystem in its own right in a multi-scale setting, such as a cell in a tissue or a person in a social network. In this viewpoint, each unit can actually be in a large number of (unobservable) microstates which translate many-to-one to the (observable) macrostates of the unit.
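As a minimal illustration of the two quantities defined above (not part of the original analysis), the following Python sketch computes the Shannon entropy of a distribution and the mutual information of a joint distribution. The toy system of two independent fair binary units, and the helper names `entropy` and `mutual_information`, are our own assumptions for the example.

```python
import math
from collections import Counter

def entropy(probs):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X;Y) in bits from a dict {(x, y): p(x, y)}."""
    px, py = Counter(), Counter()
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# N fair coin flips have 2**N equally likely outcomes, hence N bits.
N = 3
coin_entropy = entropy([1 / 2**N] * 2**N)

# Toy system (an assumption for illustration): two independent fair binary
# units. X is the state of the first unit, Y is the state of the whole pair.
joint = {(a, (a, b)): 0.25 for a in (0, 1) for b in (0, 1)}
unit_information = mutual_information(joint)

print(coin_entropy, unit_information)
```

In this toy case the unit determines one of the two bits of the system state, so its mutual information with the system equals its own entropy.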
We consider that at a small timescale, each unit probabilistically chooses its next state depending on the current states of its neighbours; such systems are termed discrete-time Markov networks [36]. Furthermore, we consider random interaction networks with a given degree distribution $p(k)$, which denotes the probability that a randomly selected unit has $k$ interactions with other units, and which have a maximum degree $k_{\max}$ that grows sublinearly in the network size $N$. Self-loops are not allowed. No additional topological features are imposed, such as degree-degree correlations or community structures. An important consequence of these assumptions for our purpose is that the network is 'locally tree-like' [37,38], i.e. link cycles are exceedingly long.
We show analytically that for this class of systems, the impact of a unit's state on the short-term behaviour of the whole system is a decreasing function of the degree k of the unit for sufficiently high k. That is, it takes a relatively short time-period for the information about the instantaneous state of such a high-degree unit to be no longer present in the information stored by the system. A corollary of this finding is that if one would observe the system's state trajectory for a short amount of time, then the (out-of-equilibrium) behaviour of the system cannot be explained by the behaviour of the hubs. In other words, if the task is to optimally predict the short-term system behaviour after observing a subset of the units' states, then high-degree units should not be chosen.
We validate our analytical predictions using numerical experiments of random networks of 6000 ferromagnetic Ising spins where the number of interactions $k$ of a spin is distributed as a power-law $p(k) \propto k^{-\gamma}$. Ising-spin dynamics are extensively studied and are often used as a first approximation of the dynamics of a wide variety of complex physical phenomena [37]. We find further qualitative evidence in the empirical data of the dynamical importance of units as function of their degree in three different domains, namely viral marketing in social networks [39], evolutionary conservation of human proteins [40] and the transmission of a neuron's activity in neural networks [41].

Information dissipation time of a unit
As a measure of the dynamical importance of a unit s, we calculate its information dissipation time (IDT), denoted D(s). In words, it is the time it takes for the information about the state of the unit s to disappear from the network's state. As another way of describing it, it is the time it takes for the network as a whole to forget a particular state of a single unit. Here, we derive analytically a relation between the number of interactions of a unit and the IDT of its state. Our method to calculate the IDT is a measure of cause and effect and not merely of correlation; see appendix for details.

Terminology
A system $S$ consists of units $s_1, s_2, \ldots$ among which some pairs of units, called edges, $E = \{(s_i, s_j), (s_k, s_l), \ldots\}$ interact with each other. Each interaction is undirected, and the number of interactions that involve unit $s_i$ is denoted by $k_i$, called its degree, which equals $k$ with probability $p(k)$, called the degree distribution. The set of $k_i$ units that $s_i$ interacts with directly is denoted by $h_i = \{x : (s_i, x) \in E\}$. The state of unit $s_i$ at time $t$ is denoted by $s_i^t$, and the collection $S^t = (s_1^t, s_2^t, \ldots, s_N^t)$ forms the state of the system. Each unit probabilistically chooses its next state based on the current state of each of its nearest-neighbours in the interaction network. Unit $s_i$ chooses the next state $x$ with the conditional probability distribution $p(s_i^{t+1} = x \mid h_i^t)$. This is also known as a Markov network.

Unit dynamics in the local thermodynamic equilibrium
Before we can proceed to show that $D(s)$ is a decreasing function of the degree $k$ of the unit $s$, we must first define the class of unit dynamics in more detail. That is, we first specify an expression for the conditional probabilities $p(s^{t+1} = r \mid h^t)$.
We focus on discrete-time Markov networks, so the dynamics of each unit is governed by the same set of conditional probabilities $p(s^{t+1} = r \mid h^t)$ with the Markov property. In our LTE description, a unit chooses its next state depending on the energy of that state, where the energy landscape is induced by the states of its nearest-neighbours through its interactions. That is, each unit can quasi-equilibrate its state to the states of its neighbours. The higher the energy of a state at a given time, the less probable it is that the unit chooses that state. Stochasticity can arise if multiple states have an equal energy, and additional stochasticity is introduced by means of the temperature of the heat bath that surrounds the network.
rsif.royalsocietypublishing.org J R Soc Interface 10: 20130568
The consequence of this LTE description that is relevant to our study is that the state transition probability of a unit is an exponential function with respect to the energy. That is, in a discrete-time description, $s^t$ chooses $s^{t+1} = r$ as the next state with a probability
$$p(s^{t+1} = r \mid h^t) \propto \exp\left(-\frac{1}{T} \sum_j \epsilon(r \mid s_j^t)\right), \tag{2.1}$$
where $T$ is the temperature of the network's heat bath and $\sum_j \epsilon(r \mid s_j^t)$ is the energy of state $r$ given the states of its interacting neighbours $s_j^t \in h^t$. As a result, the energy landscape of $r$ does not depend on individual states of specific neighbour units; it depends on the distribution of neighbour states.
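To make the exponential dependence on the energy concrete, the sketch below normalizes Boltzmann weights over the candidate next states of a unit. It is only an illustration under stated assumptions: the ferromagnetic pair energy $-J r s$ with $J = 1$, the temperature value and the function names are our own choices, not taken from the original analysis.

```python
import math

def gibbs_transition(states, neighbour_states, energy, T):
    """Return {r: p(next state = r | neighbour states)} for a Gibbs measure."""
    weights = {
        r: math.exp(-sum(energy(r, s) for s in neighbour_states) / T)
        for r in states
    }
    Z = sum(weights.values())  # normalizing constant (partition function)
    return {r: w / Z for r, w in weights.items()}

def ising_energy(r, s, J=1.0):
    """Assumed ferromagnetic pair energy: aligned spins have lower energy."""
    return -J * r * s

# A spin with three neighbours, two pointing up and one pointing down.
p_next = gibbs_transition((-1, +1), [+1, +1, -1], ising_energy, T=2.0)
print(p_next)
```

Because the energies are summed over neighbours, permuting the neighbour states leaves the result unchanged: only the distribution of neighbour states matters, as noted in the text.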

Information as a measure of dynamical impact
The instantaneous state of a system $S^t$ consists of $H(S^t)$ bits of Shannon information. In other words, $H(S^t)$ answers to unique yes/no questions (bits) must be specified in order to determine a unique state $S^t$. As a consequence, the more bits about $S^t$ are determined by the instantaneous state $s_i^{t_0}$ of a unit $s_i$ at time $t_0 \le t$, the more the system state $S^t$ depends on the unit's state $s_i^{t_0}$. The impact of a unit's state $s_i^{t_0}$ on the system state $S^t$ at a particular time $t$ can be measured by their mutual information $I(S^t; s_i^{t_0})$. In the extreme case that $s_i^{t_0}$ fully determines the state $S^t$, the entropy of the system state coincides with the entropy of the unit state, and the dynamical impact is maximum at $I(S^t; s_i^{t_0}) = H(S^t) = H(s_i^{t_0})$. In the other extreme case, where the unit state $s_i^{t_0}$ is completely irrelevant to the system state $S^t$, the information is minimum at $I(S^t; s_i^{t_0}) = 0$. The decay of this mutual information over time (as $t \to \infty$) is then a measure of the extent that the system's state trajectory is affected by an instantaneous state of the unit. In other words, it measures the 'dynamical importance' of the unit. If the mutual information reaches zero quickly, then the state of the unit has a short-lasting effect on the collective behaviour of the system. The longer it takes for the mutual information to reach zero, the more influential is the unit to the system's behaviour. We call the time it takes for the mutual information to reach zero the IDT of a unit.

Defining the information dissipation time of a unit
At each time step, the information stored in a unit's state $s_i^t$ is partially transmitted to the next states of its nearest-neighbours [42,43], which, in turn, transmit it to their nearest-neighbours, and so on. The state of unit $s$ at time $t$ dictates the system state at the same time $t$ to the amount of
$$I_0^k \equiv I(S^t; s^t) = I(s^t; s^t) = H(s^t), \tag{2.2}$$
with the understanding that unit $s$ has $k$ interactions. We use the notation $I_0^k$ instead of $I_0^s$ because all units that have $k$ interactions are indistinguishable in our model. At time $t+1$, the system state is still influenced by the unit's state $s^t$, the amount of which is given by
$$I_1^k \equiv I(S^{t+1}; s^t) = I(h^{t+1}; s^t). \tag{2.3}$$
As a result, a unit with $k$ connections locally dissipates its information at a ratio $I_1^k / I_0^k$ per time step. Here, we use the observation that the information about a unit's state $s^t$, which is at first present at the unit itself at the maximum amount $H(s^t)$, can be only transferred at time $t+1$ to the direct neighbours $h$ of $s$, through nearest-neighbour interactions.
At subsequent time steps ($t+2$ and onward), the information about the unit with an amount of $I_1^k$ will dissipate further into the network at a constant average ratio
$$\hat{I} = \sum_m q(m) \, \frac{I_1^{m+1}}{I_0^{m+1}} \tag{2.4}$$
from its neighbours, neighbours-of-neighbours, etc. This is due to the absence of degree-degree correlations or other structural bias in the network. That is, the distribution $q(m)$ of the degrees of a unit's neighbours (and neighbours-of-neighbours) does not depend on its own degree $k$. Here, $q(m) = (m+1)\, p(m+1)\, \langle k \rangle^{-1}$ is the probability distribution of the number of additional interactions that a nearest-neighbour unit contains besides the interaction with unit $s$, or the interaction with a neighbour of unit $s$, etc., called the excess degree distribution [44]. As a consequence, the dissemination of information of all nodes occurs at an equal ratio per time step except for the initial amount of information $I_1^k$, which the $k$ neighbour states contain at time $t+1$, which depends on the degree $k$ of the unit. Note that this definition of $\hat{I}$ ignores the knowledge that the source node has exactly $k$ interactions, which at first glance may impact the ability of the neighbours to dissipate information. However, this simplification is self-consistent, namely we will show that $I_1^k$ diminishes for increasing $k$: this reduces the dissipation of information of its direct neighbours, which, in turn, reduces $I_1^k$ for increasing $k$, so that our conclusion that $I_1^k$ diminishes for increasing $k$ remains valid. See also appendix A for a second line of reasoning, about information flowing back to the unit $s$.
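The excess degree distribution can be computed directly from a degree distribution. The sketch below is an illustration only: the truncated power-law with cutoff `k_max = 100` and the function names are assumptions made for the example, with the exponent 1.6 borrowed from the numerical experiments described later.

```python
def power_law(gamma, k_max):
    """Truncated power-law degree distribution, p(k) ~ k**(-gamma), k = 1..k_max."""
    weights = {k: k ** (-gamma) for k in range(1, k_max + 1)}
    norm = sum(weights.values())
    return {k: w / norm for k, w in weights.items()}

def excess_degree_distribution(p):
    """Return q with q[m] = (m + 1) * p(m + 1) / <k>."""
    mean_k = sum(k * pk for k, pk in p.items())
    return {k - 1: k * pk / mean_k for k, pk in p.items() if k >= 1}

p = power_law(gamma=1.6, k_max=100)  # k_max is an illustrative cutoff
q = excess_degree_distribution(p)

mean_degree = sum(k * pk for k, pk in p.items())
mean_excess = sum(m * qm for m, qm in q.items())
print(mean_degree, mean_excess)
```

Note that `q` depends only on `p`, not on the degree of the unit whose neighbours are being described, which is the property the argument above relies on.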
In general, the ratio per time step at which the information about $s_i^t$ dissipates from $t+2$ and onward equals $\hat{I}$ up to an 'efficiency factor' that depends on the state-state correlations implied by the conditional transition probabilities $p(s_k^{t+1} \mid s_j^t)$. For example, if $s_A^t$ dictates 20% of the information stored in its neighbour state $s_B^{t+1}$, and $s_B^{t+1}$, in turn, dictates 10% of the information in $s_C^{t+2}$, then $I(s_A^t; s_C^{t+2})$ may not necessarily equal 20% × 10% = 2% of the information $H(s_C^{t+2})$ stored in $s_C^{t+2}$. That is, in one extreme, $s_B^{t+1}$ may use different state variables to influence $s_C^{t+2}$ than the variables that were influenced by $s_A^t$, in which case $I(s_A^t; s_C^{t+2})$ is zero, and the information transmission is inefficient. In the other extreme, if $s_B^{t+1}$ uses only state variables that were set by $s_A^t$ to influence $s_C^{t+2}$, then passing on A's information is optimally efficient and $I(s_A^t; s_C^{t+2}) = 10\%$. Therefore, we assume that at every time step from time $t+2$ onward, the ratio of information about a unit that is passed on is $c_{\text{eff}} \cdot \hat{I}$, i.e. corrected by a constant factor $0 \le c_{\text{eff}} \le 1/\hat{I}$ that depends on the similarity of dynamics of the units. It is non-trivial to calculate $c_{\text{eff}}$ but its bounds are sufficient for our proceeding.
Next, we can define the IDT of a unit. The number of time steps it takes for the information in the network about unit $s$ with degree $k$ to reach an arbitrarily small constant $\varepsilon$ is
$$D(s) = \log_{c_{\text{eff}} \cdot \hat{I}} \left( \frac{\varepsilon}{I_1^k} \right) = \frac{\log \varepsilon - \log I_1^k}{\log c_{\text{eff}} + \log \hat{I}}. \tag{2.5}$$
Note that $D(s)$ is not equivalent to the classical correlation length. The correlation length is a measure of the time it takes for a unit to lose a certain fraction of its original correlation with the system state, instead of the time it takes for the unit to reach a certain absolute value of correlation. For our purpose of comparing the dynamical impact of units, the correlation length would not be a suitable measure. For example, if unit A has a large initial correlation with the system state and another unit B has a small initial correlation, but the half-life of their correlation is equal, then, in total, we consider A to have more impact on the system's state because it dictates more bits of information of the system state.
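The definition can be sketched numerically by assuming that the information decays geometrically at the constant per-step ratio $c_{\text{eff}} \cdot \hat{I}$, starting from $I_1^k$. All numerical values and the function name `idt` below are hypothetical and serve only to illustrate the comparison of units A and B made in the text.

```python
import math

def idt(I1, I_hat, c_eff, eps):
    """Number of steps n for which I1 * (c_eff * I_hat)**n equals eps."""
    ratio = c_eff * I_hat
    if not 0 < ratio < 1:
        raise ValueError("per-step ratio must lie strictly between 0 and 1")
    if I1 <= eps:
        return 0.0  # already at or below the threshold
    return (math.log(eps) - math.log(I1)) / math.log(ratio)

# Hypothetical units A and B with the same per-step ratio (same half-life)
# but different initial amounts of information in their neighbourhood.
idt_A = idt(I1=0.8, I_hat=0.5, c_eff=1.0, eps=0.1)
idt_B = idt(I1=0.2, I_hat=0.5, c_eff=1.0, eps=0.1)
print(idt_A, idt_B)
```

Although both units lose the same fraction of information per step, unit A takes longer to reach the absolute threshold, which is the distinction from a correlation-length measure.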

Diminishing information dissipation time of hubs
As a function of the degree $k$ of unit $s$, the unit's IDT satisfies
$$D(s) \propto \text{const} + \log I_1^k, \tag{2.6}$$
because $\hat{I}$, $c_{\text{eff}}$ and $\varepsilon$ are independent of the unit's degree. Here, the proportionality factor equals $-(\log c_{\text{eff}} + \log \hat{I})^{-1}$, which is non-negative, because the dissipation ratio $c_{\text{eff}} \cdot \hat{I}$ is at most 1, and the additive constant equals $-\log \varepsilon$, which is positive as long as $\varepsilon < 1$. Because the logarithm preserves order, to show that the IDT diminishes for high-degree units, it is sufficient to show that $I_1^k$ decreases to a constant, as $k \to \infty$, which we do next.
The range of the quantity $I_1^k$ is
$$0 \le I_1^k \le \sum_{s_j \in h_i} I(s_j^{t+1}; s_i^t), \tag{2.7}$$
due to the conditional independence among the neighbour states $s_j^{t+1}$ given the node state $s_i^t$. In the average case, the upper bound can be written as $k \cdot \langle I(s_j^{t+1}; s_i^t) \rangle_{k_j}$, and we can write $I_1^k$ as
$$I_1^k = U(k) \cdot k \cdot T(k), \tag{2.8}$$
where $T(k) = \langle I(s_j^{t+1}; s_i^t) \rangle_{k_j}$ is the information in a neighbour unit's next state averaged over its degree, and $U(k)$ is the degree of 'uniqueness' of the next states of the neighbours. The operator $\langle \cdot \rangle_{k_j}$ denotes an average over the degree $k_j$ of a neighbour unit $s_j$, i.e. weighted by the excess degree distribution $q(k_j - 1)$. In one extreme, the uniqueness function $U(k)$ equals unity in case the information of a neighbour does not overlap with that of any other neighbour unit of $s_i^t$, i.e. the neighbour states do not correlate. It is less than unity to the extent that information does overlap between neighbour units, but is never negative. See §S3 in the electronic supplementary material for a detailed derivation of an exact expression and bounds of the uniqueness function $U(k)$.
Because the factor $U(k) \cdot k$ is at most a linear growing function of $k$, a sufficient condition for $D(s_i)$ to diminish as $k \to \infty$ is for $T(k)$ to decrease to zero more strongly than linear in $k$. After a few steps of algebra (see appendix), we find that
$$T(k+1) = \alpha \cdot T(k), \quad \text{where } \alpha \le 1. \tag{2.9}$$
Here, equality for $\alpha$ only holds in the degenerate case where only a single state is accessible to the units. In words, we find that the expected value of $T(k)$ converges downward to a constant at an exponential rate as $k \to \infty$.
Because each term is multiplied by a factor $\alpha \le 1$, this convergence is downward for most systems but never upward even for degenerate system dynamics.
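The competition between the at most linear growth of $U(k) \cdot k$ and the exponential decay of $T(k)$ can be illustrated with a toy calculation. The sketch assumes, purely for illustration, a constant factor $\alpha = 0.85$, $T(k) = T_0 \alpha^k$ with $T_0 = 1$, and the upper bound $U(k) = 1$; none of these values come from the original analysis.

```python
def neighbour_information(k, T0=1.0, alpha=0.85, U=1.0):
    """Toy upper bound on the information held by k neighbours: U * k * T(k),
    with T(k) = T0 * alpha**k (all parameter values are illustrative)."""
    return U * k * T0 * alpha ** k

degrees = range(1, 101)
values = [neighbour_information(k) for k in degrees]

# Degree at which the toy bound is largest.
k_peak = max(degrees, key=neighbour_information)
print(k_peak)
```

Under these assumptions the bound rises for small degrees, peaks at an intermediate degree and then decays, so that, via the logarithmic dependence of the IDT on this quantity, the hubs end up with a shorter IDT than intermediately connected units.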

Numerical experiments with networks of Ising spins
For our experimental validation, we calculate the IDT D(s) of 6000 ferromagnetic spins with nearest-neighbour interactions in a heavy-tailed network in numerical experiments and find that it, indeed, diminishes for highly connected spins. In figure 1, we show the numerical results and compare them with the analytical results, i.e. evaluating equation (2.5).
The analytical calculations use the single-site Glauber dynamics [45] to describe how each spin updates its state depending on the states of its neighbours. In this dynamics, at each time step, a single spin chooses its next state according to its stationary distribution of states, which would be induced if its nearest-neighbour spin states were fixed to their instantaneous values (LTE). We calculate the upper bound of $D(s)$ by setting $U(k) = 1$, that is, all information about a unit's state is assumed to be unique, which maximizes its IDT. A different constant value for $U(k)$ would merely scale the vertical axis.
We perform computer simulations to produce time series of the states of 6000 ferromagnetic Ising spins and measure the dynamical importance of each unit by regression. For each temperature value, we generate six random networks with $p(k) \propto k^{-\gamma}$ for $\gamma = 1.6$ and record the state of each spin at 90 000 time steps. The state of each unit is updated using the Metropolis-Hastings algorithm instead of the Glauber update rule to show generality. In the Metropolis-Hastings algorithm, a spin will always flip its state if it lowers the interaction energy; higher energy states are chosen with a probability that decreases exponentially as function of the energy increase. Of the resulting time series of the unit states, we computed the time $d_i$ where $I(s_1^{t+d_i}, \ldots, s_N^{t+d_i}; s_i^t) = \varepsilon$ of each unit $s_i$ by regression. This is semantically equivalent to $D(s_i)$ but does not assume a locally tree-like structure or a uniform information dissipation rate $\hat{I}$. In addition, it ignores the problem of correlation (see appendix A). See section S1 in the electronic supplementary material for methodological details; see section S2 in the electronic supplementary material for results using higher values of the exponent $\gamma$. The results are presented in figure 1.
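The Metropolis-Hastings update described above can be sketched in a few lines. This is a minimal illustration and not the simulation code used for figure 1: the five-node star network, the coupling $J = 1$, the temperature and the seed are all hypothetical choices, and the sketch only records the state time series without estimating the mutual information.

```python
import math
import random

def metropolis_step(spins, neighbours, T, rng):
    """Attempt to flip one randomly chosen spin (Metropolis acceptance rule)."""
    i = rng.randrange(len(spins))
    # Energy change of flipping spin i, for ferromagnetic coupling J = 1.
    dE = 2 * spins[i] * sum(spins[j] for j in neighbours[i])
    # Always accept if the energy does not increase; otherwise accept with a
    # probability that decreases exponentially with the energy increase.
    if dE <= 0 or rng.random() < math.exp(-dE / T):
        spins[i] = -spins[i]

def simulate(neighbours, T, steps, seed=0):
    """Return the time series of system states as a list of tuples."""
    rng = random.Random(seed)
    spins = [rng.choice((-1, 1)) for _ in neighbours]
    history = []
    for _ in range(steps):
        metropolis_step(spins, neighbours, T, rng)
        history.append(tuple(spins))
    return history

# Hypothetical toy network: a hub (node 0) connected to four leaves.
star = [[1, 2, 3, 4], [0], [0], [0], [0]]
history = simulate(star, T=1.5, steps=1000)
print(len(history))
```

A time series of this kind is the raw material from which the dissipation time of each unit would then be estimated.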

Empirical evidence
We present empirical measurements from the literature of the impact of units on the behaviour of three different systems, namely networks of neurons, social networks and protein dynamics. These systems are commonly modelled using a Gibbs measure to describe the unit dynamics. In each case, the highly connected units turn out to have a saturating or decreasing impact on the behaviour of the system. This provides qualitative evidence that our IDT, indeed, characterizes the dynamical importance of a unit, and, consequently, that highly connected units have a diminishing dynamical importance in a wide variety of complex systems. In each study, it remains an open question which mechanism is responsible for the observed phenomenon. Our work proposes a new candidate explanation for the underlying cause for each case, namely that it is an inherent property of the type of dynamics that govern the units.
The first evidence is found in the signal processing of in vitro networks of neurons [41]. The denser neurons are placed in a specially prepared Petri dish, the more connections (synapses) each neuron creates with other neurons. In their experiments, Ivenshitz and Segal found that sparsely connected neurons are capable of transmitting their electrical potential to neighbouring neurons, whereas densely connected neurons are unable to trigger network activity even if they are depolarized in order to discharge several action potentials. Their results are summarized in figure 2. In search for the underlying cause, the authors exclude some obvious candidates, such as the ratio of excitatory versus inhibitory connections, the presence of compounds that stimulate neuronal excitability and the size of individual postsynaptic responses. Although the authors do find telltale correlations, for example, between the network density and the structure of the dendritic trees, they conclude that the phenomenon is not yet understood. Note that in this experiment, the sparsely connected neuron is embedded in a sparsely connected neural network, whereas the densely connected neuron is in a dense network. A further validation would come from a densely connected neuron embedded in a sparse network in order to disentangle the network's contribution from the individual effect.
Second, in a person-to-person recommendation network consisting of four million persons, Leskovec et al. [39] found that the most active recommenders are not necessarily the most successful. In the setting of word-of-mouth marketing among friends in the social networks, the adoption rate of recommendations saturates or even diminishes for the highly active recommenders, which is shown in figure 3 for four product categories. This observation is remarkable, because in the dataset, the receiver of a recommendation does not know how many other persons receive it as well. As a possible explanation, the authors hypothesize that widely recommended products may not be suitable for viral marketing. Nevertheless, the underlying cause remains an open question. We propose an additional hypothesis, namely that highly active recommenders have a diminishing impact on the opinion forming of others in the social network. In fact, the model of Ising spins in our numerical experiments is a widely used model for opinion forming in social networks [14 -16,18,20]. As a consequence, the results in figure 1 may be interpreted as estimating the dynamical impact of a person's opinion as function of the number of friends that he debates his opinion with.
Figure 2. The level of activity of a set of neurons under a microscope as function of time, after seeding one neuron with an electrical potential (black line). The activity was measured by changes in calcium ion concentrations. These concentrations were detected by imaging fluorescence levels relative to the average fluorescence of the neurons (activity 0) measured prior to activation. In the sparse cultures with few synapses per neuron, the stimulated neuron evokes a network burst of activity in all other neurons in the field after a short delay. By contrast, in the dense cultures with many synapses per neuron, only the stimulated neuron has an increased potential. The data for these plots were kindly provided by Ivenshitz & Segal [41]. (a) Low connectivity and (b) high connectivity.
The third empirical evidence is found in the evolutionary conservation of human proteins [40]. According to the neutral model of molecular evolution, most successful mutations in proteins are irrelevant to the functioning of the system of protein-protein interactions [46]. This means that the evolutionary conservation of a protein is a measure of the intolerance of the organism to a mutation to that protein, i.e. it is a measure of the dynamical importance of the protein to the reproducibility of the organism [47]. Brown & Jurisica [40] measured the conservation of human proteins by mapping the human protein-protein interaction network to that of mice and rats using 'orthologues', which is shown in figure 4. Two proteins in different species are orthologous if they descend from a single protein of the last common ancestor. Their analysis reveals that the conservation of highly connected proteins is inversely related with their connectivity. Again, this is consistent with our analytical prediction.
The authors conjecture that this effect may be due to the overall high conservation rate, approaching the maximum of 1 and therefore affecting the statistics. We suggest that it may indeed be an inherent property of protein interaction dynamics.

Discussion
We find that various research areas encounter a diminishing dynamical impact of hubs that is unexplained. Our analysis demonstrates that this phenomenon could be caused by the combination of unit dynamics and the topology of their interactions. We show that in large Markov networks, the dynamical behaviour of highly connected units has a low impact on the dynamical behaviour of the system as a whole, in the case where units choose their next state depending on the interaction potential induced by their nearest-neighbours.
Figure 4. The fraction of evolutionary conservation of human proteins as a function of their connectivity $k$. The fraction of conservation is measured as the fraction of proteins that have an orthologous protein in the mouse (circles) and the rat (crosses). The dashed and dot-dashed curves show the trend of the conservation rates compared with mice and rats, respectively. They are calculated using a Gaussian smoothing kernel with a standard deviation of 10 data points. To evaluate the significance of the downward trend of both conservation rates, we performed a least-squares linear regression of the original data points starting from the peaks in the trend lines up to $k = 70$. For the fraction of orthologues with mice, the slope of the regression line is −0.00347 ± 0.00111 (mean and standard error); with rats, the slope is −0.00937 ± 0.00594. The vertical bars denote the number of proteins with $k$ interactions in the human protein-protein interaction network (logarithmic scale). The data for these plots were kindly provided by Brown & Jurisica [40].
For highly connected units, this type of dynamics enables the LTE assumption, originally used for describing radiative transport in a gas or plasma. To illustrate LTE, there is no single temperature value that characterizes an entire star: the outer shell is cooler than the core. Nonetheless, the mean free path of a moving photon inside a star is much smaller than the length scale over which the temperature changes, so on a small timescale, the photon's movement can be approximated using a local temperature value. A similar effect is found in various systems of coupled units, such as social networks, gene regulatory networks and brain networks. In such systems, the internal dynamics of a unit is often faster than a change of the local interaction potential, leading to a multi-scale description. Intuitive examples are the social interactions in blog websites, discussion groups or product recommendation services. Here, changes that affect a person are relatively slow so that he can assimilate his internal state-of-mind (the unit's microstate) to his new local network of friendships and the set of personal messages he received, before he makes the decision to add a new friend or send a reply (the unit's macrostate). Indeed, this intuition combined with our analysis is consistent with multiple observations in social networks. Watts & Dodds [48] numerically explored the importance of 'influentials', a minority of individuals who influence an exceptional number of their peers. They find, counter to intuition, that large cascades of influence are usually not driven by influentials, but rather by a critical mass of easily influenced individuals. Granovetter [49] found that even though hubs gather information from different parts of the social network and transmit it, the clustering and centrality of a node provide better characteristics for diffusing innovation [50]. Rogers [51] found experimentally that the innovator is usually an individual in the periphery of the network, with few contacts with other individuals. Our approach can be interpreted in the context of how dynamical systems intrinsically process information [42,43,52-56].
That is, the state of each unit can be viewed as a (hidden) storage of information. As one unit interacts with another unit, part of its information is transferred to the state of the other unit (and vice versa). Over time, the information that was stored in the instantaneous state of one unit percolates through the interactions in the system, and at the same time it decays owing to thermal noise or randomness. The longer this information is retained in the system state, the more the unit's state determines the state trajectory of the system. This is a measure of the dynamical importance of the unit, which we quantify by D(s).
Our work contributes to the understanding of the behaviour of complex systems at a conceptual level. Our results suggest that the concept of information processing can be used, as a general framework, to infer how dynamical units work together to produce the system's behaviour. The inputs to this inference are both the rules of unit dynamics and the topology of interactions, which contrasts with most complex systems research. A popular approach to infer the importance of units in general is to use topology-only measures such as connectedness and betweenness-centrality [28,30,57-62], following the intuition that well-connected or centrally located units must be important to the behaviour of the system. We demonstrate that this intuition is not necessarily true. A more realistic approach is to simulate a simple process on the topology, such as the percolation of particles [63], magnetic spin interactions [3,6,14,20,37,64-72] or the synchronization of oscillators [37,60,73-80]. The dynamical importance of a unit in such a model is then translated to that of the complex system under investigation. Among the 'totalistic' approaches that consider the dynamics and interaction topology simultaneously, a common method to infer a unit's dynamical importance is to perform 'knock-out' experiments [29-31], that is, experimentally removing or altering a unit and observing the difference in the system's behaviour. This is a measure of how robust the system is to a perturbation, however, and care must be taken to translate robustness into dynamical importance. If the perturbation is not part of the natural behaviour of the system, then the perturbed system is not a representative model of the original system.
To illustrate, we find that highly connected ferromagnetic spins hardly explain the observed dynamical behaviour of a system, even though removing such a spin would have a large impact on the average magnetization, stability and critical temperature [81,82]. In summary, our work is an important step towards a unified framework for understanding the interplay of the unit dynamics and network topology from which the system's behaviour emerges.
where $Z_k$ is the partition function for a unit with $k$ edges. As $k \gg |S|$, the set of interaction energies starts to follow a stationary distribution of nearest-neighbour states, and the expression can be approximated as

$$\frac{1}{Z_k}\, e^{-k \langle \varepsilon_q \rangle / T}.$$

Here, $\langle \varepsilon_q \rangle$ is the expected interaction energy of the state $q$ with one neighbour, averaged over the neighbours' state distribution. If an edge is added to such a unit, the expression becomes (the subscript $k+1$ denotes the degree of the node as a reminder)

$$\frac{1}{Z_{k+1}}\, e^{-(k+1) \langle \varepsilon_q \rangle / T} = \frac{1}{Z_{k+1}}\, e^{-k \langle \varepsilon_q \rangle / T}\, e^{-\langle \varepsilon_q \rangle / T}.$$

In words, the energy term for each state $q$ is multiplied by a factor $e^{-\langle \varepsilon_q \rangle / T}$ that depends on the state but is constant with respect to $k$. (The partition function changes with $k$ to suitably normalize the new terms, but it does not depend on $q$ and so is not responsible for moving probability mass.) That is, as $k$ grows, the probability of the state $q$ with the lowest expected interaction energy approaches unity; the probabilities of all other states approach zero. The approaches are exponential, because the multiplying factors do not depend on $k$.

rsif.royalsocietypublishing.org J R Soc Interface 10: 20130568
If there are $m$ states with the lowest interaction energy (multiplicity $m$), then the probability of each of these states will approach $1/m$.
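The limiting behaviour described in this section can be checked numerically. The following is a minimal sketch, assuming the large-degree Boltzmann approximation in which the weight of each state decays exponentially in the degree times its expected interaction energy with one neighbour. The energy values, the temperature and the function name `state_probabilities` are illustrative assumptions and are not taken from the model in the main text.

```python
import math

def state_probabilities(mean_energies, k, T=1.0):
    """Boltzmann probabilities for a unit of degree k (large-k approximation).

    mean_energies[q] is the assumed expected interaction energy of state q
    with one neighbour; T is the temperature.
    """
    # Subtract the minimum energy before exponentiating for numerical
    # stability; this constant shift cancels in the normalization.
    e_min = min(mean_energies)
    weights = [math.exp(-k * (e - e_min) / T) for e in mean_energies]
    z_k = sum(weights)  # partition function (up to the constant shift)
    return [w / z_k for w in weights]

# Hypothetical energies: state 0 is the unique lowest-energy state.
energies = [-1.0, -0.5, 0.0]
for k in (1, 5, 10, 20):
    p = state_probabilities(energies, k)
    print(k, [round(x, 6) for x in p])

# Two states share the lowest energy (multiplicity m = 2),
# so each of them should approach probability 1/2.
degenerate = state_probabilities([-1.0, -1.0, 0.0], k=50)
print([round(x, 6) for x in degenerate])
```

In this toy setting, the probability mass concentrates on the lowest-energy state (or splits evenly over the tied lowest-energy states) as the degree grows, which is the behaviour the argument relies on.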

A.2. Deriving an upper bound on $\alpha$ in $T(k+1) = \alpha \cdot T(k)$
First, we write $T(k)$ as an expected mutual information between the state of a unit and the next state of its neighbour, where the average is taken over the degree of the neighbour unit:

$$T(k) = \left\langle H(s_i^t) - H(s_i^t \mid s_j^{t+1}) \right\rangle_{k_j}. \qquad \text{(A 4)}$$

We will now study how $T(k)$ behaves for large $k$. By definition, both entropy terms are non-negative, and $H(s_i^t \mid s_j^{t+1}) \le H(s_i^t)$. In §A.1 of this appendix, we find that the prior probabilities of the state of a high-degree unit exponentially approach either zero from above or a constant from below. In the following, we assume that this constant is unity for the sake of simplicity, i.e. that there is only one state with the lowest possible interaction energy.
(A 5)

In words, the first entropy term eventually goes to zero exponentially as a function of the degree of a unit. Because this entropy term is the upper bound on the function $T(k)$, there are three possibilities for the behaviour of $T(k)$. The first option is that $T(k)$ is zero for all $k$, which is a degenerate system without dynamical behaviour. The second option is that $T(k)$ is a monotonically decreasing function of $k$, and the third option is that $T(k)$ first increases and then decreases as a function of $k$. In both of the latter cases, $T(k)$ must approach zero exponentially for large $k$.
In summary, we find that for large $k$

$$T(k+1) = \alpha \cdot T(k), \quad \text{where } \alpha < 1. \qquad \text{(A 6)}$$

The assumption of multiplicity unity of the lowest interaction energy is not essential. If this assumption is relaxed, then in step 3 of equation (A 5) the first term does not become zero but a positive constant. A system where $T(k)$ equals this constant across $k$ may not be degenerate, in contrast to the case of multiplicity unity, so in this case we must relax the condition in equation (A 6) to include the possibility that all units are equally important, i.e. $\alpha \le 1$. This still makes it impossible for the impact of a unit to keep increasing as its degree grows.
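To make the role of the entropy bound concrete, the sketch below computes the entropy of a unit's state distribution for growing degree and the ratio of successive values. It assumes the same hypothetical large-degree Boltzmann form as in §A.1, with made-up energies and temperature. Note that it evaluates only the entropy, which bounds $T(k)$ from above; it does not compute $T(k)$ itself.

```python
import math

def boltzmann(mean_energies, k, T=1.0):
    """Large-k Boltzmann state probabilities for a unit of degree k (assumed form)."""
    e_min = min(mean_energies)
    w = [math.exp(-k * (e - e_min) / T) for e in mean_energies]
    z = sum(w)
    return [x / z for x in w]

def entropy_bits(p):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0.0)

# Hypothetical energies with a unique lowest-energy state (multiplicity unity).
energies = [-1.0, -0.5, 0.0]

# entropies[i] is the entropy for degree k = i + 1.
entropies = [entropy_bits(boltzmann(energies, k)) for k in range(1, 41)]

# ratios[i] compares degree k + 1 with degree k, for k = i + 1.
ratios = [entropies[i + 1] / entropies[i] for i in range(len(entropies) - 1)]

for k in (1, 10, 20, 39):
    print(f"k={k:2d}  H={entropies[k - 1]:.3e}  H(k+1)/H(k)={ratios[k - 1]:.3f}")
```

In this toy example the ratio of successive entropies stays below one, so the upper bound shrinks with every added edge, in line with the requirement that $T(k)$ cannot keep increasing with degree.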

A.3. Information flowing back to a high-degree unit
In the main text, we simplify the information flow through the network by assuming that the amount of information $I_1^k$ stored in the neighbours of a unit flows onward into the network, and does not flow back to the unit. Here, we justify this assumption for high-degree units. Suppose that at time $t+1$, the neighbour unit $s_j$ stores $I(s_i^t; s_j^{t+1})$ bits of information about the state $s_i^t$. At time $t+2$, part of this information will be stored by two variables: the unit's own state $s_i^{t+2}$ and the combined variable of neighbour-of-neighbour states $\{s_{j_1}, \ldots, s_{j_{k_j}}\}$. In order for the IDT $D(s_i)$ of unit $s_i$ to be affected by the information that flows back, this information must add a (significant) amount to the total information at time $t+2$. We argue, however, that this amount is insignificant, i.e.

$$I(s_i^t; S^{t+2}) - I(s_i^t; \{s_{j_1}^{t+2}, \ldots, s_{j_{k_j}}^{t+2}\}) = I(s_i^t; s_i^{t+2} \mid \{s_{j_1}^{t+2}, \ldots, s_{j_{k_j}}^{t+2}\}) \xrightarrow{\;k_i \to \infty\;} 0. \qquad \text{(A 7)}$$

The term $I(s_i^t; s_i^{t+2} \mid \{s_{j_1}^{t+2}, \ldots, s_{j_{k_j}}^{t+2}\})$ is the conditional mutual information. Intuitively, it is the information that $s_i^{t+2}$ stores about $s_i^t$ which is not already present in the states $\{s_{j_1}^{t+2}, \ldots, s_{j_{k_j}}^{t+2}\}$. The maximum amount of information that a variable can store about other variables is its entropy, by definition. It follows from §A.1 and §A.2 of this appendix that the entropy of a high-degree unit is lower than the average entropy of a unit. In fact, in the case of multiplicity unity of the lowest interaction energy, the capacity of a unit goes to zero as $k \to \infty$. For this case, this proves that $I(s_i^t; s_i^{t+2} \mid \{s_{j_1}^{t+2}, \ldots, s_{j_{k_j}}^{t+2}\})$ indeed goes to zero. For higher multiplicities, we observe that the entropy $H(s_i^{t+2})$ is still (much) smaller than the total entropy of the neighbours of a neighbour, $H(s_{j_1}^{t+2}) + H(s_{j_2}^{t+2} \mid s_{j_1}^{t+2}) + \cdots$. Therefore, the information $I(s_i^t; s_i^{t+2})$ that flows back is (much) smaller than $I(s_i^t; \{s_{j_1}^{t+2}, \ldots, s_{j_{k_j}}^{t+2}\})$, and the conditional variant is presumably smaller still. We therefore assume that also in this case, the information that flows back has an insignificant effect on $D(s_i)$.
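The argument in this section rests on the fact that a conditional mutual information cannot exceed the entropy of the variable that stores it. The sketch below checks this on a small, randomly generated joint distribution. Here X stands in for the earlier state of a unit, Y for its later state (deliberately biased so that its entropy is low, as for a high-degree unit) and Z for the neighbour-of-neighbour states. The distribution, the variable names and the helper functions are hypothetical and serve only as an illustration.

```python
import math
import random
from collections import defaultdict

def marginal(joint, axes):
    """Marginalize a joint pmf {(x, y, z): p} onto the given tuple positions."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[a] for a in axes)] += p
    return out

def entropy(pmf):
    """Shannon entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0.0)

def conditional_mutual_information(joint):
    """I(X; Y | Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z) for a pmf over (x, y, z)."""
    return (entropy(marginal(joint, (0, 2))) + entropy(marginal(joint, (1, 2)))
            - entropy(joint) - entropy(marginal(joint, (2,))))

# Build a random joint pmf in which y is almost always 0 (low entropy).
rng = random.Random(0)
joint = {}
for x in range(2):
    for y in range(2):
        for z in range(4):
            bias = 1.0 if y == 0 else 0.01
            joint[(x, y, z)] = bias * rng.random()
total = sum(joint.values())
joint = {outcome: p / total for outcome, p in joint.items()}

cmi = conditional_mutual_information(joint)
h_y = entropy(marginal(joint, (1,)))
print(f"I(X;Y|Z) = {cmi:.6f} bits, H(Y) = {h_y:.6f} bits")
```

Whatever the random draw, the conditional mutual information stays at or below the entropy of Y, so a unit whose state entropy vanishes cannot hold a significant amount of information that has flowed back to it.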

A.4. A note on causation versus correlation
In the general case, the mutual information $I(s_x^t; s_y^{t_0})$ between the state of unit $s_x$ at time $t$ and another unit's state $s_y$ at time $t_0$ is the sum of two parts: $I_{\text{causal}}$, which is information that is due to a causal relation between the state variables, and $I_{\text{corr}}$, which is information due to 'correlation' that does not overlap with the causal information. Correlation occurs if the units $s_x$ and $s_y$ both causally depend on a third 'external' variable $e$ in a similar manner, i.e. such that $I(e; (s_x^t, s_y^{t_0})^{\mathrm{T}}) < I(e; s_x^t) + I(e; s_y^{t_0})$. This can lead to a non-zero mutual information $I(s_x^t; s_y^{t_0})$ between the two units, even if the two units do not directly depend on each other in a causal manner [83,84].
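A small numerical example may help to see how a shared external variable creates mutual information without any direct causal link. The sketch below assumes a toy setting that is not part of the model in the main text: a fair binary variable `e` is copied independently into two binary variables `x` and `y`, each copy being flipped with a hypothetical probability `flip`.

```python
import math
from itertools import product

def mutual_information(joint_xy):
    """I(X; Y) in bits from a joint pmf {(x, y): p}."""
    px, py = {}, {}
    for (x, y), p in joint_xy.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint_xy.items() if p > 0.0)

flip = 0.1  # hypothetical noise level of each copy

joint_xy = {}   # joint pmf of (x, y) with e marginalized out
cond_mi = 0.0   # I(x; y | e), averaged over the two values of e
for e in (0, 1):
    given_e = {}
    for x, y in product((0, 1), repeat=2):
        # x and y are conditionally independent noisy copies of e.
        p = (flip if x != e else 1 - flip) * (flip if y != e else 1 - flip)
        given_e[(x, y)] = p
        joint_xy[(x, y)] = joint_xy.get((x, y), 0.0) + 0.5 * p
    cond_mi += 0.5 * mutual_information(given_e)

mi = mutual_information(joint_xy)
print(f"I(x; y) = {mi:.4f} bits, I(x; y | e) = {cond_mi:.4f} bits")
```

The unconditional mutual information is clearly positive although neither variable influences the other, while it vanishes once the common cause is conditioned on. This is the kind of non-causal contribution that the remainder of this section argues is negligible in the model.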
For this reason, we do not directly calculate the dependence of $I(S^t; s^{t_0})$ on the time variable $t$ in order to calculate the IDT of a unit $s$. It would be difficult to tell how much of this information is non-causal at every time point. To find this out, we would have to understand exactly how each bit of information is passed onward through the system, from one state variable to the next, which we do not yet understand.
To prevent measuring the non-causal information present in the network, we use local single-step 'kernels' of information diffusion, namely the $I_1^k / I_0^k$ as discussed previously. The information $I_0^k$ is trivially of causal nature (i.e. non-causal information is zero), because it is fully stored in the state of the unit itself. Although, in the general case, $I_1^k$ may consist of a significant non-causal part, in our model, we assume this to be zero or at most an insignificant amount. The rationale is that units do not self-interact (no self-loops), and the network is locally tree-like: if $s_x$ and $s_y$ are direct neighbours, then there is no third $s_z$ with 'short' interaction pathways to both $s_x$ and $s_y$. The only way that non-causal (i.e. not due to $s_x^t$ influencing $s_y^{t+1}$) information can be created between $s_x^t$ and $s_y^{t+1}$ is through the pair of interaction paths $s_z^{t'} \to \cdots \to s_y^{t-1} \to s_x^t$ and $s_z^{t'} \to \cdots \to s_y^{t+1}$, where $t' < t - 1$. That is, one and the same state variable $s_z^{t'}$ must causally influence both $s_x^t$ and $s_y^{t+1}$, where it can reach $s_x$ only through $s_y$. We expect that any non-causal information induced in this way in $I(s_y^{t+1}; s_x^t)$ is insignificant compared with the causal information through $s_x^t \to s_y^{t+1}$, for three reasons. First, the minimum lengths of the two interaction paths from $s_z$ are two and three interactions, respectively, where information is lost through each interaction due to its stochastic nature. Second, of the information that remains, not all information $I(s_z^{t'}; s_x^t)$ may overlap with $I(s_z^{t'}; s_y^{t+1})$, but even if it does, the 'correlation part' of the mutual information $I(s_y^{t+1}; s_x^t)$ due to this overlap is upper bounded by their minimum: $\min\{I(s_z^{t'}; s_x^t),\, I(s_z^{t'}; s_y^{t+1})\}$. Third, the mutual information due to correlation may, in general, overlap with the causal information, i.e. both pieces of information may be partly about the same state variables. That is, the $I_{\text{corr}}$ part of $I(s_y^{t+1}; s_x^t)$, which is the error of our assumption, is only that part of the information-due-to-correlation that is not explained by (contained in) $I_{\text{causal}}$. The final step is the observation that $I_1^k$ is the combination of all $I(s_y^{t+1}; s_x^t)$ for all neighbour units $s_y \in h_x$.
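The first of the three reasons, that information is lost at every stochastic interaction, can be illustrated with a deliberately simple stand-in. The sketch below assumes that each interaction acts like an independent binary symmetric channel with a hypothetical flip probability `flip`. This is not the interaction rule of the model in the main text, only a convenient toy for which the mutual information after several hops has a closed form.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) variable."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def mi_after_hops(flip, hops):
    """Mutual information (bits) between a uniform bit and its copy after
    `hops` independent binary symmetric channels with flip probability `flip`."""
    # Chaining binary symmetric channels gives another one whose flip
    # probability is (1 - (1 - 2 * flip) ** hops) / 2.
    effective_flip = 0.5 * (1.0 - (1.0 - 2.0 * flip) ** hops)
    return 1.0 - binary_entropy(effective_flip)

flip = 0.1  # hypothetical per-interaction noise
for hops in (1, 2, 3):
    print(hops, round(mi_after_hops(flip, hops), 4))

# The correlation created via paths of two and three interactions is
# bounded by the smaller of the two path informations.
bound = min(mi_after_hops(flip, 2), mi_after_hops(flip, 3))
print("upper bound on the correlation part:", round(bound, 4))
```

In this toy setting, the information surviving a path of two or three interactions is already well below what a single direct interaction transmits, which is consistent with treating the non-causal part as small relative to the causal part.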