Abstract
A central question for causal inference is to decide whether a set of correlations fits a given causal structure. In general, this decision problem is computationally infeasible and hence several approaches have emerged that look for certificates of compatibility. Here, we review several such approaches based on entropy. We bring together the key aspects of these entropic techniques with unified terminology, filling several gaps and establishing new connections, all illustrated with examples. We consider cases where unobserved causes are classical, quantum and post-quantum, and discuss what entropic analyses tell us about the difference. This difference has applications to quantum cryptography, where it can be crucial to eliminate the possibility of classical causes. We discuss the achievements and limitations of the entropic approach in comparison to other techniques and point out the main open problems.
1. Introduction
Deciding whether a causal explanation is compatible with given statistical data and exploring whether it is the most suitable explanation for the data at hand are central scientific tasks. Sometimes the most reasonable explanation of a set of observations involves unobserved common causes. In the case where the common causes are classical, the well-developed machinery of Bayesian networks can be used [1,2]. In principle, such networks are well understood and it is known how to check whether observed correlations are compatible with a given network [3]. In practice, however, testing compatibility for networks that involve unobserved systems is only computationally tractable for small cases [4,5]. Furthermore, the methodology has to be adapted whenever non-classical common causes are permitted.
Finding good heuristics to help identify correlations that are (in)compatible with a causal structure is currently an active area of research [6–20] and the use of entropy measures is common to many of these [6–13,17,18,20]. Such methods are important in the quantum context, where recent cryptographic protocols rely on the lack of a classical causal explanation for certain quantum correlations in specified causal structures [21–29], an idea that lies behind Bell’s theorem [30] (see also [31]).
In §2 of this article, we review the entropic characterization of the correlations compatible with causal structures in classical, quantum and more general non-signalling theories. We detail refinements of the approach based on post-selection in §3. Together, these sections show the current capabilities of entropic techniques, also establishing and clarifying connections between different contributions. Our review is illustrated with several examples to assist its understanding and to make it easily accessible for applications. In §4 we outline and compare further approaches to the problem, before concluding in §5 with some open questions. Where lemmas or propositions appear without citations, we are not aware of a reference where they are stated explicitly (although several have been implicitly used in the existing literature).
2. Entropy vector approach
Characterizing the joint distribution of a set of random variables, or alternatively a multiparty quantum state, in terms of its entropy (and of those of its marginals) has a tradition in information theory, dating back to Shannon [32–34]. However, only recently has this approach been extended to account for causal structure [6,8]. In §2a and §2b, we review this approach without and with imposing causal constraints, respectively. All our considerations are concerned with discrete random variables; for extensions of the approach to continuous random variables (and its limitations) we refer to [8,35].
(a) Classical entropy cone
The entropy cone for a joint distribution of n random variables was introduced in [33]. It is defined in terms of the Shannon entropy [32], which, for a discrete random variable X taking values x with probability distribution PX, is defined by H(X):=−∑x PX(x) log PX(x), with logarithms taken to base 2 throughout.
For a set of n≥2 jointly distributed random variables, Ω:={X1,X2,…,Xn}, we denote their probability distribution as PΩ∈Pn, where Pn is the set of all probability mass functions of n jointly distributed random variables. The Shannon entropy maps any subset of Ω to a non-negative real value, H: P(Ω)→ℝ≥0, where P(Ω) denotes the power set of Ω, and H({})=0. The entropy of the joint distribution of the random variables Ω and of all its marginals can be expressed as components of a vector H(PΩ) in ℝ^(2^n−1), ordered in the following as (H(X1),H(X2),…,H(Xn),H(X1X2),H(X1X3),…,H(X1X2…Xn)). The set of all achievable entropy vectors, Γ*n:={v∈ℝ^(2^n−1) | v=H(P) for some P∈Pn}, is called the entropy cone.
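To make this concrete, the following is a minimal sketch (our own illustration, not taken from the cited works; all names are ours) of how an entropy vector can be computed from a joint distribution supplied as an n-dimensional array; the result is keyed by subsets so as to stay independent of any ordering convention.

from itertools import combinations
import numpy as np

def shannon_entropy(p):
    # Shannon entropy (bits) of a probability array
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def entropy_vector(P):
    # entropies of all non-empty subsets of variables of an n-dim array P
    n = P.ndim
    vec = {}
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            rest = tuple(i for i in range(n) if i not in subset)
            vec[subset] = shannon_entropy(P.sum(axis=rest).ravel())
    return vec

# two perfectly correlated uniform bits: H(X1) = H(X2) = H(X1X2) = 1
P = np.array([[0.5, 0.0], [0.0, 0.5]])
print(entropy_vector(P))  # {(0,): 1.0, (1,): 1.0, (0, 1): 1.0}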
(i) Outer approximation: the Shannon cone
The standard outer approximation to the entropy cone Γ*n is the polyhedral cone constrained by the Shannon inequalities listed in the following:
— Monotonicity. For all XT, XS⊆Ω, H(XS∖XT)≤H(XS).
— Submodularity. For all XS, XT⊆Ω, H(XS∩XT)+H(XS∪XT)≤H(XS)+H(XT).
It is a matter of convention whether H({})=0 is included as a Shannon inequality; we keep this implicit.
These inequalities are always obeyed by the entropies of a set of jointly distributed random variables. They may be concisely rewritten in terms of the following information measures: the conditional entropy of two jointly distributed random variables X and Y, H(X|Y):=H(XY)−H(Y), their mutual information, I(X:Y):=H(X)+H(Y)−H(XY), and the conditional mutual information between two jointly distributed random variables X and Y given a third, Z, defined by I(X:Y|Z):=H(XZ)+H(YZ)−H(Z)−H(XYZ). Hence, the monotonicity constraints correspond to positivity of conditional entropy, H(XS∩XT | XS∖XT)≥0, and submodularity is equivalent to positivity of the conditional mutual information, I(XS∖XT:XT∖XS | XS∩XT)≥0. The monotonicity and submodularity constraints can all be generated from a minimal set of n+n(n−1)2^(n−3) inequalities [33]: for the monotonicity constraints it is sufficient to consider the n constraints with XS=Ω and XT=Xi for some Xi∈Ω; for the submodularity constraints it is sufficient to consider those with XS∖XT=Xi and XT∖XS=Xj with i<j and where XU:=XS∩XT is any subset of Ω not containing Xi or Xj, i.e. submodularity constraints of the form I(Xi:Xj | XU)≥0.
These n+n(n−1)2^(n−3) independent Shannon inequalities can be expressed in terms of a (n+n(n−1)2^(n−3))×(2^n−1)-dimensional matrix, which we call MnSH, such that, for any v∈Γ*n, the conditions MnSH⋅v≥0 hold (these are to be interpreted as the requirement that each component of MnSH⋅v is non-negative). More generally, for v∈ℝ^(2^n−1), a violation of MnSH⋅v≥0 certifies that there is no distribution P∈Pn such that v=H(P). It follows that the Shannon cone, Γn:={v∈ℝ^(2^n−1) | MnSH⋅v≥0}, is an outer approximation of the set of achievable entropy vectors, Γ*n [33].
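As an illustration of how such a matrix can be assembled in practice, here is a short sketch (our own construction; the bitmask indexing stated in the comments is an assumption of this snippet, not a convention from the literature):

import numpy as np

def shannon_matrix(n):
    # Rows of the minimal set of n + n(n-1)2^(n-3) Shannon inequalities.
    # A subset S of {X_0, ..., X_{n-1}} is encoded as a bitmask s, and
    # column s holds the coefficient of H(X_S); column 0 (H({}) = 0) is
    # dropped at the end.
    rows = []
    full = (1 << n) - 1
    for i in range(n):  # monotonicity: H(X_Omega) - H(X_Omega \ X_i) >= 0
        row = np.zeros(full + 1)
        row[full] += 1
        row[full & ~(1 << i)] -= 1
        rows.append(row)
    for i in range(n):  # submodularity: I(X_i : X_j | X_U) >= 0, i < j
        for j in range(i + 1, n):
            rest = [k for k in range(n) if k not in (i, j)]
            for mask in range(1 << len(rest)):
                U = sum(1 << rest[b] for b in range(len(rest)) if mask & (1 << b))
                row = np.zeros(full + 1)
                row[U | (1 << i)] += 1
                row[U | (1 << j)] += 1
                row[U | (1 << i) | (1 << j)] -= 1
                if U:
                    row[U] -= 1
                rows.append(row)
    return np.array(rows)[:, 1:]

print(shannon_matrix(3).shape)  # (9, 7): the minimal set for n = 3

A vector v then passes the Shannon test if and only if every entry of shannon_matrix(n) @ v is non-negative.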
Example 2.1. The three-variable Shannon cone is Γ3={v∈ℝ^7 | M3SH⋅v≥0}, where M3SH is the 9×7 coefficient matrix of the minimal Shannon inequalities for n=3: the three monotonicity constraints H(X1X2X3)−H(XiXj)≥0 and the six submodularity constraints I(Xi:Xj)≥0 and I(Xi:Xj|Xk)≥0 for i<j.
(ii) Beyond the Shannon cone
For two variables the Shannon cone coincides with the actual entropy cone, Γ2=Γ*2, while for three random variables this holds only for the closure of the entropic cone, i.e. Γ̄*3=Γ3 but Γ*3≠Γ3 [36,37]. For n≥4, further independent constraints on the set of entropy vectors are needed to fully characterize Γ̄*n, the first of which was discovered in [38].
Proposition 2.2 (Zhang & Yeung). For any four discrete random variables T, U, V and W the following inequality holds: 2I(V:W)≤I(T:U)+I(T:VW)+3I(V:W|T)+I(V:W|U).
For n≥4, the convex cone is not polyhedral, i.e. it cannot be characterized by finitely many linear inequalities [39]. Nonetheless, many linear entropic inequalities have been discovered [36,38,40,41]. Recently, systematic searches for new entropic inequalities for n=4 have been conducted [42,43], which recover most of the previously known inequalities; in particular, the inequality of proposition 2.2 is rederived and shown to be implied by tighter ones [43]. The systematic search in [43] is based on considering additional random variables that obey certain constraints and then deriving four-variable inequalities from the known identities for five or more random variables (see also [38,39]), an idea that is captured by a so-called copy lemma [38,43,44]. In the same article, several rules to generate families of inequalities have been suggested, in the style of techniques introduced by Matúš [39].
For more than four variables, a few additional inequalities are known [38,40]. Curiously, to our knowledge, all known relevant non-Shannon inequalities (i.e. the ones found in [38–43] that are not yet superseded by tighter ones) can be written as a positive linear combination of the Ingleton quantity, I(T:U | V)+I(T:U | W)+I(V :W)−I(T:U), and conditional mutual information terms (see also [43]).
(iii) Inner approximations
Inner approximations are constructed from a set of conditions that are sufficient for an entropy vector to be realized by a distribution. Such conditions can be stated in terms of so-called linear rank inequalities [45–48]. They can be useful for establishing tightness of outer approximations.
For the four-variable entropy cone, Γ*4, an inner approximation is defined as the region constrained by the Shannon inequalities and the six permutations of the Ingleton inequality [45], I(T:U|V)+I(T:U|W)+I(V:W)−I(T:U)≥0, for random variables T, U, V and W. These inequalities can be concisely written in terms of a matrix MI. The constrained region ΓI:={v∈Γ4 | MI⋅v≥0} is called the Ingleton cone, and it has the property that v∈ΓI implies v∈Γ̄*4 [46]. By contrast, there are entropy vectors that violate the Ingleton inequalities, as the following example shows.
Example 2.3. Let T, U, V and W be four jointly distributed random variables. Let V and W be uniform random bits and let T=AND(¬V,¬W) and U=AND(V,W). This distribution [39] leads to the entropy vector v≈(0.81,0.81,1,1,1.50,1.50,1.50,1.50,1.50,2,2,2,2,2,2), for which I(T:U|V)+I(T:U|W)+I(V:W)−I(T:U)≈−0.12, in violation of the Ingleton inequality.
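This violation is easy to reproduce numerically; the following sketch (our own) recomputes the Ingleton quantity for the distribution of example 2.3:

import numpy as np

def H(*cols):
    # joint Shannon entropy (bits) of equally weighted outcome columns
    _, counts = np.unique(np.stack(cols, axis=1), axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def I(X, Y, *Z):
    # (conditional) mutual information I(X:Y|Z)
    return H(X, *Z) + H(Y, *Z) - H(X, Y, *Z) - (H(*Z) if Z else 0.0)

# the four equally likely assignments of the uniform bits V, W
V = np.array([0, 0, 1, 1])
W = np.array([0, 1, 0, 1])
T = (1 - V) & (1 - W)  # AND(NOT V, NOT W)
U = V & W              # AND(V, W)

ingleton = I(T, U, V) + I(T, U, W) + I(V, W) - I(T, U)
print(round(ingleton, 3))  # -0.123: the Ingleton inequality is violated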
For five random variables an inner approximation in terms of Shannon, Ingleton and 24 additional inequalities and their permutations is known (including partial extensions to more variables) [47,48].
(b) Entropy vectors for causal structures
Causal relations among a set of variables impose constraints on their possible joint distributions, which can be conveniently represented with a causal structure.
Definition 2.4. A causal structure is a set of variables arranged in a directed acyclic graph (DAG), in which a subset of the nodes is assigned as observed.
The directed edges of the graph are intended to represent causation, perhaps by propagation of some influence, and cycles are excluded to avoid the well-known paradoxes associated with causal loops. We will interpret causal structures in different ways, depending on the supposed physics of whatever is mediating the causal influence.
One of the simplest causal structures that leads to interesting insights and one of the most thoroughly analysed ones is Pearl’s instrumental causal structure, IC [49]. It is displayed in figure 1a and will be used as an example throughout this review.
Figure 1. (a) Pearl’s instrumental scenario. The nodes X, Y and Z are observed; A is unobserved. In the classical case, this can be understood in the following way: A random variable X and an unobserved A are used to generate another random variable Z. Then Y is generated from A and the observed output of node Z. In particular, note that no other information can be forwarded from X through the node Z to Y. In the quantum case, the source A shares a quantum system ρA∈S(HA), where HA=HAZ⊗HAY. The subsystem AZ is measured to produce Z and likewise for Y. The subsystems AZ and AY are both considered to be parents of Z (and Y). (b) Bell scenario. The observed variables A and B together with an unobserved system C are used to generate outputs X and Y, respectively. In the classical case, C is modelled as a random variable, in the quantum case it is a quantum state on a Hilbert space HC. (c) Triangle causal structure. Three observed random variables X, Y and Z share pairwise common causes A, B and C, which in the classical case are modelled by random variables. Some of the valid inequalities such as 2I(X:Y|Z)+I(X:Z|Y)+I(Y:Z|X)−I(X:Y)≥0 can only be recovered using non-Shannon entropic inequalities [50].
(i) Classical causal structures
In the classical case, the causal relations among a set of random variables can be explored by means of the theory of Bayesian networks (see, for instance, [1,2] for a complete presentation of this theory).
Definition 2.5. A classical causal structure, CC, is a causal structure in which each node of the DAG has an associated random variable.
It is common to use the same label for the node and its associated random variable. The DAG encodes which joint distributions of the involved variables are allowed in a causal structure CC. To explain this, we need a little more terminology.
Definition 2.6. Let XS, XT, XU be three disjoint sets of jointly distributed random variables. Then XS and XT are said to be conditionally independent given XU if and only if their joint distribution PXSXTXU can be written as PXSXTXU=PXS|XU PXT|XU PXU. Conditional independence of XS and XT given XU is denoted as XS ⫫ XT | XU.
Two variables XS and XT are (unconditionally) independent if PXSXT=PXSPXT, concisely written XS ⫫ XT. With reference to a DAG with a subset of nodes, X, we will use X↓ to denote the ancestors of X and X↑ to denote the descendants of X. The parents of X are represented by X↓1 and the non-descendants are X⤉ .
Definition 2.7. Let CC be a classical causal structure with nodes {X1,X2,…,Xn}. A probability distribution P∈Pn is (Markov) compatible with CC if it can be decomposed as P(X1X2…Xn)=∏i P(Xi | Xi↓1).
The compatibility constraint encodes all conditional independences of the random variables in the causal structure CC. Nonetheless, whether a particular set of variables is conditionally independent of another is more easily read from the DAG, as explained in the following.
Definition 2.8. Let X, Y and Z be three pairwise disjoint sets of nodes in a DAG G. The sets X and Y are said to be d-separated by Z if Z blocks every path from any node in X to any node in Y. A path is blocked by Z if the path contains i→z→j or i←z→j for some nodes i, j and a node z∈Z in that path, or if the path contains i→k←j, where k∉Z and no descendant of k is in Z.
The d-separation of the nodes in a causal structure is directly related to the conditional independence of its variables. The following proposition corresponds to theorem 1.2.5 from [1], previously introduced in [51,52]. It justifies the application of d-separation as a means to identify independent variables.
Proposition 2.9 (Verma & Pearl). Let CC be a classical causal structure and let X, Y and Z be pairwise disjoint subsets of nodes in CC. If a probability distribution P is compatible with CC, then the d-separation of X and Y by Z implies the conditional independence X ⫫ Y | Z. Conversely, if, for every distribution P compatible with CC, the conditional independence X ⫫ Y | Z holds, then X is d-separated from Y by Z.
The compatibility of probability distributions with a classical causal structure is conveniently determined with the following proposition, which has also been called the parental or local Markov condition before (theorem 1.2.7 in [1]).
Proposition 2.10 (Pearl). Let CC be a classical causal structure. A probability distribution P is compatible with CC if and only if every variable in CC is independent of its non-descendants, conditioned on its parents.
Hence, to establish whether a probability distribution is compatible with a certain classical causal structure, it is enough to check that every variable X is independent of its non-descendants X⤉ given its parents X↓1, concisely written as X ⫫ X⤉ | X↓1, i.e. to check one constraint for each variable. In particular, it is not necessary to explicitly check for all possible sets of nodes whether they obey the independence relations implied by d-separation. Each such constraint can be conveniently expressed as I(X:X⤉ | X↓1)=0. (2.1)
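As an illustration, the following sketch (our own; the helper names and the array-based interface are ours) tests compatibility via proposition 2.10, checking (2.1) for each variable of a joint distribution supplied as an array:

import numpy as np

def entropy(P, axes):
    # H of the marginal of the joint array P on the given variable indices
    other = tuple(i for i in range(P.ndim) if i not in axes)
    m = np.asarray(P.sum(axis=other)).ravel()
    m = m[m > 0]
    return float(-np.sum(m * np.log2(m)))

def cmi(P, X, Y, Z):
    # I(X:Y|Z) computed from entropies of the joint distribution P
    X, Y, Z = tuple(X), tuple(Y), tuple(Z)
    return (entropy(P, X + Z) + entropy(P, Y + Z)
            - entropy(P, Z) - entropy(P, X + Y + Z))

def is_markov_compatible(P, parents, tol=1e-9):
    # parents[i] lists the parent indices of variable i in the DAG
    n = P.ndim
    children = {i: [j for j in range(n) if i in parents[j]] for i in range(n)}
    def descendants(i):
        out, stack = set(), list(children[i])
        while stack:
            j = stack.pop()
            if j not in out:
                out.add(j)
                stack.extend(children[j])
        return out
    for i in range(n):
        # non-descendants other than the parents (conditioning on the
        # parents makes including them in the second argument redundant)
        nondesc = [j for j in range(n)
                   if j != i and j not in descendants(i) and j not in parents[i]]
        if nondesc and abs(cmi(P, [i], nondesc, parents[i])) > tol:
            return False
    return True

# e.g. instrumental scenario with variable order (X, A, Z, Y):
# is_markov_compatible(P, parents=[[], [], [0, 1], [1, 2]])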
While the conditional independence relations capture some features of the causal structure, they are insufficient to completely capture the causal relations between variables, as illustrated in figure 2. In this case, the probability distributions themselves are unable to capture the difference between these causal structures: correlations are insufficient to determine causal links between random variables. External interventions allow for the exploration of causal links beyond the conditional independences [1]. However, we do not consider these here.
Figure 2. While in the left causal structure X ⫫ Y , the other three networks share the conditional independence relation X ⫫ Y | Z. This illustrates that the conditional independences are not sufficient to characterize the causal links among a set of random variables.
Let CC be a classical causal structure involving n random variables {X1,X2,…,Xn}. The restricted set of distributions that are compatible with the causal structure CC is P(CC):={P∈Pn | P is compatible with CC}.
Example 2.11 (Allowed distributions in the instrumental scenario). The classical instrumental scenario of figure 1a allows for any four-variable distribution in the set P(ICC)={P∈P4 | P(XYZA)=P(Y|ZA)P(Z|XA)P(X)P(A)}.
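For concreteness, a distribution in this set can be sampled by drawing each factor at random; the following sketch (all names and alphabet sizes are our choices) does this with NumPy:

import numpy as np

rng = np.random.default_rng(0)

def rand_cond(shape):
    # random conditional distribution, normalized over the first axis
    d = rng.random(shape)
    return d / d.sum(axis=0, keepdims=True)

dX, dA, dZ, dY = 2, 2, 2, 2  # alphabet sizes, free choices
PX, PA = rand_cond((dX,)), rand_cond((dA,))
PZ_XA = rand_cond((dZ, dX, dA))  # P(Z|X,A), axes (z, x, a)
PY_ZA = rand_cond((dY, dZ, dA))  # P(Y|Z,A), axes (y, z, a)

# P(x, y, z, a) = P(x) P(a) P(z|x, a) P(y|z, a)
P = np.einsum('x,a,zxa,yza->xyza', PX, PA, PZ_XA, PY_ZA)
assert np.isclose(P.sum(), 1.0)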
The restrictions on the allowed distributions also restrict the corresponding entropy cones. Owing to proposition 2.10 there are at most n independent conditional independence equalities (2.1) in a causal structure CC. Their coefficients can be concisely written in terms of a matrix MCI(CC), where CI stands for conditional independence. For a causal structure CC, we define the two sets Γ*(CC):={v∈Γ*n | MCI(CC)⋅v=0} and Γ(CC):={v∈Γn | MCI(CC)⋅v=0}, where Γ*(CC)⊆Γ(CC). The following lemma justifies the notation we use for Γ*(CC); it is the set of achievable entropy vectors in CC.
Lemma 2.12. For a causal structure CC, Γ*(CC)={H(P) | P∈P(CC)}. Furthermore, its topological closure, Γ̄*(CC), is a convex cone.
Proof. For the causal structure CC, let E(CC):={H(P) | P∈P(CC)} be the set of all achievable entropy vectors. As (2.1) holds for each variable Xi if and only if P∈P(CC) (cf. proposition 2.10), a vector v∈Γ*n lies in E(CC) if and only if MCI(CC)⋅v=0. Applying the definition of MCI(CC) yields E(CC)=Γ*(CC). Now, let us consider the set F(CC):={v∈Γ̄*n | MCI(CC)⋅v=0}. This is a closed convex set, because Γ̄*n is known to be closed and convex [36] and because restricting the closed convex cone Γ̄*n with linear equality constraints retains these properties. More precisely, the set of solutions to the matrix equality MCI(CC)⋅v=0 is also closed and convex. Being the intersection of two closed convex sets, the set F(CC) is also closed and convex. From this we conclude that Γ̄*(CC) is convex because it equals F(CC). (Because F(CC) is closed, any element w∈F(CC), in particular any element on its boundary, is the limit of a sequence of elements {wk}k with wk∈Γ*(CC), where the wk lie in the interior of F(CC) for all k. Hence F(CC)⊆Γ̄*(CC); the converse inclusion holds because Γ*(CC)⊆F(CC) and F(CC) is closed.) □
The convexity of Γ̄*(CC) is crucial for the considerations of the following sections. Note that, in spite of the convexity of Γ̄*(CC), the set P(CC) is generally not convex. This alludes to the fact that significant information about the achievable correlations among the random variables is lost via the mapping from P(CC) to the corresponding entropic cone Γ̄*(CC).
Example 2.13 (Entropic outer approximation for the instrumental scenario). The instrumental scenario has at most four independent conditional independence equalities (2.1). We find that there are only two, I(A:X)=0 and I(Y:X|AZ)=0. This yields Γ(ICC)={v∈Γ4 | MCI(ICC)⋅v=0}, with MCI(ICC) the matrix encoding these two equality constraints.
In general, the outer approximation to Γ̄*(CC) can be further tightened by taking non-Shannon inequalities into account. These have led to the derivation of numerous new entropic inequalities for various causal structures [50] (see e.g. the triangle causal structure of figure 1c). For the instrumental scenario, however, such additional inequalities are irrelevant. This can, for instance, be seen by constructing the following inner approximation to the cone.
Example 2.14 (Entropic inner approximation for the instrumental scenario [50]). For the instrumental scenario an inner approximation is given in terms of the Ingleton cone and the conditional independence constraints from the previous example, ΓI(ICC)={v∈ΓI | MCI(ICC)⋅v=0}. For this causal structure the Ingleton constraints are implied by the Shannon inequalities and the conditional independence constraints and, hence, inner and outer approximations coincide. Consequently, they also coincide with the actual entropy cone, i.e. ΓI(ICC)=Γ(ICC)=Γ̄*(ICC). In particular, non-Shannon entropic inequalities cannot improve the outer approximation in this example.
Inner approximations have been considered in [50]. They are particularly useful in cases where identical inner and outer approximations are found, where they identify the actual boundary of the entropy cone. In other cases, they can allow parts of the actual boundary to be identified or give clues on how to find better outer approximations.
Arguably all interesting scenarios (such as the previous example) involve unobserved variables that are suspected to cause some of the correlations between the variables we observe. These unobserved variables may yield constraints on the possible joint distributions of the observed variables, a well-known example being a Bell inequality [30] (for a detailed discussion of the significance of Bell inequality violation on classical causal structures see [31]). More generally, we would like to infer constraints on the observed variables that follow from the presence of unobserved variables.
For a causal structure on n random variables {X1,X2,…,Xn}, the restriction to the set of observed variables is called its marginal scenario, denoted by M. Here, we assume w.l.o.g. that the first k≤n variables are observed and the remaining n−k are not. We are thus interested in the correlations among the first k variables that can be obtained as the marginal of some distribution over all n variables. Without any causal restrictions, the set of all probability distributions of the k observed variables is Pk, i.e. {∑Xk+1,…,Xn P | P∈Pn}=Pk. For a classical causal structure CC on the set of variables {X1,X2,…,Xn}, marginalizing all distributions over the n−k unobserved variables leads to the set PM(CC):={∑Xk+1,…,Xn P | P∈P(CC)}. In contrast with the unrestricted case, this set of distributions is, in general, not recovered by considering a causal structure that involves only k observed random variables, as can be seen in the following example.
Example 2.15 (Observed distributions in the instrumental scenario). For the instrumental scenario, the observed variables are X, Y and Z and their joint distribution is of the form P(XYZ)=∑A P(Y|ZA)P(Z|XA)P(X)P(A).
The first entropic inequalities for a marginal scenario were derived in [12], where certificates for the existence of common ancestors of a subset of the observed random variables of at least a certain size were given. One such scenario is the triangle causal structure of figure 1c. The systematic entropy vector approach was devised for classical causal structures in [6,8,10]. An outer approximation to the entropic cones of a variety of causal structures was given in [11]. In the following we give the details of this approach.
In the entropic picture, marginalization is performed by eliminating from the vectors the coordinates that represent entropies of sets of variables containing at least one unobserved variable. This corresponds to a projection of a cone in ℝ^(2^n−1) to its marginal cone in ℝ^(2^k−1) [9]. We will denote this projection πM. It gives all entropy vectors w of the observed sets of variables, i.e. of the marginal scenario M, for which there exists at least one entropy vector v in the original scenario with matching entropies on the observed variables.
Starting from the set of all entropy vectors, Γ*(CC), those relevant for the marginal scenario can be obtained by discarding the appropriate components. For a finitely generated cone such as Γ(CC), its projection can be more efficiently determined from the projection of its extremal rays. In the dual description of the entropic cone in terms of its facets (i.e. its inequality description), the transition to the marginal scenario can be made computationally by eliminating all entropies of sets of variables not contained in M from the system of inequalities. The standard algorithm that achieves this is Fourier–Motzkin elimination [53], which has been used in this context in [6,8,9].
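The elimination step itself is simple to state: a coordinate is removed by combining every pair of inequalities that bound it from opposite sides. A minimal sketch (our own, omitting the redundancy removal that practical implementations require):

import numpy as np

def fourier_motzkin(M, k):
    # Eliminate coordinate k from the system M @ v >= 0; the returned rows
    # have zero coefficient on k and describe the projection.
    zero = [r for r in M if np.isclose(r[k], 0.0)]
    pos = [r for r in M if r[k] > 1e-12]
    neg = [r for r in M if r[k] < -1e-12]
    # p @ v >= 0 with p[k] > 0 and n @ v >= 0 with n[k] < 0 combine to
    # ((-n[k]) * p + p[k] * n) @ v >= 0, in which v_k cancels
    combined = [(-n[k]) * p + p[k] * n for p in pos for n in neg]
    return np.array(zero + combined)

# Shannon cone for n = 2, coordinates (H(X1), H(X2), H(X1X2)):
M2 = np.array([[-1.0, 0.0, 1.0],   # H(X1X2) >= H(X1)
               [0.0, -1.0, 1.0],   # H(X1X2) >= H(X2)
               [1.0, 1.0, -1.0]])  # I(X1:X2) >= 0
print(fourier_motzkin(M2, 2))  # rows H(X2) >= 0 and H(X1) >= 0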
Without any causal restrictions, the entropy cone Γ*n is projected to the marginal cone πM(Γ*n). Note that if we marginalize over n−k variables, we recover the entropy cone for k random variables, i.e. πM(Γ*n)=Γ*k. The same applies to the outer approximations: the n-variable Shannon cone Γn is projected to the k-variable Shannon cone with the mapping πM, i.e. πM(Γn)=Γk. This follows because the n-variable Shannon constraints contain the corresponding k-variable constraints as a subset, and because any vector in Γk can be extended to a vector in Γn, for instance, by taking H(Xk+1)=H(Xk+2)=⋯=H(Xn)=0 and H(XS∪XT)=H(XS) for any XT⊆{Xk+1,Xk+2,…,Xn}.
For a classical causal structure CC, we will be interested in the set πM(Γ̄*(CC)), which is by construction a convex cone, because projection preserves convexity. The following lemma confirms that this is the entropy cone of the marginal scenario and thus also formally justifies the method of projecting the sets directly.
Lemma 2.16. πM(Γ̄*(CC)) is equal to the set of entropy vectors compatible with the marginal scenario M of the classical causal structure CC, i.e. it equals the closure of {H(P′) | P′∈PM(CC)}.
Proof. Let E denote the set on the rhs of the statement in the lemma, before taking the closure, i.e. E:={H(P′) | P′∈PM(CC)}. Note that w∈πM(Γ*(CC)) implies that there exists v∈Γ*(CC) s.t. w=πM(v). Using lemma 2.12, we have v=H(P) for some P∈P(CC). If we take P′=∑Xk+1,…,Xn P then w=H(P′) and hence w∈E. Conversely, w∈E implies that there exists P′∈PM(CC) s.t. w=H(P′) and, hence, there exists P∈P(CC) such that P′=∑Xk+1,…,Xn P. If we take v=H(P), then v∈Γ*(CC) and w=πM(v), hence w∈πM(Γ*(CC)). Taking the topological closure of both sets concludes the proof. ▪
An outer approximation to πM(Γ̄*(CC)) is πM(Γ(CC)), which can be written as {w∈Γk | MM(CC)⋅w≥0}, where MM(CC) is the matrix obtained via Fourier–Motzkin elimination; it encodes the set of inequalities on the marginal components w that are implied by the fact that MCI(CC)⋅v=0 and MnSH⋅v≥0 hold on the full set of components v (except for the k-variable Shannon constraints, which are already included in Γk).
Example 2.17 (Entropic outer approximation for the marginal cone of the instrumental scenario [10,11]). For the instrumental scenario, the outer approximation to its marginal cone is found by projecting Γ(ICC) to its three-variable scenario and yields πM(Γ(ICC))={w∈Γ3 | MM(ICC)⋅w≥0}, where MM(ICC) corresponds to the inequality I(X:YZ)≤H(Z) from [10,11]. As Γ(ICC)=Γ̄*(ICC) holds, we also have πM(Γ(ICC))=πM(Γ̄*(ICC)).
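The observed inequality can also be checked numerically against randomly sampled compatible distributions; the following sketch (our own, repeating the sampling helpers of example 2.11 so that it is self-contained) does so:

import numpy as np

rng = np.random.default_rng(1)

def rand_cond(shape):
    # random conditional distribution, normalized over the first axis
    d = rng.random(shape)
    return d / d.sum(axis=0, keepdims=True)

def H(p):
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

for _ in range(1000):
    PX, PA = rand_cond((2,)), rand_cond((4,))
    PZ_XA, PY_ZA = rand_cond((2, 2, 4)), rand_cond((2, 2, 4))
    Pxyz = np.einsum('x,a,zxa,yza->xyz', PX, PA, PZ_XA, PY_ZA)
    HX = H(Pxyz.sum(axis=(1, 2)))
    HYZ = H(Pxyz.sum(axis=0).ravel())
    HXYZ = H(Pxyz.ravel())
    HZ = H(Pxyz.sum(axis=(0, 1)))
    assert HX + HYZ - HXYZ <= HZ + 1e-9  # I(X:YZ) <= H(Z)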
As mentioned previously, non-Shannon inequalities cannot give any new entropic constraints for IC, as the Shannon approximation is already tight. However, in many causal structures they do. For instance in the triangle scenario of figure 1c, non-Shannon inequalities still lead to new entropic constraints, even after marginalization to the three observed variables [50].
(ii) Causal structures with unobserved quantum systems
A quantum causal structure differs from its classical counterpart in that unobserved systems correspond to shared quantum states.
Definition 2.18. A quantum causal structure, CQ, is a causal structure where each observed node has a corresponding random variable, and each unobserved node has an associated quantum system.
In a classical causal structure, the edges of the DAG represent the propagation of classical information, and, at a node with incoming edges, the random variable there can be generated by applying an arbitrary function to its parents. We are hence implicitly assuming that all the information about the parents is transmitted to its children (otherwise the set of allowed functions would be restricted). This does not pose a problem because classical information can be copied. In the quantum case, on the other hand, the no-cloning theorem means that the children of a node cannot (in general) all have access to the same information as is present at that node. Furthermore, the analogue of performing arbitrary functions in the classical case is replaced by arbitrary quantum operations. Such a quantum framework that allows for an analysis with entropy vectors was introduced in [13]. In the following we outline this approach. However, for unity of description, our account of quantum causal structures is based upon the viewpoint that is taken for generalized causal structures in [11], which we review in the next section. (The difference is as follows: In [13] nodes correspond to quantum systems. All outgoing edges of a node together define a completely positive trace-preserving (CPTP) map with output states corresponding to the joint state associated with its child nodes. Similarly, the CPTP map associated to the input edges of a node must map the states of the parent nodes to the node in question. In [11], on the other hand, edges correspond to states, whereas the transformations occur at the nodes.)
Let CQ be a quantum causal structure. Nodes without input edges correspond to the preparation of a quantum state described by a density operator on a Hilbert space, e.g. ρA∈S(HA) for a node A, where for observed nodes this state is required to be classical. (By S(H), we denote the set of all density operators on a Hilbert space H.) For each directed edge in the graph there is a corresponding subsystem with Hilbert space labelled by the edge’s input and output nodes. For instance, if Y and Z are the only children of A, then there are associated spaces HAY and HAZ such that HA=HAY⊗HAZ (in the classical case these subsystems may all be taken to be copies of the system itself). At an unobserved node, a CPTP map from the joint state of all its input edges to the joint state of its output edges is performed. A node is labelled by its output state. For an observed node the latter is classical. Hence, it corresponds to a random variable that represents the output statistics obtained in a measurement by applying a positive operator-valued measure (POVM) to the input states. (Note that preparation and measurement can also be seen as CPTP maps with classical input and output systems, respectively, thus allowing for a unified formulation.) If all input edges are classical, this can be interpreted as a stochastic map between random variables.
A distribution, P, over the observed nodes of a causal structure CQ is compatible with CQ if there exists a quantum state labelling each unobserved node (with subsystems for each unobserved edge) and transformations, i.e. preparations and CPTP maps for each unobserved node as well as POVMs for each observed node, that allow for the generation of P by means of the Born rule. We denote the set of all compatible distributions P(CQ).
Example 2.19 (Compatible distributions in the quantum instrumental scenario). For the quantum instrumental scenario (figure 1a), P(ICQ) is the set of compatible distributions. A state ρA∈S(HAZ⊗HAY) is prepared. Depending on the random variable X, a POVM on HAZ is applied to generate the output distribution of the observed variable Z. Depending on the latter, another POVM is applied on HAY to generate the distribution of Y.
The set of entropy vectors of compatible probability distributions over the observed nodes is Γ*(CQ):={H(P) | P∈P(CQ)}. Outer approximations were first derived in [13], a procedure that we outline in the following. For their construction, an entropy is associated to each random variable and to each subsystem of a quantum state (equivalently each edge originating at a quantum node), corresponding to the von Neumann entropy of the respective system. For convenience of exposition, edges and their associated systems share the same label. The von Neumann entropy of a density operator ρ is defined as H(ρ):=−tr(ρ log ρ).
Because of the impossibility of cloning, the outcomes and the quantum systems that led to them do not exist simultaneously. Therefore, there is in general no joint multiparty quantum state for all subsystems and it does not make sense to talk about the joint entropy of the states and outcomes. More concretely, if a system A is measured to produce Z, then ρAZ is not defined and hence neither is H(AZ) (for attempts to circumvent this, see, for example, [54]).
Definition 2.20. Two subsystems in a quantum causal structure CQ coexist if neither of them is a quantum ancestor of the other. A set of subsystems that mutually coexist is termed coexisting.
A quantum causal structure may have several maximal coexisting subsets. Only within such subsets is there a well-defined joint quantum state and joint entropy.
Example 2.21 (Coexisting sets in the quantum instrumental scenario). Consider the quantum version of the instrumental scenario, as illustrated in figure 1a. There are three observed variables as well as two edges originating at unobserved (quantum) nodes, hence five variables to consider. More precisely, the quantum node A has two associated subsystems AZ and AY. The correlations seen at the two observed nodes Z and Y are formed by measurement on the respective subsystems AZ and AY. The coexisting sets in this causal structure are {AY,AZ,X}, {AY,X,Z} and {X,Y,Z} and their (non-empty) proper subsets.
Note that, without loss of generality, we can assume that any initial, i.e. parentless quantum states, such as ρA above, are pure. This is because any mixed state can be purified, and if the transformations and measurement operators are then taken to act trivially on the purifying systems, the same statistics are observed. In the causal structure of example 2.21, this implies that ρA can be considered to be pure and thus H(AYAZ)=0. The Schmidt decomposition then implies that H(AY)=H(AZ). This is computationally useful as it reduces the number of free parameters in the entropic description of the scenario. Furthermore, by Stinespring’s theorem [55], whenever a CPTP map is applied at a node that has at least one quantum child, then one can instead consider an isometry to a larger output system. The additional system that is required for this can be taken to be part of the unobserved quantum output (or one of them in case of several quantum output nodes). Each such case allows for the reduction of the number of variables by one, because the joint entropy of all inputs to such a node must be equal to that of all its outputs.
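The equality H(AY)=H(AZ) for a pure ρA is easily illustrated numerically; the sketch below (our own) draws a random pure two-qubit state and compares the von Neumann entropies of its two marginals:

import numpy as np

def von_neumann(rho):
    # H(rho) = -tr(rho log rho), in bits, from the eigenvalues
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

rng = np.random.default_rng(2)
psi = rng.normal(size=4) + 1j * rng.normal(size=4)  # random pure state on A_Y (x) A_Z
psi /= np.linalg.norm(psi)
rho = np.outer(psi, psi.conj()).reshape(2, 2, 2, 2)  # indices (aY, aZ, aY', aZ')

rho_AY = np.einsum('ikjk->ij', rho)  # trace out A_Z
rho_AZ = np.einsum('kikj->ij', rho)  # trace out A_Y
print(np.isclose(von_neumann(rho_AY), von_neumann(rho_AZ)))  # True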
Quantum states are known to obey submodularity [56] and also obey the following condition:
— Weak monotonicity [56]: H(XS∖XT)+H(XT∖XS)≤H(XS)+H(XT), for all XS, XT⊆Ω (recall H({})=0).
This is the dual of submodularity in the sense that the two inequalities can be derived from each other by considering purifications of the corresponding quantum states [57].
Within the context of causal structures, these relations can always be applied between variables in the same coexisting set. In addition, whenever it is impossible for there to be entanglement between the subsystems XS∩XT and XS∖XT—for instance, if these subsystems are in a cq-state—the monotonicity constraint H(XS∖XT)≤H(XS) holds. If it is also impossible for there to be entanglement between XS∩XT and XT∖XS, then the monotonicity relation H(XT∖XS)≤H(XT) holds, rendering the weak monotonicity relation stated above redundant.
Altogether, these considerations lead to a set of basic inequalities containing some Shannon and some weak-monotonicity inequalities, which are conveniently expressed in a matrix MB(CQ). This way of approximating the entropic cone in the quantum case is inspired by work on the entropic cone for multiparty quantum states [34]. Note also that there are no further inequalities for the von Neumann entropy known to date (contrary to the classical case where a variety of non-Shannon inequalities is known), except under additional constraints [58–63].
The conditional independence constraints in CQ cannot be identified by proposition 2.10, because variables do not coexist with any quantum parents and hence conditioning a variable on a quantum parent is not meaningful. Nonetheless, among the variables in a coexisting set the conditional independences that are valid for CC also hold in CQ. This can be seen as follows. First, any constraints that involve only observed variables (which are always part of a coexisting set) hold by proposition 2.27 below. Secondly, for unobserved systems only their classical ancestors and none of their descendants can be part of the same coexisting set. An unobserved system is hence independent of any subset of the same coexisting set with which it shares no ancestors. Note that each of the subsystems associated with a quantum node is considered to be a parent of all of the node’s children (see figure 1 for an example).
In addition, suppose that XS and XT are disjoint subsets of a coexisting set, Ξ, and that the unobserved system A is also in Ξ. Then I(A:XS | XT)=0 if XT d-separates A from XS (in the full graph including quantum nodes). This follows because any quantum states generated from the classical separating variables may be obtained by first producing random variables from the latter (for which the usual d-separation rules hold) and then using these to generate the quantum states in question (potentially after generating other variables in the network), hence retaining conditional independence. The same considerations can be made for sets of unobserved systems. These independence constraints may be assembled in a matrix MQCI(CQ).
Among the variables that do not coexist, some are obtained from others by means of quantum operations. These variables are thus related by data processing inequalities (DPIs) [64].
Proposition 2.22 (DPI). Let ρXSXT∈S(HXS⊗HXT) and let E be a completely positive trace-preserving (CPTP) map on S(HXT) leading to a state ρ′XSXT=(I⊗E)(ρXSXT). Then I(XS:XT)ρ′XSXT≤I(XS:XT)ρXSXT.
Remarks: (1) The map from a quantum state to the diagonal state with entries equal to the outcome probabilities of a measurement is a CPTP map and hence also obeys the DPI. (2) In general, E can be a map between operators on different Hilbert spaces, i.e. E: S(HXT)→S(HXT′). However, as we can consider these operators to act on the same larger Hilbert space, we can w.l.o.g. take E to be a map on this larger space, which we call HXT. (3) There are also DPIs for conditional mutual information, e.g. I(A:B|C)ρ′ABC≤I(A:B|C)ρABC for ρ′ABC=(IA⊗EB⊗IC)(ρABC), but these are implied by proposition 2.22, so they need not be treated separately here.
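As a concrete illustration of proposition 2.22, the following sketch (our own; the choice of a depolarizing channel is arbitrary) verifies that applying a channel to one subsystem of a maximally entangled two-qubit state can only decrease the mutual information:

import numpy as np

def vn(rho):
    # von Neumann entropy (bits) from the eigenvalues of rho
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

def mutual_info(rho):
    # I(S:T) for a two-qubit state rho, indices reshaped to (S, T, S', T')
    r = rho.reshape(2, 2, 2, 2)
    rS = np.einsum('ikjk->ij', r)  # trace out T
    rT = np.einsum('kikj->ij', r)  # trace out S
    return vn(rS) + vn(rT) - vn(rho)

# maximally entangled two-qubit state; I(S:T) = 2 bits
psi = np.array([1, 0, 0, 1]) / np.sqrt(2)
rho = np.outer(psi, psi.conj())

# depolarizing channel E(sigma) = (1 - p) sigma + p tr(sigma) I/2 on T:
# (id x E)(rho) = (1 - p) rho + p rho_S (x) I/2
p = 0.3
rho_S = np.einsum('ikjk->ij', rho.reshape(2, 2, 2, 2))
rho_out = (1 - p) * rho + p * np.kron(rho_S, np.eye(2) / 2)
print(mutual_info(rho), '->', mutual_info(rho_out))  # 2.0 -> approx. 0.87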
The DPI provide an additional set of entropic constraints, which can be expressed in terms of a matrix inequality MDPI(CQ)⋅v≥0. In general, there are a large number of variables for which DPI hold. It is thus beneficial to derive rules that specify which of the inequalities are needed. First, note that whenever a concatenation of two CPTP maps E1 and E2, i.e. E2∘E1, is applied to a state, then any DPIs for inputs and outputs of E2∘E1 are implied by the DPIs for E1 and E2. This follows by deriving the DPIs for input and output states of E1 and E2, respectively, and combining the two. Hence, the DPIs for composed maps never have to be considered as separate constraints.
Secondly, whenever a state can be decomposed as ρXSXTXR=ρXSXT⊗ρXR and a CPTP map transforms the state on HXT, then any DPIs for ρXSXTXR are implied by the DPIs for ρXSXT. This follows from I(XS:XTXR)=I(XS:XT), I(XSXR:XT)=I(XS:XT) and I(XSXT:XR)=0.
Furthermore, whenever a node has classical and quantum inputs, there is not only a CPTP map generating its output state, but this map can also be extended to a CPTP map that simultaneously retains the classical inputs, as is the content of the following lemma, which also shows that retaining a copy of the classical inputs leads to tighter entropic inequalities.
Lemma 2.23. Let Y be a node with classical and quantum inputs XC and XQ and let E be a CPTP map that acts at this node, i.e. E is a map from S(HXC⊗HXQ) to S(HY). Then E can be extended to a map Ẽ: S(HXC⊗HXQ)→S(HXC⊗HY) such that ρ′XCY=Ẽ(ρXCXQ) with the property that ρ′XCY is classical on HXC and ρ′XC=ρXC. Furthermore, the DPIs for Ẽ imply those for E.
Proof. The first part of the lemma follows because classical information can be copied, and hence Ẽ can be decomposed into first copying XC, and then performing E. (Alternatively, we can think of E as the concatenation of Ẽ with a partial trace; this allows us to use the same output state ρ′ for both maps in the argument below.) Suppose ρ′=Ẽ(ρ). The second part follows because if I(XCXQXS:XT)ρ≥I(YXS:XT)ρ′ is a valid DPI for E then I(XCXQXS:XT)ρ≥I(XCYXS:XT)ρ′ is valid for Ẽ. The second of these implies the first by the submodularity relation I(XCYXS:XT)ρ′≥I(YXS:XT)ρ′. ▪
All the above (in)equalities are necessary conditions for a vector to be an entropy vector compatible with the causal structure CQ. They constrain a polyhedral cone in ℝ^m, where m is the total number of coexisting sets of CQ: Γ(CQ):={v∈ℝ^m | MB(CQ)⋅v≥0, MQCI(CQ)⋅v=0, MDPI(CQ)⋅v≥0}.
Example 2.24 (Entropic constraints for the quantum instrumental scenario). The cone Γ(ICQ) involves the matrix MB(ICQ) that features 29 (independent) inequalities (note that the only weak monotonicity relations that are not made redundant by other basic inequalities are H(AY|AZX)+H(AY)≥0, H(AZ|AYX)+H(AZ)≥0, H(AY|AZ)+H(AY|X)≥0 and H(AZ|AY)+H(AZ|X)≥0). In this case, a single independence constraint encodes that X is independent of AYAZ: I(X:AYAZ)=0.
From Γ(CQ), an outer approximation to the set of compatible entropy vectors of the observed scenario of CQ can be obtained using Fourier–Motzkin elimination. This leads to πM(Γ(CQ)), which can be written as {w∈Γk | MM(CQ)⋅w≥0}. The matrix MM(CQ) encodes all (in)equalities on the observed variables implied by MB(CQ)⋅v≥0, MQCI(CQ)⋅v=0 and MDPI(CQ)⋅v≥0 (except for the Shannon inequalities, which are already included in Γk). Note that πM(Γ(CC))⊆πM(Γ(CQ))⊆Γk, where the first relation holds because all inequalities relevant for quantum states also hold in the classical case. (This can be seen by thinking of a classical source as made up of two or more (perfectly correlated) random variables as its subsystems, which are sent to its children and processed there. The Shannon inequalities hold among all of these variables (and also imply any weak monotonicity constraints). The classical independence relations include the quantum ones but may add constraints that involve conditioning on any of the variables’ ancestors. These (in)equalities are tighter than the DPIs, which are hence not explicitly considered in the classical case.)
Example 2.25 (Entropic outer approximation for the quantum instrumental scenario). The projection of Γ(ICQ) leads to the entropic cone πM(Γ(ICQ)), for which MM(ICQ) equals MM(ICC) from example 2.17, thus corresponding to the constraint I(X:YZ)≤H(Z). Hence, πM(Γ(ICQ)) coincides with πM(Γ(ICC)) [50].
This method has been applied to find an outer approximation to the entropy cone of the triangle causal structure in the quantum case (cf. figure 1c) [13]. This approximation did not coincide with the outer approximation to the classical triangle scenario obtained from Shannon inequalities and independence constraints. Whether there are more as yet unknown inequalities in the quantum case remains an open question (as opposed to the classical case where even better outer approximations have already been found [50]). In [13], the method was furthermore combined with the approach reviewed in §3b, where it was applied to a scenario related to IC (cf. example 3.8 below).
(iii) Causal structures with unobserved systems in other non-signalling theories
The concept of a generalized causal structure was introduced in [11], the idea being to have one framework in which classical, quantum and even more general systems, for instance non-local boxes [65,66], can be shared by unobserved nodes and where theory-independent features of networks and corresponding bounds on our observations may be identified.
Definition 2.26. A generalized causal structure CG is a causal structure which, for each observed node, has an associated random variable and, for each unobserved node, has a corresponding non-signalling resource allowed by a generalized probabilistic theory.
Classical and quantum causal structures can be viewed as special cases of generalized causal structures [11,67]. Generalized probabilistic theories may be conveniently described in the operational–probabilistic framework of [68]. Circuit elements correspond to so-called tests that are connected by wires, which represent propagating systems. In general, such a test has an input system, and two outputs: an output system and an outcome. In the case of a system with trivial input, this corresponds to a preparation test, and in the case of trivial output this is an observation test. In the causal structure framework, a test is associated to each node. However, each such test has only one output: for unobserved nodes this is a general resource state; for observed nodes it is a random variable. Furthermore, resource states do not allow for signalling from the future to the past, i.e. we are considering so-called causal operational–probabilistic theories. This is important for the interpretation of generalized causal structures.
A distribution P over the observed nodes of a generalized causal structure CG is compatible with CG if there exists a causal operational–probabilistic theory, a resource for each unobserved edge in that theory and transformations for each node that allow for the generation of P. We denote the set of all compatible distributions P(CG). As in the quantum case, there is no notion of a joint state of all nodes in the causal structure and of conditioning on an unobserved system. Even more, there is no consensus on the representation of states and their dynamics in general non-signalling theories. To circumvent this, the classical notion of d-separation has been reformulated [11], which enables the following proposition.
Proposition 2.27 (Henson, Lal & Pusey). Let CG be a generalized causal structure, and let X, Y and Z be pairwise disjoint subsets of observed nodes in CG. If a probability distribution P is compatible with CG, then the d-separation of X and Y by Z implies the conditional independence X ⫫ Y | Z. Conversely, if, for every distribution P compatible with CG, the conditional independence X ⫫ Y | Z holds, then X is d-separated from Y by Z in CG.
This allows for the derivation of conditional independence relations among observed variables that hold in any generalized probabilistic theory, which hence restrict a general entropic cone. Furthermore, it rigorously justifies retaining the independence constraints among the (observed) variables in coexisting sets in quantum causal structures (cf. §2bii), which can be seen as special cases of generalized causal structures.
In [11], sufficient conditions were derived for identifying causal structures C for which, in the classical case CC, there are no restrictions on the distribution over observed variables other than those that follow from the d-separation of these variables. As, by proposition 2.27, these conditions also hold in CQ and CG, this implies PM(CC)=PM(CQ)=PM(CG). For causal structures with up to six nodes, there are 21 cases (and some that can be reduced to these 21) where such equivalence does not hold and where further relations among the observed variables have to be taken into account [11,18].
Outer approximations to the entropic cones for causal structures, CG, based on the observed variables and their independences only were derived in [11]. Moreover, a few additional constraints for certain generalized causal structures were derived there. For example, the entropic constraint I(X:Y)+I(X:Z)≤H(X) for the triangle causal structure of figure 1c (which had previously been established in the classical case [69]) was found. This constraint does not follow from the observed independences, but nonetheless holds for the triangle causal structure in generalized probabilistic theories.
In spite of this, a systematic entropic procedure, in which the unobserved variables are explicitly modelled and then eliminated from the description, is not available for generalized causal structures. The issue is that we are lacking a generalization of the Shannon and von Neumann entropy to generalized probabilistic theories that obeys submodularity and for which the conditional entropy can be written as the difference of unconditional entropies [70,71].
One possible generalized entropy is the measurement entropy, which is positive and obeys some of the submodularity constraints (those with XS∩XT={}) but not all [70,71]. Using this, [72] considered the set of possible entropy vectors for a bipartite state in box world, a generalized probabilistic theory that permits all bipartite correlations that are non-signalling [73]. They found no further constraints on the set of possible entropy vectors in this setting (hence, contrary to the quantum case, measurement entropy vectors of separable states in box world can violate monotonicity). Other generalized probabilistic theories and multiparty states have, to our knowledge, not been similarly analysed.
(iv) Other directions for exploring quantum and generalized causal structures
The approaches to quantum and generalized causal structures above are based on adaptations of the theory of Bayesian networks to the respective settings and on retaining the features that remain valid, for instance, the relation between d-separation and independence for observed variables [11] (cf. §2biii). Other approaches to generalize classical networks to the quantum realm have been pursued [54], where a definition of conditional quantum states analogous to conditional probability distributions was formulated.
Recent articles have proposed generalizations of Reichenbach’s principle [74] to the quantum realm [16,75,76]. In [16], a graph separation rule, q-separation, was introduced, whereas [75,76] rely on a formulation of quantum networks in terms of quantum channels and their Choi states.
An active area of research is the exploration of frameworks that allow for indefinite causal structures [77–79]. There are several approaches achieving this, such as the process matrix formalism [80], which has led to the derivation of so-called causal inequalities and the identification of signalling correlations that are achievable in this framework, however, not with any predefined causal structure [80,81]. Another framework that is able to describe such scenarios is the theory of quantum combs [82], illustrated by the quantum switch, a quantum bit controlling the circuit structure in a quantum computation. A recent framework aimed at modelling cryptographic protocols is also available [83]. Some initial results on the analysis of indefinite causal structures with entropy have recently appeared [84].
In the classical, quantum and generalized causal structures considered above only the observed classical information can be transmitted via a link between two observed variables and, in particular, no additional unobserved system. This understanding of the causal links encodes a Markov condition. In other situations, it can be convenient for the links in the graph to represent a notion of future instead of direct causation, see e.g. [85,86].
3. Entropy vector approach with post-selection
A technique that leads to additional, more fine-grained inequalities is based on post-selecting on the values of parentless classical variables. This technique was pioneered by Braunstein & Caves [87] and has been used to systematically derive numerous entropic inequalities [6–8,17,18].
(a) Post-selection in classical causal structures
In the following, we denote a random variable X post-selected on the event of another random variable, Y , taking a particular value, Y =y, as X|Y =y. The same notation is used for a set of random variables S={X1,X2,…,Xn}, whose joint distribution is conditioned on Y =y, S|Y =y={X1|Y =y,X2|Y =y,…,Xn|Y =y}. The following lemma can be understood as a generalization of (a part of) Fine’s theorem [88,89].
Lemma 3.1. Let CC be a classical causal structure with a parentless observed node X that takes values X=1,2,…,n, and let P be a joint distribution over all random variables Ω=X∪X↑∪X⤉ in CC (with P compatible with CC). Then there exists a joint distribution Q over the n⋅|X↑|+|X⤉| random variables Ω|X:={X↑|X=1,…,X↑|X=n,X⤉} such that Q(X↑|X=x X⤉)=P(X↑X⤉|X=x) for all x∈{1,…,n}.
Proof. The joint distribution over the random variables X↑∪X⤉ in CC can be written as P(X↑X⤉|X=x)=P(X↑|X⤉X=x)P(X⤉), where P(X⤉|X=x)=P(X⤉) because the parentless X is independent of its non-descendants. Now, take Q(X↑|X=1…X↑|X=nX⤉):=∏x P(X↑|X⤉X=x)P(X⤉), where the x-th factor governs the copy X↑|X=x. As required, this distribution has marginals Q(X↑|X=x X⤉)=P(X↑X⤉|X=x). ▪
It is perhaps easiest to think about this lemma in terms of a new causal structure on Ω|X that is related to the original. Roughly speaking, the new causal structure is formed by removing X and replacing the descendants of X with several copies, each of which has the same causal relations as in the original causal structure (with no mixing between copies). More precisely, if X is a parentless node in CC, we can form a post-selected causal structure CCX on Ω|X (post-selecting on X) as follows: (i) For each pair of nodes A, B∈X⤉ in CC, make A a parent of B in CCX iff A is a parent of B in CC. (ii) For each node B∈X⤉ in CC and for each node A|X=x, make B a parent of A|X=x in CCX iff B is a parent of A in CC. (iii) For each pair of nodes, A|X=x and B|X=x, make B|X=x a parent of A|X=x in CCX iff B is a parent of A in CC. (Note that there is no mixing between different values of X=x.) See figures 3 and 5 and example 3.3 for illustrations. This view gives us the following corollary of lemma 3.1, which is an alternative generalization of Fine’s theorem.
Figure 3. (a) Pearl’s instrumental scenario post-selected on binary X. The causal structure is obtained from the IC by removing X and replacing Y and Z with copies, each of which has the same causal relations as in the original causal structure. (b) Post-selected Bell scenario with binary inputs A and B.
Lemma 3.2. Let CC be a classical causal structure with a parentless observed node X that takes values X=1,2,…,n, and let P be a joint distribution over all random variables X∪X↑∪X⤉ in CC (with P compatible with CC). Then there exists a joint distribution Q compatible with the post-selected causal structure CCX such that Q(X↑|X=x X⤉)=P(X↑X⤉|X=x) for all x∈{1,…,n}.
The distributions that are of interest in this new causal structure are the marginals Q(X↑|X=x X⤉) for all x (and their interrelations), as they correspond to distributions in the original scenario. Any constraints on these distributions derived in the post-selected scenario are by construction valid for the (post-selected) distributions compatible with the original causal structure.
Example 3.3 (Post-selection in the instrumental scenario). Consider the causal structure IC where the parentless variable X takes values 0 or 1. For any P compatible with ICC, there exists a distribution Q compatible with the post-selected causal structure (figure 3a) such that Q(Z|X=0Y|X=0A)=P(ZY|AX=0)P(A) and Q(Z|X=1Y|X=1A)=P(ZY|AX=1)P(A). These marginals and their relations are of interest for the original scenario.
Note that the above reasoning may be applied recursively. Indeed, the causal structure with variables Ω|X may be post-selected on the values of one of its parentless nodes. The joint distributions of the nodes Ω|X and the associated causal structure may be analysed in terms of entropies, as illustrated with the following example.
Example 3.4 (Entropic constraints for the post-selected Bell scenario [85]). In the Bell scenario with binary inputs A and B (figure 1b), lemma 3.2 may be applied first to post-select on the values of A and then on those of B. This leads to a distribution Q compatible with the post-selected causal structure (on A and B) shown in figure 3b, for which Q(X|A=aY|B=b)=P(XY|A=a,B=b) for a,b∈{0,1} (in this case the joint distribution is already known to exist by Fine’s theorem [88,89]). Applying the entropy vector method to the post-selected causal structure and marginalizing to vectors of the form (H(X|A=0), H(X|A=1), H(Y|B=0), H(Y|B=1), H(X|A=0Y|B=0), H(X|A=0Y|B=1), H(X|A=1Y|B=0), H(X|A=1Y|B=1)) yields the inequality H(Y1|X1)+H(X1|Y0)+H(X0|Y1)−H(X0|Y0)≥0 (writing Xa:=X|A=a and Yb:=Y|B=b) and its permutations [6,87]. Whenever the input nodes take more than two values, the latter may be partitioned into two sets, guaranteeing applicability of these inequalities. Furthermore, Chaves [7] showed that these inequalities are sufficient for detecting any behaviour that is not classically reproducible in the Bell scenario where the two parties perform measurements with binary outputs.
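Since the inequality is a Shannon-type consequence of the mere existence of the joint distribution Q, it can be checked against arbitrary joint distributions of (X0, X1, Y0, Y1); the following sketch (our own) does this for randomly drawn distributions:

import numpy as np

rng = np.random.default_rng(3)

def H(P, axes):
    # Shannon entropy (bits) of the marginal of P on the given axes
    other = tuple(i for i in range(P.ndim) if i not in axes)
    m = np.asarray(P.sum(axis=other)).ravel()
    m = m[m > 0]
    return float(-np.sum(m * np.log2(m)))

for _ in range(1000):
    Q = rng.random((2, 2, 2, 2))
    Q /= Q.sum()  # random joint distribution over (X0, X1, Y0, Y1)
    Hc = lambda a, b: H(Q, (a, b)) - H(Q, (b,))  # H(A|B)
    # H(Y1|X1) + H(X1|Y0) + H(X0|Y1) - H(X0|Y0) >= 0
    assert Hc(3, 1) + Hc(1, 2) + Hc(0, 3) - Hc(0, 2) >= -1e-9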
The extension of Fine’s theorem to more general Bell scenarios [90,91], i.e. to scenarios involving a number of space-like separated parties that each choose input values and produce some output random variable (and scenarios that can be reduced to the latter), has been combined with the entropy vector method in [6,8].
Entropic constraints that are derived in this way provide novel and non-trivial entropic inequalities for the distributions compatible with the original classical causal structure. This idea was used in [8] to analyse the so-called n-cycle scenario, which is of particular interest in the context of non-contextuality and includes the Bell scenario (with binary inputs and outputs) as a special case. (A full probabilistic characterization of the n-cycle scenario was given in [92].)
In [6], new entropic inequalities for the bilocality scenario, which is relevant for entanglement swapping [93,94], as well as quantum violations of the classical constraints on the 4- and 5-cycle scenarios were derived. For the n-cycle scenario, the (polynomial number of) entropic inequalities are sufficient for the detection of any non-local distributions [7] (just as the exponential number of inequalities in the probabilistic case [92]). In the following we illustrate the method of [6,8] with a continuation of example 3.3.
Example 3.5 (Entropic approximation for the post-selected instrumental scenario). The entropy vector method from §2 is applied to the five-variable causal structure of figure 3a. The marginalization is performed to retain all marginals that correspond to distributions in the original causal structure (figure 1a), i.e. any marginals of P(YZ|X=0) and P(YZ|X=1). Hence, the five-variable entropic cone is projected to a cone that restricts vectors of the form (H(Y|X=0), H(Y|X=1), H(Z|X=0), H(Z|X=1), H(Y|X=0Z|X=0), H(Y|X=1Z|X=1)). Note that entropies of unobserved marginals such as H(Y|X=0Z|X=1) are not included. With this technique, the Shannon constraints for the three components (H(Y|X=0), H(Z|X=0), H(Y|X=0Z|X=0)) are recovered (the same holds for X=1); no additional constraints arise here. It is interesting to compare this to the Bell scenario considered in example 3.4. In both causal structures any four-variable distributions, PZ|X=0Z|X=1Y|X=0Y|X=1 and PX|A=0X|A=1Y|B=0Y|B=1, respectively, are achievable (the additional causal links in figure 3b do not affect the set of compatible distributions). However, the marginal entropy vector in the Bell scenario has more components, leading to additional constraints on the observed variables [6,87].
In some cases, two different causal structures, C1 and C2, can yield the same set of distributions after marginalizing, a fact that has been further explored in [95]. When this occurs, either causal structure can be imposed when identifying the set of achievable marginal distributions in either scenario. If the constraints implied by the causal structure C1 are a subset of those implied by C2, then those of C2 can be used to compute improved outer approximations on the entropic cone for C1. Furthermore, valid independence constraints may speed up computations even if they do not lead to any new relations for the observed variables (note that some care has to be taken when identifying valid constraints for scenarios with causal structure [95]). Similar considerations also yield a criterion for the indistinguishability of causal structures in certain marginal scenarios: if C1 and C2 yield the same set of distributions after marginalizing, then they cannot be distinguished in that marginal scenario.
In examples like the above, where no new constraints follow from post-selection, it may be possible to introduce additional input variables in order to certify the presence of quantum nodes in a network. The new parentless nodes can then be used to apply lemma 3.1 and the above entropic techniques. Mathematically, introducing further nodes to a causal structure is always possible. However, this is only interesting if experimentally feasible, e.g. if an experimenter has control over certain observed nodes and is able to devise an experiment in which their inputs can be changed. In the instrumental scenario, this may be of interest.
Example 3.6 (Variations of the instrumental scenario). In this scenario (figure 1a), a measurement on system AZ is performed depending on X (where, in the classical case, AZ can w.l.o.g. be taken to be a copy of the unobserved random variable A). Its outcome Z (in the classical case a function of A) is used to choose another measurement to be performed on AY to generate Y (classically, another copy of A). It may often be straightforward for an experimenter to choose between several measurements. In the causal structure, this corresponds to introducing an additional observed input S to the second measurement (with the values of S corresponding to different measurements on AY). Such an adaptation is displayed in figure 4a. (Note that, for ternary S, the outer approximation of the post-selected causal structure of figure 4d with Shannon inequalities does not lead to any interesting constraints (as opposed to the structure of figure 4e, which is analysed further in example 3.8).)
Figure 4. Variations of the instrumental scenario (a), (b) and (c). The causal structure (c) is relevant for the derivation of the information causality inequality, where S takes n possible values. (d) and (e) are the causal structures that are effectively analysed when post-selecting on a ternary S in (a) and on a binary S in (c), respectively.
Alternatively, it may be possible that the first measurement (on AZ) is chosen depending on a combination of different independent factors, each corresponding to a random variable Xi. For two variables X1 and X2, the corresponding causal structure is displayed in figure 4b. This is an example of a causal structure where non-Shannon inequalities among the classical variables lead to a strictly tighter outer approximation, in both the classical and the quantum case, than the approximations derived using only Shannon and weak-monotonicity constraints (this remains so if there is a causal link from X1 to X2) [50]. Taken together, these two adaptations yield the causal structure of figure 4c, which is relevant in the context of the principle of information causality [96] (see also example 3.8 below).
A second approach that relies on very similar ideas (also justified by lemma 3.1) is taken in [18]. For a causal structure CC with nodes Ω = X ∪ X↑ ∪ X⤉, where X is a parentless node, conditioning the joint distribution over all nodes on a particular X=x retains the independences of CC. In particular, the conditioning does not affect the distribution of the non-descendants X⤉, i.e. P(X⤉|X=x) = P(X⤉) for all x. The corresponding entropic constraints can be used to derive entropic inequalities without the detour of computing large entropic cones, which may be useful where the latter computations are infeasible. The constraints that are used in [18] are, however, a (diligently but somewhat arbitrarily chosen) subset of the constraints that would go into the entropic technique detailed earlier in this section for the full causal structure. Indeed, when the computations are feasible, applying the full entropy vector method to the corresponding post-selected causal structure gives a systematic way to derive constraints, which are in general strictly tighter (cf. example 3.7).
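The fact underlying this approach, namely that conditioning on a parentless node leaves the distribution of its non-descendants unchanged, is easy to check numerically. The following toy simulation (our own; the structure X → Z ← A is chosen purely for illustration) shows that the distribution of the non-descendant A is unaffected by post-selecting on X.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
A = rng.integers(0, 2, n)      # non-descendant of X
X = rng.integers(0, 2, n)      # parentless node that is post-selected on
Z = A ^ X                      # common descendant

for x in (0, 1):
    print(x, round(A[X == x].mean(), 3))   # both close to P(A=1) = 0.5
```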
So far, the restricted technique has been used in [18] to derive the entropic inequality (3.1); we refer to [18] for its explicit form.
Figure 5. Causal structures from example 3.7. Post-selecting on a binary observed variable C leads to the causal structure (d) in the case of structure (a), whereas both (b) and (c) lead to structure (e). In particular, this shows that the conditional techniques may yield the same results for different causal structures.
Example 3.7. Applying the post-selection technique for a binary random variable C to the causal structure from figure 5a yields the effective causal structure of figure 5d. The latter can be analysed with the above entropy vector method, which leads to a cone that is characterized by 14 extremal rays or, equivalently, in terms of 22 inequalities, both available in the electronic supplementary material. Writing X_c := X|C=c and Y_c := Y|C=c, the inequalities I(Z:X_1) ≥ 0, I(Z:Y_0) ≥ 0, I(X_1:Y_1|Z) ≥ 0 and H(Z|X_0) ≥ I(X_1 Z : Y_1), which are part of this description, imply (3.1) above. We are not aware of any quantum violations of these inequalities. The structures of figures 5b and 5c both lead to the causal structure of figure 5e upon post-selecting on a binary C. The latter causal structure turns out to be computationally harder to analyse with the entropy vector method and (working with existing variable elimination software on a desktop computer) we have not been able to perform the corresponding marginalization when taking all Shannon and independence constraints into account. Hence, the method outlined in [18] is a useful alternative here.
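The marginalizations used throughout this section amount to Fourier–Motzkin variable elimination on systems of linear inequalities, and the growth in the number of inequalities at each elimination step is what makes cases like the structure of figure 5e intractable. The following minimal sketch (ours; real computations use optimized elimination software with redundancy removal) shows a single elimination step, recovering positivity of the marginal entropies from the two-variable Shannon cone.

```python
from itertools import product
from fractions import Fraction

def eliminate(ineqs, j):
    """Fourier-Motzkin step: from {c : c.h >= 0}, remove coordinate j."""
    pos = [c for c in ineqs if c[j] > 0]
    neg = [c for c in ineqs if c[j] < 0]
    keep = [c for c in ineqs if c[j] == 0]
    for p, q in product(pos, neg):
        # scale so that the j-th coefficients cancel, then add
        keep.append(tuple(Fraction(-q[j]) * pi + Fraction(p[j]) * qi
                          for pi, qi in zip(p, q)))
    return keep

# Coordinates: (H(A), H(B), H(AB)); the two-variable Shannon cone.
shannon2 = [(-1, 0, 1),    # monotonicity: H(AB) >= H(A)
            (0, -1, 1),    # monotonicity: H(AB) >= H(B)
            (1, 1, -1)]    # submodularity: I(A:B) >= 0
print(eliminate(shannon2, 2))   # yields H(B) >= 0 and H(A) >= 0
```

Each step can square the number of inequalities, which indicates why eliminating the many unobserved coordinates of a larger causal structure quickly becomes infeasible.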
(b) Post-selection in quantum and general non-signalling causal structures
In causal structures with quantum and more general non-signalling nodes, lemma 3.1 is not valid. For instance, Bell’s theorem can be recast as the statement that there are distributions compatible with the quantum Bell scenario for which there is no joint distribution of X|A=0, X|A=1, Y |B=0 and Y |B=1 in the post-selected causal structure (on A and B) that has the required marginals (in the sense of lemma 3.2).
Nonetheless, the post-selection technique has been generalized to such scenarios [13,17], i.e. it is still possible to post-select on parentless observed (and therefore classical) nodes taking specific values. In such scenarios, the observed variables can be thought of as obtained from the unobserved resources by means of measurements or tests. If a descendant of the variable that is post-selected on has quantum or general non-signalling nodes as parents, then the different instances of that descendant and of all its descendants do not coexist (even if they are observed, hence classical). This is because such observed variables are generated by measuring a quantum or other non-signalling system. Such a system is altered (or destroyed) in a measurement and hence, owing to the impossibility of cloning, does not allow different instances of its children to be generated simultaneously.
In the quantum case, this is reflected in the identification of the coexisting sets in the post-selected causal structure, as is illustrated with the following example. (Note that different instances of a variable after post-selection have to be seen as alternatives and not as simultaneous descendants of their parent node as the representation of the post-selected causal structure might suggest.)
Example 3.8 (Information causality scenario in the quantum case [13]). The communication scenario used to derive the principle of information causality [96] is based on the variation of the instrumental scenario displayed in figure 4c. It has been analysed with the entropy vector method in [13], an analysis that is presented in the following. Conditioning on values of the variable S is possible in the classical and quantum cases. However, whereas in the classical case the variables Y|S=s for different s share a joint distribution (cf. lemma 3.1), they do not coexist in the quantum case. For binary S, the coexisting sets are {X1, X2, AZ, AY}, {X1, X2, Z, AY}, {X1, X2, Z, Y|S=1} and {X1, X2, Z, Y|S=2}. The only independence constraint in the quantum case is that X1, X2 and AYAZ are mutually independent. Marginalizing until only entropies of {X1, Y|S=1}, {X2, Y|S=2}, {Z} and their subsets remain yields only one non-trivial inequality, I(X1:Y|S=1) + I(X2:Y|S=2) ≤ H(Z), i.e. the n=2 case of the information causality inequality. (Note that in [13] the more general inequality I(X1:Y|S=1) + I(X2:Y|S=2) ≤ H(Z) + I(X1:X2) was derived, where X1 and X2 are not assumed independent. Furthermore, this is also the only inequality found in the classical case when restricting to the same marginal scenario [17].) The same inequality was previously derived by Pawłowski et al. [96] for general n, where the choice of marginals was inspired by the communication task considered. Subsequently, another marginal scenario was considered in [13], namely the one with coexisting sets {X1, X2, Z, Y|S=1}, {X1, X2, Z, Y|S=2} and all of their subsets, which led to additional inequalities.
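As a sanity check on this inequality, the following toy computation (ours) estimates both mutual informations for the trivial classical strategy in which the one-bit message is Z = X1 and Bob always outputs the message; the bound I(X1:Y|S=1) + I(X2:Y|S=2) ≤ H(Z) = 1 is then saturated.

```python
import numpy as np

def mi_bits(a, b):
    """Empirical mutual information (in bits) of two binary samples."""
    joint = np.zeros((2, 2))
    np.add.at(joint, (a, b), 1)
    joint /= joint.sum()
    pa, pb = joint.sum(axis=1), joint.sum(axis=0)
    mask = joint > 0
    return float(np.sum(joint[mask] *
                        np.log2(joint[mask] / np.outer(pa, pb)[mask])))

rng = np.random.default_rng(1)
n = 100_000
X1, X2 = rng.integers(0, 2, n), rng.integers(0, 2, n)
Z = X1                 # the one-bit message carries X1 only
Y = Z                  # Bob outputs the message regardless of S
print(mi_bits(X1, Y) + mi_bits(X2, Y))   # ~1.0, saturating H(Z) = 1
```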
Similar considerations were applied in [17] to causal structures CG that allow for general non-signalling resources. Let X ∪ X↑O ∪ X⤉O be the disjoint union of the observed nodes of CG, where X↑O are the observed descendants and X⤉O the observed non-descendants of the parentless observed node X. If the variable X takes values x ∈ {1, 2, …, n}, post-selection leads to a joint distribution of X↑O ∪ X⤉O for each X=x, denoted P_x := P(X↑O X⤉O | X=x). Because X does not affect the distribution of the independent variables X⤉O, the distributions P_x have coinciding marginals on X⤉O, i.e. Σ_s P_x(X↑O=s, X⤉O) = P(X⤉O) for all x, where s runs over the alphabet of X↑O. This encodes no-signalling constraints. There may be other constraints that arise from no-signalling; for instance, example 3.9 below suggests that further constraints on each P_x are implied by requiring non-signalling resources. The latter have to be found and added to the description separately.
In terms of entropy, there are n entropic cones, one for each P_x (each of which encodes the independences among the observed variables). According to the above, they are required to coincide on the entropies of X⤉O and of all of its subsets. These constraints define a convex polyhedral cone that is an outer approximation to the set of all entropy vectors achievable in the causal structure. Whenever the distributions P_x involve fewer than three variables, and assuming that all constraints implied by the causal structure and no-signalling have been taken into account, this approximation is tight, because for at most two variables the Shannon cone coincides with the set of achievable entropy vectors. Note that it may not always be obvious how to identify all relevant constraints (cf. the conjectured constraints in example 3.9).
Several examples of the use of this technique can be found in [17], including the original information causality scenario (which we discuss in example 3.9) and an entropic analogue of monogamy relations for Bell inequality violations [97,98].
Example 3.9 (Information causality scenario in general non-signalling theories). This is related to example 3.8 above and reproduces an analysis from [17]. In this marginal scenario, we consider the Shannon cones for the three sets {X1, Y|S=1}, {X2, Y|S=2} and {Z}, as well as the constraints I(X1:Y|S=1) ≤ H(Z) and I(X2:Y|S=2) ≤ H(Z), which are conjectured to hold [17]. (This conjecture is based on an argument in [99] that covers a special case; we are not aware of a general proof.) These conditions constrain a polyhedral cone of vectors (H(X1), H(X2), H(Z), H(Y|S=1), H(Y|S=2), H(X1 Y|S=1), H(X2 Y|S=2)) with 8 extremal rays, all of which are achievable using PR-boxes [65,66]. Importantly, the stronger constraint I(X1:Y|S=1) + I(X2:Y|S=2) ≤ H(Z), which holds in the quantum case (cf. example 3.8), does not hold here.
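The gap to the quantum case can be made concrete with a standard construction using a single PR-box and a one-bit message: Bob can then learn either input bit perfectly, so each conjectured constraint I(Xi:Y|S=i) ≤ H(Z) is individually saturated while the quantum inequality of example 3.8 is violated. The simulation below is our own sketch of this construction.

```python
import numpy as np

def mi_bits(a, b):
    """Empirical mutual information (in bits) of two binary samples."""
    joint = np.zeros((2, 2))
    np.add.at(joint, (a, b), 1)
    joint /= joint.sum()
    pa, pb = joint.sum(axis=1), joint.sum(axis=0)
    mask = joint > 0
    return float(np.sum(joint[mask] *
                        np.log2(joint[mask] / np.outer(pa, pb)[mask])))

rng = np.random.default_rng(2)
n = 100_000
X1, X2 = rng.integers(0, 2, n), rng.integers(0, 2, n)
S = rng.integers(1, 3, n)                  # which bit Bob wants to learn

# PR-box: outputs are uniform bits with A XOR B = (input a) AND (input b)
a_in, b_in = X1 ^ X2, S - 1
A = rng.integers(0, 2, n)
B = A ^ (a_in & b_in)

Z = X1 ^ A                                 # the one-bit message, H(Z) = 1
Y = Z ^ B                                  # equals X1 when S=1 and X2 when S=2

print(mi_bits(X1[S == 1], Y[S == 1]))      # ~1 bit
print(mi_bits(X2[S == 2], Y[S == 2]))      # ~1 bit; the sum exceeds H(Z)
```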
4. Alternative techniques
Instead of relaxing the problem of characterizing the set of probability distributions compatible with a causal structure by considering entropy vectors, other computational techniques are currently being developed. In the following, we give a brief overview of these methods.
In this context, note also that there are methods that allow certification that the only restrictions implied by a causal structure are the conditional independence constraints among the observed variables [11], as well as procedures to show that the opposite is the case [100,101]. Such methods may (when applicable) indicate whether a causal structure should be analysed further (corresponding techniques are reviewed in [18]).
(a) Entropy vectors for other entropy measures
Entropy vectors may be computed in terms of other entropy measures, for instance in terms of the α-Rényi entropies [102]. For a quantum state ρX, the α-Rényi entropy is Hα(X) := (1/(1−α)) log2 tr(ρX^α) for α ∈ (0,1) ∪ (1,∞); the cases α = 0, 1 and ∞ are defined via the relevant limits (note that H1(X) = H(X)). Classical α-Rényi entropies are included in this definition when considering diagonal states.
One may expect that useful constraints on the compatible distributions can be derived from such entropy vectors. For 0<α<1 and α>1 such constraints were analysed in [103]. In the classical case, positivity and monotonicity are the only linear constraints on the corresponding entropy vectors for any α≠0,1. For multiparty quantum states, monotonicity does not hold for any α, just as for the von Neumann entropy. For 0<α<1, there are no constraints on the allowed entropy vectors except for positivity, whereas for α>1 there are constraints, but these are nonlinear. The lack of further linear inequalities that hold in general limits the usefulness of entropy vectors of α-Rényi entropies for analysing causal structures. To our knowledge, it is not known how or whether nonlinear inequalities for Rényi entropies may be employed for this task. The case α=0, for which H0(X) = log2 rank(ρX), has been considered separately in [72], where it was shown that further linear inequalities hold. However, only bipartitions of the parties were considered and the generalization to full entropy vectors is still to be explored.
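A concrete instance of the failure of monotonicity is a two-qubit Bell state: the joint state is pure, so Hα(XY) = 0, while the reduced state is maximally mixed, so Hα(X) = 1, for every α. The short numerical check below (ours) makes this explicit.

```python
import numpy as np

def renyi(rho, alpha):
    """alpha-Renyi entropy (base 2) of a density matrix."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    if np.isclose(alpha, 1.0):               # von Neumann limit
        return float(-np.sum(ev * np.log2(ev)))
    return float(np.log2(np.sum(ev ** alpha)) / (1 - alpha))

psi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)   # two-qubit Bell state
rho_xy = np.outer(psi, psi)                          # pure joint state
rho_x = np.eye(2) / 2                                # maximally mixed marginal

for alpha in (0.5, 1.0, 2.0):
    print(alpha, renyi(rho_x, alpha), renyi(rho_xy, alpha))
# the marginal entropy (1.0) exceeds the joint entropy (0.0) for every alpha
```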
The above considerations do not mention conditional entropy and hence could be taken with the definition Hα(X|Y) := Hα(XY) − Hα(Y). Alternatively, one may consider a definition of the conditional Rényi entropy for which Hα(X|YZ) ≤ Hα(X|Y) holds [104–108]. With the latter definition, the conditional Rényi entropy cannot be expressed as a difference of unconditional entropies, and so to use entropy vectors we would need to consider the conditional entropies as separate components. Along these lines, one may also think about combining Rényi entropies for different values of α using appropriate chain rules [109]. Because of the large increase in the number of variables compared to the number of constraints, it is not clear whether this would yield useful new conditions.
A second family of entropy measures, related to the Rényi entropies, are the Tsallis entropies [110,111], which can be defined by HT,α(X) := (1/(1−α))(2^((1−α)Hα(X)) − 1). Little work has been done on these in the context of causal structures, but some numerical work [112] suggests that they have advantages for detecting non-classicality in the post-selected Bell scenario (see also [113]).
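Given the stated relation, Tsallis entropies are a direct transformation of Rényi entropies; the one-line transcription below (ours) illustrates it.

```python
def tsallis_from_renyi(h_alpha, alpha):
    """Tsallis entropy from the alpha-Renyi entropy (alpha != 1)."""
    return (2 ** ((1 - alpha) * h_alpha) - 1) / (1 - alpha)

print(tsallis_from_renyi(1.0, 2.0))   # 0.5 = 1 - sum p^2 for a uniform bit
```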
(b) Non-entropic techniques
(i) Polynomial restrictions on compatible distributions
The probabilistic characterization of causal structures depends (in general) on the dimensionality of the observed variables. Computational hardness results suggest that a full characterization is unlikely to be feasible, except in small cases [114,115]. Recent progress has been made with the development of procedures to construct polynomial Bell inequalities. A method that resorts to linear programming techniques [15] has led to the derivation of new inequalities for the bilocality scenario (as well as for a related four-party scenario). Another, iterative procedure allows for enlarging networks by adding a party in a particular way. (Here, adding a party means adding one observed input and one observed output node as well as an unobserved parent for the output; the latter may causally influence one other output random variable in the network.) This allows nonlinear inequalities for the enlarged network to be constructed from inequalities that are valid for the original one [14].
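The linear-programming idea behind such constructions can be illustrated in miniature: deciding whether a behaviour of the Bell scenario is classically reproducible is a feasibility problem over the 16 local deterministic strategies. The sketch below (ours; it does not reproduce the method of [15]) implements this test with scipy and correctly classifies the uniform behaviour as local and the PR-box behaviour as non-local.

```python
import numpy as np
from itertools import product
from scipy.optimize import linprog

outcomes = list(product([0, 1], repeat=4))       # entries (a, b, x, y)
strategies = list(product([0, 1], repeat=4))     # entries (a0, a1, b0, b1)
M = np.array([[float(a == s[x] and b == s[2 + y]) for s in strategies]
              for (a, b, x, y) in outcomes])

def is_local(P):
    """P maps (a, b, x, y) to P(ab|xy); True iff an LHV model exists."""
    target = np.array([P[k] for k in outcomes])
    A_eq = np.vstack([M, np.ones((1, len(strategies)))])
    b_eq = np.append(target, 1.0)
    res = linprog(np.zeros(len(strategies)), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * len(strategies))
    return res.status == 0

print(is_local({k: 0.25 for k in outcomes}))     # uniform behaviour: True
pr = {(a, b, x, y): 0.5 * ((a ^ b) == (x & y)) for (a, b, x, y) in outcomes}
print(is_local(pr))                              # PR-box behaviour: False
```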
(ii) Inflations of causal structures
Furthermore, a recent approach relies on considering enlarged networks, so-called inflations, and inferring causal constraints from those [19,116]. Inflated networks may contain several copies of a variable that each have the same dependencies on their ancestors (which may also exist in several instances) and that share the distributions of their originals. Such inflations allow for the derivation of probabilistic inequalities that restrict the set of compatible distributions. These ideas bear some resemblance to the procedures in [20], in the sense that both employ the idea that certain marginal distributions may be obtained from different networks; they are, however, much more focused on causal structures featuring interesting independence constraints. Inflations allowed the authors of [19] to refute certain distributions as incompatible with the triangle causal structure of figure 1c, in particular the so-called W-distribution, which could not be proved incompatible either entropically or with the covariance matrix approach below.
(iii) Semidefinite tests relying on covariance matrices
One may look for mappings of the distribution of a set of observed variables that encode the causal structure beyond considering entropies. For causal structures with two generations, i.e. one generation of unobserved variables as ancestors of one generation of observed nodes, a technique based on covariance matrices has been found [20]. Each observed variable is mapped to a vector-valued random variable and the covariance matrix of the direct sum of these variables is considered. Owing to the law of total expectation, this matrix allows for a certain decomposition depending on the causal structure. For a particular observed distribution and its covariance matrix, the existence of such a decomposition may be tested via semidefinite programming. The relation of this technique to the entropy vector method is not yet well understood; a partial analysis considering several examples is given in Section X of [20].
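The following sketch (ours; a simplified rendering of the idea rather than the exact formulation of [20]) uses the cvxpy modelling package to test, via semidefinite programming, whether a covariance matrix of scalar observed variables decomposes into positive semidefinite blocks, one per unobserved source and supported only on the variables that source influences, plus independent local noise.

```python
import numpy as np
import cvxpy as cp

def decomposable(sigma, supports):
    """Feasibility: sigma == sum of PSD blocks (one per source) + noise."""
    n = sigma.shape[0]
    parts = [cp.Variable((n, n), PSD=True) for _ in supports]
    noise = cp.Variable(n, nonneg=True)
    cons = []
    for part, sup in zip(parts, supports):
        for i in range(n):
            for j in range(n):
                if i not in sup or j not in sup:
                    cons.append(part[i, j] == 0)   # restrict to the support
    cons.append(sum(parts) + cp.diag(noise) == sigma)
    prob = cp.Problem(cp.Minimize(0), cons)
    prob.solve()
    return prob.status == cp.OPTIMAL

# Triangle-like structure: three observed nodes, one shared source per pair.
sigma = np.eye(3)
print(decomposable(sigma, [{0, 1}, {1, 2}, {0, 2}]))   # True
```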
5. Open problems
The entropy vector approach has led to many certificates for the incompatibility of correlations with causal structures. However, we still lack a general understanding of how well entropic relations can approximate the set of achievable correlations. Firstly, the non-injective mapping from probabilities to entropies is not sufficiently understood and, secondly, the current methods employ further approximations, e.g. by restricting the number of non-Shannon inequalities that can be considered at a time. It is as yet unknown whether the entropy vector method (without post-selection) can ever distinguish correlations that arise from classical, quantum and more general non-signalling resources. Such insights may also inform the question of whether there exist novel inequalities for the von Neumann entropy of multiparty quantum states.
The post-selection technique allows for the derivation of additional constraints that may distinguish quantum from classically achievable correlations in the Bell scenario and possibly in other examples. However, the method relies on the causal structure featuring parentless observed nodes, hence it is not always applicable (see e.g. the triangle scenario). In such situations, one may try to combine the entropic techniques reviewed here with the inflation method [19], which might allow for further entropic analysis of several causal structures, e.g. of the triangle scenario.
Criteria to certify whether a set of entropic constraints is able to detect non-classical correlations are currently not available. For many of the established entropic constraints on classical causal structures it is unknown whether or not they are also valid for the corresponding quantum structure. In the case of the Bell scenario, this problem has been overcome: the known entropic constraints have been shown to be sufficient for detecting any non-classical correlations [7]. However, as the proof is specific to this scenario, finding a systematic tool to analyse the scope of the entropic techniques remains open.
Data accessibility
Additional accompanying data can be found in the electronic supplementary material.
Authors' contributions
Both authors contributed to the ideas present in the article. M.W. performed the computations and wrote the first draft. Both authors discussed and contributed to the final version of the manuscript.
Competing interests
We have no competing interests.
Funding
R.C. is supported by the EPSRC’s Quantum Communications Hub (grant number EP/M013472/1) and by an EPSRC First Grant (grant no. EP/P016588/1).
Acknowledgements
We thank Rafael Chaves and Costantino Budroni for confirming details of [17].