Universal recovery map for approximate Markov chains

A central question in quantum information theory is to determine how well lost information can be reconstructed. Crucially, the corresponding recovery operation should perform well without knowing the information to be reconstructed. In this work, we show that the quantum conditional mutual information measures the performance of such recovery operations. More precisely, we prove that the conditional mutual information $I(A:C|B)$ of a tripartite quantum state $\rho_{ABC}$ can be bounded from below by its distance to the closest recovered state $\mathcal{R}_{B\to BC}(\rho_{AB})$, where the $C$-part is reconstructed from the $B$-part only and the recovery map $\mathcal{R}_{B\to BC}$ merely depends on $\rho_{BC}$. One particular application of this result implies the equivalence between two different approaches to define topological order in quantum systems.


Introduction
A state $\rho_{ABC}$ on a tripartite quantum system $A \otimes B \otimes C$ forms a (quantum) Markov chain if it can be recovered from its marginal $\rho_{AB}$ on $A \otimes B$ by a quantum operation $\mathcal{R}_{B\to BC}$ from $B$ to $B \otimes C$, i.e.,

$$\rho_{ABC} = \mathcal{R}_{B\to BC}(\rho_{AB}) . \tag{1}$$

An equivalent characterization of $\rho_{ABC}$ being a Markov chain is that the conditional mutual information $I(A:C|B)_\rho := H(AB)_\rho + H(BC)_\rho - H(B)_\rho - H(ABC)_\rho$ is zero [20,23,24], where $H(A)_\rho := -\mathrm{tr}(\rho_A \log_2 \rho_A)$ is the von Neumann entropy. The structure of these states has been studied in various works. In particular, it has been shown that $A$ and $C$ can be viewed as independent conditioned on $B$, for a meaningful notion of conditioning [15]. Very recently it has been shown that Markov states can alternatively be characterized by having a generalized Rényi conditional mutual information that vanishes [9].

A natural question that is relevant for applications is whether the above statements are robust (see [19] for an example and [10] for a discussion). Specifically, one would like to have a characterization of the states that have a small (but not necessarily vanishing) conditional mutual information, i.e., $I(A:C|B) \le \varepsilon$ for $\varepsilon > 0$. First results revealed that such states can have a large distance to Markov chains that is independent of $\varepsilon$ [8,18], which has been taken as an indication that their characterization may be difficult. However, it has subsequently been realized that a more appropriate measure than the distance to a (perfect) Markov chain is how well (1) is satisfied [33,19,34,4]. This motivated the definition of approximate Markov chains as states for which (1) approximately holds.
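The characterization of Markov chains through a vanishing conditional mutual information can be checked numerically in the classical special case, where states are probability distributions and the von Neumann entropy reduces to the Shannon entropy. The following is a minimal sketch under that assumption; the helper names (`marginal`, `entropy`, `cmi`, `flip`) and the two example distributions are illustrative choices that do not appear in the paper.

```python
from collections import defaultdict
from math import log2


def marginal(p, keep):
    """Marginal of a joint distribution p (dict: outcome tuple -> probability)
    on the tuple positions listed in `keep`."""
    m = defaultdict(float)
    for outcome, prob in p.items():
        m[tuple(outcome[i] for i in keep)] += prob
    return dict(m)


def entropy(p):
    """Shannon entropy in bits (classical counterpart of the von Neumann entropy)."""
    return -sum(q * log2(q) for q in p.values() if q > 0)


def cmi(p):
    """I(A:C|B) = H(AB) + H(BC) - H(B) - H(ABC) for outcomes ordered (a, b, c)."""
    return (entropy(marginal(p, (0, 1))) + entropy(marginal(p, (1, 2)))
            - entropy(marginal(p, (1,))) - entropy(p))


def flip(x, y, eps):
    """Probability of receiving bit y when bit x is sent through a channel
    that flips its input with probability eps."""
    return 1 - eps if x == y else eps


bits = (0, 1)

# Markov chain A -> B -> C: b is a noisy copy of a, c is a noisy copy of b.
markov = {(a, b, c): 0.5 * flip(a, b, 0.1) * flip(b, c, 0.2)
          for a in bits for b in bits for c in bits}

# Not a Markov chain: a is a perfect copy of c, while b is independent of both.
copy = {(a, b, c): 0.25 for a in bits for b in bits for c in bits if a == c}

cmi_markov = cmi(markov)  # vanishes up to floating-point rounding
cmi_copy = cmi(copy)      # one full bit of conditional correlation
```

In the second distribution, $B$ carries no information about $C$, so no map acting on $B$ alone can restore the correlation between $A$ and $C$; this is the situation that a large conditional mutual information signals.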
In recent work [10], it has been shown that the set of approximate Markov chains indeed coincides with the set of states with small conditional mutual information. In particular, the distance between the two terms in (1), which may be measured in terms of their fidelity $F$, is bounded by the conditional mutual information. More precisely, for any state $\rho_{ABC}$ there exists a trace-preserving completely positive map $\mathcal{R}_{B\to BC}$ (the recovery map) such that

$$I(A:C|B)_\rho \ge -2 \log_2 F\big(\rho_{ABC}, \mathcal{R}_{B\to BC}(\rho_{AB})\big) . \tag{2}$$

Furthermore, it can be shown that a converse inequality holds that is of the form $I(A:C|B)_\rho^2 \le -c^2 \log_2 F(\rho_{ABC}, \mathcal{R}_{B\to BC}(\rho_{AB}))$, where $c$ depends logarithmically on the dimension of $A$ [4,10].
We also note that the fidelity term in (2), maximized over all recovery maps, i.e.,

$$F(A;C|B)_\rho := \sup_{\mathcal{R}_{B\to BC}} F\big(\rho_{ABC}, \mathcal{R}_{B\to BC}(\rho_{AB})\big) , \tag{3}$$

is called fidelity of recovery, and has been introduced and studied in [26,5]. With this quantity the main result of [10] can be written as

$$I(A:C|B)_\rho \ge -2 \log_2 F(A;C|B)_\rho . \tag{4}$$

The fidelity of recovery has several natural properties, e.g., it is monotone under local operations on $A$ and $C$, and it is multiplicative [5].

The result of [10] has been extended in various ways. Based on quantum state redistribution protocols, it has been shown in [7] that (2) still holds if the fidelity term is replaced by the measured relative entropy $D_M(\cdot\|\cdot)$, which is generally larger, i.e.,

$$I(A:C|B)_\rho \ge D_M\big(\rho_{ABC} \,\|\, \mathcal{R}_{B\to BC}(\rho_{AB})\big) \ge -2 \log_2 F\big(\rho_{ABC}, \mathcal{R}_{B\to BC}(\rho_{AB})\big) . \tag{5}$$

Furthermore, in [5] an alternative proof of (2) has been derived that uses properties of the fidelity of recovery (in particular, multiplicativity). Another recent work [3] showed how to generalize ideas from [10] to prove a remainder term for the monotonicity of the relative entropy in terms of a recovery map that satisfies (2).

All known proofs of (2) are non-constructive, in the sense that the recovery map $\mathcal{R}_{B\to BC}$ is not given explicitly. It is merely known [10] that if $A$, $B$, and $C$ are finite-dimensional then $\mathcal{R}_{B\to BC}$ can always be chosen such that it has the form

$$X_B \mapsto V_{BC}\, \rho_{BC}^{1/2} \big( \rho_B^{-1/2} U_B X_B U_B^\dagger \rho_B^{-1/2} \otimes \mathrm{id}_C \big) \rho_{BC}^{1/2}\, V_{BC}^\dagger \tag{6}$$

on the support of $\rho_B$, where $U_B$ and $V_{BC}$ are unitaries on $B$ and $B \otimes C$, respectively. It would be natural to expect that the choice of the recovery map that satisfies (2) only depends on $\rho_{BC}$; however, this is only known in special cases. One such special case is that of Markov chains $\rho_{ABC}$, i.e., states for which (1) holds perfectly. Here a map of the form (6) with $V_{BC} = \mathrm{id}_{BC}$ and $U_B = \mathrm{id}_B$ (sometimes referred to as transpose map or Petz recovery map) serves as a perfect recovery map [23,24]. Another case where a recovery map that only depends on $\rho_{BC}$ is known explicitly is that of states with a classical $B$ system. For general states, however, the previous results left open the possibility that the recovery map $\mathcal{R}_{B\to BC}$ depends on the full state $\rho_{ABC}$ rather than on the marginal $\rho_{BC}$ only. In particular, the unitaries $U_B$ and $V_{BC}$ in (6), although acting only on $B$ and $B \otimes C$, respectively, could have such a dependence.
In this work we show that for any state $\rho_{BC}$ on $B \otimes C$ there exists a recovery map $\mathcal{R}_{B\to BC}$ that is universal, in the sense that the distance between any extension $\rho_{ABC}$ of $\rho_{BC}$ and $\mathcal{R}_{B\to BC}(\rho_{AB})$ is bounded from above by the conditional mutual information $I(A:C|B)_\rho$. In other words, we show that (2) remains valid if the recovery map is chosen depending on $\rho_{BC}$ only, rather than on $\rho_{ABC}$.

Main result
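Theorem 2.1. For any density operator $\rho_{BC}$ on $B \otimes C$ there exists a trace-preserving completely positive map $\mathcal{R}_{B\to BC}$ such that for any extension $\rho_{ABC}$ on $A \otimes B \otimes C$

$$I(A:C|B)_\rho \ge -2 \log_2 F\big(\rho_{ABC}, \mathcal{R}_{B\to BC}(\rho_{AB})\big) , \tag{7}$$

where $A$, $B$, and $C$ are separable Hilbert spaces.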
Remark 2.2. If $B$ and $C$ are finite-dimensional Hilbert spaces, the statement of Theorem 2.1 can be tightened to

$$I(A:C|B)_\rho \ge D_M\big(\rho_{ABC} \,\|\, \mathcal{R}_{B\to BC}(\rho_{AB})\big) . \tag{8}$$

Remark 2.3. The recovery map $\mathcal{R}_{B\to BC}$ predicted by Theorem 2.1 has the property that it maps $\rho_B$ to $\rho_{BC}$. To see this, note that $I(A:C|B)_{\tilde\rho} = 0$ for any density operator of the form $\tilde\rho_{ABC} = \rho_A \otimes \rho_{BC}$. Theorem 2.1 thus asserts that $\tilde\rho_{ABC}$ must be equal to $\mathcal{R}_{B\to BC}(\tilde\rho_{AB})$, which implies that $\rho_{BC} = \mathcal{R}_{B\to BC}(\rho_B)$. We note that so far it was unknown whether recovery maps that satisfy (2) and have this property exist.
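For intuition, the bound (7) and the property of Remark 2.3 can be checked in the fully classical case. The sketch below makes several assumptions that are not part of the theorem: all systems are classical, the recovery map is taken to be the classical transpose map (sampling $c$ from $p(c|b)$, which depends on $p_{BC}$ only), and the fidelity is the Bhattacharyya coefficient, which matches the convention $F(\rho,\sigma) = \|\sqrt{\rho}\sqrt{\sigma}\|_1$. The example distribution and all function names are hypothetical. It does not construct the universal map whose existence Theorem 2.1 asserts for general quantum states.

```python
from collections import defaultdict
from math import log, log2, sqrt


def marginal(p, keep):
    """Marginal of a joint distribution p (dict: outcome tuple -> probability)
    on the tuple positions listed in `keep`."""
    m = defaultdict(float)
    for outcome, prob in p.items():
        m[tuple(outcome[i] for i in keep)] += prob
    return dict(m)


def entropy(p):
    """Shannon entropy in bits."""
    return -sum(q * log2(q) for q in p.values() if q > 0)


def cmi(p):
    """I(A:C|B) for outcomes ordered (a, b, c)."""
    return (entropy(marginal(p, (0, 1))) + entropy(marginal(p, (1, 2)))
            - entropy(marginal(p, (1,))) - entropy(p))


def transpose_recovery(p):
    """Classical transpose map applied to the AB-marginal of p:
    rebuild (a, b, c) from (a, b) by drawing c from p(c|b)."""
    p_ab, p_bc, p_b = marginal(p, (0, 1)), marginal(p, (1, 2)), marginal(p, (1,))
    return {(a, b, c): p_ab[(a, b)] * p_bc[(b2, c)] / p_b[(b,)]
            for (a, b) in p_ab for (b2, c) in p_bc if b2 == b}


def fidelity(p, q):
    """Classical fidelity (Bhattacharyya coefficient)."""
    return sum(sqrt(prob * q.get(outcome, 0.0)) for outcome, prob in p.items())


bits = (0, 1)
# Arbitrary strictly positive example distribution (illustrative only).
weights = {(a, b, c): 1 + a + 2 * b * c + 3 * a * c
           for a in bits for b in bits for c in bits}
total = sum(weights.values())
p_abc = {outcome: w / total for outcome, w in weights.items()}

recovered = transpose_recovery(p_abc)
info = cmi(p_abc)
fid = fidelity(p_abc, recovered)

exponential_bound = 2 ** (-info / 2)   # right-hand side of (7), rewritten for F
linear_bound = 1 - log(2) / 2 * info   # linearized version of the same bound
```

Here `fid` is at least `exponential_bound`, which in turn is at least `linear_bound`, and the $BC$-marginal of `recovered` coincides with that of `p_abc`, as Remark 2.3 requires of a universal map.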
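We note that Theorem 2.1 does not reveal any information about the structure of the recovery map that satisfies (7). However, if we consider a linearized version of the bound (7), we can make more specific statements.

Corollary 2.4. For any density operator $\rho_{BC}$ on $B \otimes C$ there exists a trace-preserving completely positive map $\mathcal{R}_{B\to BC}$ such that for any extension $\rho_{ABC}$ on $A \otimes B \otimes C$

$$F\big(\rho_{ABC}, \mathcal{R}_{B\to BC}(\rho_{AB})\big) \ge 1 - \frac{\ln(2)}{2}\, I(A:C|B)_\rho , \tag{9}$$

where $A$, $B$, and $C$ are separable Hilbert spaces. Furthermore, if $B$ and $C$ are finite-dimensional then $\mathcal{R}_{B\to BC}$ has the form

$$X_B \mapsto \rho_{BC}^{1/2}\, \mathcal{U}_{BC\to BC}\big( \rho_B^{-1/2} X_B \rho_B^{-1/2} \otimes \mathrm{id}_C \big)\, \rho_{BC}^{1/2} \tag{10}$$

on the support of $\rho_B$, where $\mathcal{U}_{BC\to BC}$ is a unital trace-preserving map from $B \otimes C$ to $B \otimes C$.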
Example 2.5. For density operators with a marginal on $B \otimes C$ of the form $\rho_{BC} = \rho_B \otimes \rho_C$, a universal recovery map that satisfies (8) is uniquely defined on the support of $\rho_B$: it is the transpose map, which in this case simplifies to $\mathcal{R}_{B\to BC} : X_B \mapsto X_B \otimes \rho_C$. It is straightforward to see that (8) holds. In fact, we even have equality if we consider the relative entropy (which is in general larger than the measured relative entropy), i.e.,

$$I(A:C|B)_\rho = D\big(\rho_{ABC} \,\|\, \mathcal{R}_{B\to BC}(\rho_{AB})\big) = D\big(\rho_{ABC} \,\|\, \rho_{AB} \otimes \rho_C\big) . \tag{11}$$

The uniqueness of $\mathcal{R}_{B\to BC}$ on the support of $\rho_B$ follows by using the fact that the universal recovery map should perfectly recover the Markov state $\rho_{AB} \otimes \rho_C$, where $\rho_{AB}$ is a purification of $\rho_B$. This forces $\mathcal{R}_{B\to BC}$ to agree with the transpose map on the support of $\rho_B$ [23,24].
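The equality claimed in Example 2.5 can be reproduced for classical distributions. The sketch below assumes classical systems; the marginals `p_b`, `p_c` and the conditional table `p_a0` (the probability of $a = 0$ given $b$ and $c$) are hypothetical numbers chosen only so that the $BC$-marginal is a product while $A$ is correlated with both $B$ and $C$.

```python
from collections import defaultdict
from math import log2


def marginal(p, keep):
    """Marginal of a joint distribution p (dict: outcome tuple -> probability)
    on the tuple positions listed in `keep`."""
    m = defaultdict(float)
    for outcome, prob in p.items():
        m[tuple(outcome[i] for i in keep)] += prob
    return dict(m)


def entropy(p):
    """Shannon entropy in bits."""
    return -sum(q * log2(q) for q in p.values() if q > 0)


def cmi(p):
    """I(A:C|B) for outcomes ordered (a, b, c)."""
    return (entropy(marginal(p, (0, 1))) + entropy(marginal(p, (1, 2)))
            - entropy(marginal(p, (1,))) - entropy(p))


def relative_entropy(p, q):
    """D(p||q) in bits; assumes q is positive wherever p is."""
    return sum(prob * log2(prob / q[outcome]) for outcome, prob in p.items() if prob > 0)


bits = (0, 1)
p_b = {0: 0.3, 1: 0.7}
p_c = {0: 0.4, 1: 0.6}
# Hypothetical conditional probabilities p(a = 0 | b, c).
p_a0 = {(0, 0): 0.9, (0, 1): 0.2, (1, 0): 0.5, (1, 1): 0.1}

# Joint distribution whose BC-marginal is the product p_b x p_c.
p_abc = {}
for b in bits:
    for c in bits:
        p_abc[(0, b, c)] = p_b[b] * p_c[c] * p_a0[(b, c)]
        p_abc[(1, b, c)] = p_b[b] * p_c[c] * (1 - p_a0[(b, c)])

# Classical version of the map X_B -> X_B (x) rho_C applied to the AB-marginal.
p_ab = marginal(p_abc, (0, 1))
recovered = {(a, b, c): p_ab[(a, b)] * p_c[c] for (a, b) in p_ab for c in bits}

info = cmi(p_abc)
rel = relative_entropy(p_abc, recovered)
```

The two numbers `info` and `rel` agree up to rounding, because for a product marginal the conditional distribution $p(c|b)$ used by the transpose map reduces to $p(c)$.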
The proof of Theorem 2.1 is structured into two parts. We first prove the statement for finite-dimensional Hilbert spaces $B$ and $C$ in Section 3, and then show that this implies the statement for general separable Hilbert spaces in Section 4. The proof of Corollary 2.4 is given in Section 5.

Proof for finite dimensions
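Throughout this section we assume that the Hilbert spaces $B$ and $C$ are finite-dimensional. In Steps 1-3 of the proof below, we also make the same assumption for $A$, but then drop it in Step 4. We start by explaining why (8) is a tightened version of (7), as was noticed in [7]. Let $D_\alpha(\cdot\|\cdot)$ be the $\alpha$-quantum Rényi divergence as defined in [21,31], with $D_1(\rho\|\sigma) = D(\rho\|\sigma) := \mathrm{tr}(\rho(\log \rho - \log \sigma))$. By definition of the measured relative entropy (see Definition B.1) we find for any two states $\rho$ and $\sigma$

$$D_M(\rho\|\sigma) = \sup_{\mathcal{M} \in \mathbb{M}} D\big(\mathcal{M}(\rho)\|\mathcal{M}(\sigma)\big) \ge \sup_{\mathcal{M} \in \mathbb{M}} D_{1/2}\big(\mathcal{M}(\rho)\|\mathcal{M}(\sigma)\big) = -2 \log_2 \inf_{\mathcal{M} \in \mathbb{M}} F\big(\mathcal{M}(\rho), \mathcal{M}(\sigma)\big) = -2 \log_2 F(\rho, \sigma) , \tag{12}$$

where $\mathbb{M} := \{\mathcal{M} : \mathcal{M}(\rho) = \sum_x \mathrm{tr}(\rho M_x) |x\rangle\langle x| \text{ with } \sum_x M_x = \mathrm{id}\}$ and $\{|x\rangle\}$ is a family of orthonormal vectors. The inequality step uses that $\alpha \mapsto D_\alpha(\rho\|\sigma)$ is a monotonically non-decreasing function in $\alpha$ [21, Theorem 7], and the final step follows from the fact that for any two states there exists an optimal measurement that does not increase their fidelity [12, Section 3.3]. As a result, in order to prove Theorem 2.1 for finite-dimensional $B$ and $C$ it suffices to prove (8).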
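The two facts used in (12), monotonicity of the Rényi divergence in $\alpha$ and the identity $D_{1/2} = -2\log_2 F$, can be illustrated for a single pair of classical distributions, which may be thought of as the outcome statistics $\mathcal{M}(\rho)$ and $\mathcal{M}(\sigma)$ of one fixed measurement. This is only a sanity check on hypothetical numbers `p` and `q`, using the classical Rényi divergence; it does not perform the optimization over measurements.

```python
from math import log2, sqrt


def renyi_divergence(p, q, alpha):
    """Classical Renyi divergence D_alpha(p||q) in bits, for alpha in (0, 1)."""
    return log2(sum(x ** alpha * y ** (1 - alpha) for x, y in zip(p, q))) / (alpha - 1)


def relative_entropy(p, q):
    """D(p||q) in bits, the alpha -> 1 limit of the Renyi divergence."""
    return sum(x * log2(x / y) for x, y in zip(p, q) if x > 0)


def fidelity(p, q):
    """Classical fidelity (Bhattacharyya coefficient)."""
    return sum(sqrt(x * y) for x, y in zip(p, q))


# Hypothetical measurement statistics of two states.
p = (0.5, 0.3, 0.2)
q = (0.2, 0.5, 0.3)

alphas = (0.5, 0.75, 0.9, 0.999)
renyi_values = [renyi_divergence(p, q, a) for a in alphas]

d_half = renyi_values[0]
d_one = relative_entropy(p, q)
fidelity_bound = -2 * log2(fidelity(p, q))
```

The list `renyi_values` is non-decreasing and bounded by `d_one`, while `d_half` coincides with `fidelity_bound`, mirroring the inequality and the second equality in (12).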
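We first derive a proposition (Proposition 3.1 below) and then show how it can be used to prove (8) (and, hence, Theorem 2.1). The proposition refers to a family of functions

$$\mathrm{D}(A \otimes B \otimes C) \ni \rho \mapsto \Delta_{\mathcal{R}}(\rho) \in \mathbb{R} \cup \{-\infty\} , \tag{13}$$

parameterized by recovery maps $\mathcal{R} \in \mathrm{TPCP}(B, B \otimes C)$, where $\mathrm{D}(A \otimes B \otimes C)$ denotes the set of density operators on $A \otimes B \otimes C$ and $\mathrm{TPCP}(B, B \otimes C)$ the set of trace-preserving completely positive maps from $B$ to $B \otimes C$. The function family will later be chosen (see Equation (49)) such that $\Delta_{\mathcal{R}}(\rho) \ge 0$ corresponds to (8).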
The proposition asserts that if for any extension ρ ABC of ρ BC we have ∆ R (ρ) ≥ 0 for some R ∈ TPCP(B, B ⊗ C) and provided the function family ∆ R (•) satisfies certain properties described below, then there exists a single recovery map R for which ∆ R (ρ) ≥ 0 for all extensions ρ ABC of ρ BC on a fixed A system.We note that the precise form of the function family ∆ R (•) is irrelevant for Proposition 3.1 as long as it satisfies a list of properties as stated below.
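As described above, our goal is to prove that there exists a recovery map $\mathcal{R}_{B\to BC}$ such that $\Delta_{\mathcal{R}}(\rho) \ge 0$ for all $\rho_{ABC} \in \mathrm{D}(A \otimes B \otimes C)$ with a fixed marginal $\rho_{BC}$ on $B \otimes C$. To formulate our argument more concisely, we introduce some notation. For any set $S$ of density operators $\rho_{ABC}$ we define

$$\Delta_{\mathcal{R}}(S) := \inf_{\rho \in S} \Delta_{\mathcal{R}}(\rho) . \tag{14}$$

The desired statement then reads as $\Delta_{\mathcal{R}}(S) \ge 0$, for any set $S$ of states on $A \otimes B \otimes C$ with a fixed marginal $\rho_{BC}$. Furthermore, for any fixed states $\rho^0_{ABC}$ and $\rho_{ABC}$ on $A \otimes B \otimes C$ and $p \in [0,1]$, we define

$$\rho^p_{\hat{A}ABC} := (1-p)\, |0\rangle\langle 0|_{\hat{A}} \otimes \rho^0_{ABC} + p\, |1\rangle\langle 1|_{\hat{A}} \otimes \rho_{ABC} , \tag{15}$$

where $\hat{A}$ is an additional system with two orthogonal states $|0\rangle$ and $|1\rangle$. More generally, for any fixed state $\rho^0_{ABC}$ and for any set $S$ of density operators $\rho_{ABC}$ we set

$$S^p := \{\rho^p_{\hat{A}ABC} : \rho_{ABC} \in S\} . \tag{16}$$

Required properties of the ∆-function.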
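1. For any $\rho^0_{ABC}, \rho_{ABC} \in \mathrm{D}(A \otimes B \otimes C)$ with identical marginals $\rho^0_{BC} = \rho_{BC}$ on $B \otimes C$, for any $\mathcal{R} \in \mathrm{TPCP}(B, B \otimes C)$, and for any $p \in [0,1]$ we have $\Delta_{\mathcal{R}}(\rho^p) = (1-p)\,\Delta_{\mathcal{R}}(\rho^0) + p\,\Delta_{\mathcal{R}}(\rho)$.
2. For any $\rho_{ABC} \in \mathrm{D}(A \otimes B \otimes C)$, for any $\mathcal{R}, \mathcal{R}' \in \mathrm{TPCP}(B, B \otimes C)$, and for any $\alpha \in [0,1]$ we have $\Delta_{\alpha\mathcal{R} + (1-\alpha)\mathcal{R}'}(\rho) \ge \alpha\,\Delta_{\mathcal{R}}(\rho) + (1-\alpha)\,\Delta_{\mathcal{R}'}(\rho)$.
3. For any $\mathcal{R} \in \mathrm{TPCP}(B, B \otimes C)$, the function $\rho \mapsto \Delta_{\mathcal{R}}(\rho)$ is upper semicontinuous.
4. For any $\rho_{ABC} \in \mathrm{D}(A \otimes B \otimes C)$, the function $\mathcal{R} \mapsto \Delta_{\mathcal{R}}(\rho)$ is upper semicontinuous.

Property 1 implies that for any state $\rho^0_{ABC}$, for any set $S$ of operators $\rho_{ABC}$ with $\rho_{BC} = \rho^0_{BC}$, and for any $p \in [0,1]$ we have

$$\Delta_{\mathcal{R}}(S^p) = (1-p)\,\Delta_{\mathcal{R}}(\rho^0) + p\,\Delta_{\mathcal{R}}(S) . \tag{17}$$

Similarly, Property 2 implies

$$\Delta_{\alpha\mathcal{R} + (1-\alpha)\mathcal{R}'}(S) \ge \alpha\,\Delta_{\mathcal{R}}(S) + (1-\alpha)\,\Delta_{\mathcal{R}'}(S) . \tag{18}$$

Proposition 3.1. Let $A$, $B$, and $C$ be finite-dimensional Hilbert spaces, $P \subseteq \mathrm{TPCP}(B, B \otimes C)$ be compact and convex, $S$ be a set of density operators on $A \otimes B \otimes C$ with identical marginals on $B \otimes C$, and $\Delta_{\mathcal{R}}(\cdot)$ be a family of functions of the form (13) that satisfies Properties 1-4. Then

$$\forall \rho \in S \;\; \exists\, \mathcal{R} \in P : \Delta_{\mathcal{R}}(\rho) \ge 0 \quad \Longrightarrow \quad \exists\, \bar{\mathcal{R}} \in P : \Delta_{\bar{\mathcal{R}}}(S) \ge 0 . \tag{19}$$

We now proceed in four steps. In the first, we prove Proposition 3.1 for finite sets $S$. This is done by induction over the cardinality of the set $S$: we show that if the statement of Proposition 3.1 is true for all sets $S$ with $|S| = n$, then it remains valid for all sets $S$ with $|S| = n + 1$. In Step 2, we use an approximation step to extend this to infinite sets $S$, which then completes the proof of Proposition 3.1. In the final two steps, we show how to conclude the statement of Theorem 2.1 for the finite-dimensional case from that. In Step 3 we prove (8) for the case where the recovery map that satisfies (8) could still depend on the dimension of the system $A$. In Step 4 we show how this dependency can be removed.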
Step 1: Proof of Proposition 3.1 for finite size sets S We proceed by induction over the cardinality n := |S| of the set S of density operators.More precisely, the induction hypothesis is that for any finite-dimensional Hilbert space A and any set S of size n consisting of density operators on A ⊗ B ⊗ C with fixed marginal ρ BC on B ⊗ C, the statement (19) holds.For n = 1, this hypothesis holds trivially for R = R. 3We now prove the induction step.Suppose that the induction hypothesis holds for some n.Let A be a finite-dimensional Hilbert space and let S ∪ {ρ 0 ABC } be a set of cardinality n + 1 where S is a set of states on A ⊗ B ⊗ C with fixed marginal ρ BC on B ⊗ C of cardinality n and ρ 0 ABC is another state with ρ 0 BC = ρ BC .We need to prove that there exists a recovery map RB→BC ∈ P such that Let p ∈ [0, 1] and consider the set S p as defined in (16).In the following we view the states ρ p (see Equation (15)) in this set as tripartite states on ( Â ⊗ A) ⊗ B ⊗ C, i.e., we regard the system Â ⊗ A as one (larger) system.The induction hypothesis applied to the extension space Â ⊗ A and the set S p (of size n) of states on ( Â ⊗ A) ⊗ B ⊗ C implies the existence of a map R p B→BC ∈ P such that As by assumption the function D(A ⊗ B ⊗ C) ∋ ρ → ∆ R p (ρ) ∈ R ∪ {−∞} satisfies Property 1 (and hence also Equation ( 17)) we obtain This implies that Furthermore, for p = 0 the left inequality holds and for p = 1 the right inequality holds.By choosing Note also that R u B→BC , R v B→BC ∈ P, since by the induction hypothesis R p B→BC ∈ P for any p ∈ [0, 1].We will use this to prove that the recovery map R ∈ P defined by for an appropriately chosen α ∈ [0, 1], satisfies where c is a constant defined by Properties 3 and 4 together with Lemma C.1 and Remark C.3 ensure that the two maxima in (27) are attained which implies by the definition of the codomain of ∆ R (•) (see Equation ( 13)) that c is finite.In other words, for any δ > 0 there exists a recovery map Rδ ∈ P such 
that The compactness of P ensures that there exists a recovery map R ∈ P and a sequence {δ n } n∈N such that lim n→∞ δ n = 0 and lim Because of (28) we have which together with Property 4 implies that and thus proves (20).It thus remains to show (26).To simplify the notation let us define as well as It follows from ( 22) that Similarly, we have As by assumption the function ∆ R (•) satisfies Property 2 we find together with (35) that for any (If v = 1 it suffices to consider the case α = 1 so that the last term can be omitted; cf.Equation (40) below.)Analogously, using ( 18) and ( 34), we find (If u = 0 it suffices to consider the case α = 0; cf.Equation ( 43) below.) To conclude the proof of ( 26), it suffices to choose α ∈ [0, 1] such that the terms on the right hand side of ( 36) and (37) satisfy and Let us first assume that u ≥ 1 2 .Since Λ 0 and Λ 1 are non-negative (see Equation ( 24)), we may choose This immediately implies that the left hand side of (38) equals 0, so that the inequality holds.As Combining this with (40) we find which proves (39) because by (27) we have This immediately implies that the left hand side of (39) equals 0, so that the inequality holds.Furthermore, for δ > 0 sufficiently small such that v ≤ 1 2 , we obtain Together with (43) this implies which establishes (38).This concludes the proof of Proposition 3.1 for sets S of finite size.
Step 2: Extension to infinite sets S All that remains to be done to prove Proposition 3.1 is to generalize the statement to arbitrarily large sets S. In fact, we show that there exists a recovery map R B→BC ∈ P such that ∆ R (S) ≥ 0, where S is the set of all density operators on A ⊗ B ⊗ C for a fixed finite-dimensional Hilbert space A and a fixed marginal ρ BC .Note first that this set S of all density operators on A ⊗ B ⊗ C with fixed marginal ρ BC on B ⊗ C is compact (see Lemma C.2).This implies that for any ε > 0 there exists a finite set S ε of density operators on A ⊗ B ⊗ C such that any ρ ∈ S is ε-close to an element of S ε .We further assume without loss of generality that ( Combining this with Property 4 gives for all where the third inequality holds since S εn ⊂ S εm for ε n ≥ ε m , respectively n ≤ m.The final inequality follows from the defining property of R ε .For any fixed ρ ∈ S and for all n ∈ N, let ρ n ∈ S εn be such that lim n→∞ ρ n = ρ ∈ S. (By definition of S εn it follows that such a sequence {ρ n } n∈N with ρ n ∈ S εn always exists.)Property 3 together with (47) yields Since (48) holds for any ρ ∈ S, we obtain ∆ R(S) ≥ 0, which completes the proof of Proposition 3.1.
Step 3: From Proposition 3.1 to Theorem 2.1 for fixed system A We next show that Theorem 2.1, for the case where A is a fixed finite-dimensional system, follows from Proposition 3.1.For this we use Proposition 3.1 for the function family with To apply Proposition 3.1, we have to verify that the function family Proof.We first verify that the function ∆ R (•) satisfies Property 1.For any state ρ p of the form (15), we have by the definition of the mutual information Because ρ 0 BC = ρ BC , the first term, H(C|B) ρ p , is independent of p, i.e., H(C|B) ρ p = H(C|B) ρ 0 = H(C|B) ρ .The second term can be written as an expectation over Â, i.e., As a result we find The density operator R B→BC (ρ p ÂAB ) can be written as We can thus apply Lemma B.3, from which we obtain Equations ( 52) and (54) imply that which concludes the proof of Property 1.That ∆ R (•) satisfies Property 2 can be seen as follows We next verify that the function ∆ R (•) satisfies Property 3. The Alicki-Fannes inequality ensures that D By definition of the measured relative entropy (see Definition B.1), we find for In the penultimate step, we use that the relative entropy is lower semicontinuous [17,Exercise 7.22] and that M as well as R B→BC are linear and bounded operators and hence continuous.We finally show that ∆ R (•) fulfills Property 4. It suffices to verify that TPCP(B, ).Note that since R and M are linear bounded operators and hence continuous and the relative entropy for two states σ 1 and σ 2 is defined by Since the supremum of continuous functions is lower semicontinuous [6, Chapter IV, Section 6.2, Theorem 4], the assertion follows.
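The first part of the verification of Property 1, namely that the conditional mutual information of the flagged state (15) is the $p$-weighted average of the two individual values whenever the $BC$-marginals coincide, can be illustrated classically. The sketch below assumes classical systems and treats the pair (flag, $a$) as one larger $A$ system; the marginal `p_bc` and the two conditional tables are hypothetical. It covers only the mutual-information term, not the measured relative entropy term handled by Lemma B.3.

```python
from collections import defaultdict
from math import log2


def marginal(p, keep):
    """Marginal of a joint distribution p (dict: outcome tuple -> probability)
    on the tuple positions listed in `keep`."""
    m = defaultdict(float)
    for outcome, prob in p.items():
        m[tuple(outcome[i] for i in keep)] += prob
    return dict(m)


def entropy(p):
    """Shannon entropy in bits."""
    return -sum(q * log2(q) for q in p.values() if q > 0)


def cmi(p):
    """I(A:C|B) for outcomes ordered (a, b, c); a may itself be a tuple."""
    return (entropy(marginal(p, (0, 1))) + entropy(marginal(p, (1, 2)))
            - entropy(marginal(p, (1,))) - entropy(p))


def extension(p_bc, p_a0):
    """Distribution on (a, b, c) with BC-marginal p_bc and p(a = 0 | b, c) = p_a0[(b, c)]."""
    p = {}
    for (b, c), prob in p_bc.items():
        p[(0, b, c)] = prob * p_a0[(b, c)]
        p[(1, b, c)] = prob * (1 - p_a0[(b, c)])
    return p


def flagged_mixture(rho0, rho1, p):
    """Classical analogue of (15): flag 0 marks rho0 (weight 1 - p),
    flag 1 marks rho1 (weight p)."""
    mix = {((0, a), b, c): (1 - p) * prob for (a, b, c), prob in rho0.items()}
    mix.update({((1, a), b, c): p * prob for (a, b, c), prob in rho1.items()})
    return mix


# Two different extensions of the same (hypothetical) BC-marginal.
p_bc = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}
rho0 = extension(p_bc, {(0, 0): 0.9, (0, 1): 0.2, (1, 0): 0.5, (1, 1): 0.1})
rho1 = extension(p_bc, {(0, 0): 0.3, (0, 1): 0.8, (1, 0): 0.6, (1, 1): 0.4})

p = 0.3
lhs = cmi(flagged_mixture(rho0, rho1, p))
rhs = (1 - p) * cmi(rho0) + p * cmi(rho1)
```

The values `lhs` and `rhs` agree up to rounding: the term $H(C|B)$ is the same for both extensions, and $H(C|\hat{A}AB)$ is an average over the flag.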
What remains to be shown in order to apply Proposition 3.1 is that for any $\rho \in S$, where $S$ is the set of states on $A \otimes B \otimes C$ with a fixed marginal $\rho_{BC}$ on $B \otimes C$, there exists a recovery map $\mathcal{R}_{B\to BC} \in P$ such that $\Delta_{\mathcal{R}}(\rho) \ge 0$. With the choice $P = \mathrm{TPCP}(B, B \otimes C)$, this is precisely what the main result of [7] proves. We have thus shown that $\Delta_{\mathcal{R}}(\rho) \ge 0$ holds for a universal recovery map $\mathcal{R}_{B\to BC} \in P$, so that (8) follows for any fixed dimension of the $A$ system. This proves the statement of Remark 2.2 (and, hence, Theorem 2.1) for the case where $A$ is a fixed finite-dimensional Hilbert space.
Step 4: Independence from the A system Let S be the set of all density operators on Ā ⊗ B ⊗ C with a fixed marginal ρ BC on B ⊗ C, where B and C are finite-dimensional Hilbert spaces and Ā is the infinite-dimensional Hilbert space ℓ 2 of square summable sequences.We now show that there exists a recovery map R B→BC such that ∆ R (S) ≥ 0.
Let {Π a Ā} a∈N be a sequence of finite-rank projectors on Ā that converges to id Ā with respect to the weak operator topology.Let S a denote the set of states whose marginal on Ā is contained in the support of Π a Ā and with the same fixed marginal ρ BC on B ⊗ C as the elements of S. For all a ∈ N, let R a B→BC denote a recovery map that satisfies ∆ R a (S a ) ≥ 0. Note that the existence of such maps is already established by the proof of Theorem 2.1 for the finite-dimensional case.As the set of tracepreserving completely positive maps on finite-dimensional systems is compact (see Remark C.3) there exists a subsequence {a i } i∈N such that lim i→∞ a i = ∞ and lim i→∞ R ai = R ∈ TPCP(B, B ⊗ C).For every ρ ∈ S there exists a sequence of states {ρ a } a∈N with ρ a ∈ S a that converges to ρ in the trace norm (see Lemma E.3).Lemma 3.2 (in particular Properties 3 and 4), yields for any The fourth inequality follows since a i ≥ a for large enough i and since this implies that S ai ⊃ S a , and the final inequality follows by definition of R ai .This shows that ∆ R(S) ≥ 0.
To retrieve the statement of Remark 2.2 (and hence Theorem 2.1 for finite-dimensional B and C), we need to argue that this same map R remains valid when we consider any separable space A. In order to do this, observe that any separable Hilbert space A can be isometrically embedded into Ā [25,Theorem II.7].To conclude, it suffices to remark that ∆ R is invariant under isometries applied on the space A.

Extension to infinite dimensions
In this section we show how to obtain the statement of Theorem 2.1 for separable (not necessarily finite-dimensional) Hilbert spaces A, B, C from the finite-dimensional case that has been proven in Section 3.For trace non-increasing completely positive maps R B→BC we define the function family where D(A ⊗ B ⊗ C) denotes the set of states on A ⊗ B ⊗ C. We will use the same notation as introduced at the beginning of Section 3. In addition, we take S to be the set of all states on A ⊗ B ⊗ C with a fixed marginal ρ BC on B ⊗ C. The proof proceeds in two steps where we first show that there exists a sequence of recovery maps {R k B→BC } k∈N such that lim k→∞ ∆R k (S) ≥ 0, where the property that all elements of S have the same marginal on the B ⊗ C system will be important.In the second step we conclude by an approximation argument that there exists a recovery map R B→BC such that ∆R (S) ≥ 0.
Step 1: Existence of a sequence of recovery maps We start by introducing some notation that is used within this step.Let {Π b B } b∈N and {Π c C } c∈N be sequences of finite-rank projectors on B and C which converge to id B and id C with respect to the weak operator topology.For any given ρ ABC ∈ D(A ⊗ B ⊗ C) consider the normalized projected states and where for any c ∈ N, the sequence {ρ b,c ABC } b∈N converges to ρ c ABC in the trace norm (see, e.g., Corollary 2 of [13]) and the sequence {ρ c ABC } c∈N converges to ρ ABC also in the trace norm.Let S b,c be the set of states that is generated by (61) for all ρ ABC ∈ S. We note that for any given b, c all elements of S b,c have an identical marginal on B→BC denote a recovery map that satisfies ∆R b,c (S b,c ) ≥ 0 whose existence is established in the proof of Theorem 2.1 for finite-dimensional systems B and C (see Section 3).We next state a lemma that explains how ∆R (ρ) changes when we replace ρ by a projected state ρ b,c .
The Alicki-Fannes inequality [1] ensures that for a fixed finite-dimensional system C the conditional mutual information I(A : C|B) ρ = H(C|B) ρ − H(C|AB) ρ is continuous in ρ with respect to the trace norm, i.e., where ε b,c = ρ b,c ABC − ρ c ABC 1 and h(•) denotes the binary Shannon entropy function defined by h(p Using the Fuchs-van de Graaf inequality [11] and Lemma E.1, we find Combining ( 64) and (65) yields According to (67) and since 2 −x ≥ 1 − ln(2)x for x ∈ R, we have Combining ( 68) and (69) yields For two states σ 1 and σ 2 let P (σ 1 , σ 2 ) := 1 − F (σ 1 , σ 2 ) 2 denote the purified distance.Applying the Fuchs-van de Graaf inequality [11] and Lemma E.1 gives Since the purified distance is a metric [28] that is monotonous under trace-preserving completely positive maps [27, Theorem 3.4], (71) gives As the fidelity for states lies between zero and one, (72) implies This implies that By definition of the quantity ∆R (•) (see Equation ( 60)) the combination of ( 70) and (74) yields where ε b,c is bounded by (66).By Lemma E. (77) Step 2: Existence of a limit Recall that S is the set of density operators on A ⊗ B ⊗ C with a fixed marginal ρ BC on B ⊗ C. The goal of this step is to use (77) to prove that there exists a recovery map R B→BC such that ∆R (S) ≥ 0 .
Let {Π m B } m∈N and {Π m C } m∈N be sequences of projectors with rank m that weakly converge to id B and id C , respectively.Furthermore, for any m and any R ∈ TPCP(B, B ⊗ C) let [R] m be the trace non-increasing map obtained from R by projecting the input and output with Π m B and Π m B ⊗ Π m C , respectively.We start with a preparatory lemma that proves a relation between ∆[R] m (S) and ∆R (S).
Proof.For any ρ ABC ∈ S and any m ∈ N let us define the non-negative operator ρm . By definition of ∆R (•) (see Equation ( 60)), it suffices to show that for any ρ ABC ∈ S, any . As in Step 1 let P (•, •) denote the purified distance.Lemma E.1 implies that Similarly, we obtain (82) By Hölder's inequality, monotonicity of the trace norm for trace-preserving completely positive maps [30, Example 9.1.8and Corollary 9.1.10]and (81) together with the Fuchs-van de Graaf inequality [11] and Lemma E.
Combining ( 82), (83) and Hölder's inequality together with the assumption R(ρ Inequalities ( 81), (84) and the monotonicity of the purified distance under trace-preserving and completely positive maps [27,Theorem 3.4] show that for As the purified distance between two states lies inside the interval [0, 1] and since (δ As a result, we find , which implies lim m→∞ δ m = 0.The following lemma proves that for sufficiently large m and a recovery map R B→BC that maps ρ B to density operators that are close to ρ BC , the operator [R] m (ρ AB ) has a trace that is bounded from below by essentially one.
Proof.We first note that by Hölder's inequality and monotonicity of the trace norm for trace-preserving completely positive maps [30, Example 9.1.8and Corollary 9.1.10]we have that this does not uniquely define the recovery map R B→BC , which is not a problem as Theorem 2.1 proves the existence of a recovery map that satisfies (7) and does not claim that this map is unique.It remains to show that R B→BC has the property (78).This follows from the observation that any density operator ρ AB can be obtained from the purification ρ B: B by applying a trace-preserving completely positive map T B→A from B to A. By Lemma C.5 and because T B→A commutes with any recovery map R B→BC from B to B ⊗ C, we have Using the continuity of the fidelity (see, e.g., Lemma B.9 in [10]), this implies that for any ρ ∈ S. Combining this with (99) gives which concludes Step 2 and thus completes the proof of Theorem 2.1 in the general case where B and C are no longer finite-dimensional.

Proof of Corollary 2.4
The first statement of Corollary 2.4, which holds for separable Hilbert spaces, follows immediately from Theorem 2.1, since $2^{-\frac{1}{2} I(A:C|B)_\rho} \ge 1 - \frac{\ln(2)}{2} I(A:C|B)_\rho$. The proof of the second statement of Corollary 2.4 is partitioned into three steps. We first show that a method similar to the one used in Section 3 can be used to reveal certain insights about the structure of the recovery map $\mathcal{R}_{B\to BC}$ (which is not universal) that satisfies

$$F\big(\rho_{ABC}, \mathcal{R}_{B\to BC}(\rho_{AB})\big) \ge 1 - \frac{\ln(2)}{2}\, I(A:C|B)_\rho . \tag{110}$$

In a second step, by invoking Proposition 3.1, we use this knowledge to prove that for a fixed $A$ system there exists a recovery map that satisfies (110), is universal, and preserves the structure of the non-universal recovery map from before. Finally, in Step 3 we show how the dependency on the fixed $A$ system can be removed.
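The elementary inequality $2^{-x/2} \ge 1 - \frac{\ln(2)}{2} x$ used in the first sentence above holds because the right-hand side is the tangent at $x = 0$ of the convex function on the left. A short numerical check on a grid, where the grid range and spacing are arbitrary choices, can be sketched as:

```python
from math import log


def exponential_bound(x):
    """The function 2^(-x/2), the form in which Theorem 2.1 bounds the fidelity."""
    return 2 ** (-x / 2)


def linear_bound(x):
    """Tangent of 2^(-x/2) at x = 0, the bound appearing in Corollary 2.4."""
    return 1 - log(2) / 2 * x


# Values of the conditional mutual information from 0 to 10 bits in steps of 0.01.
grid = [k / 100 for k in range(0, 1001)]
gaps = [exponential_bound(x) - linear_bound(x) for x in grid]
```

All entries of `gaps` are non-negative, and the gap vanishes only at $x = 0$, so the linearized bound is weaker than (7) except for exact Markov chains.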
Step 1: Structure of a non-universal recovery map We will show that for any density operator ρ ABC on A⊗B ⊗C, where A, B, and C are finite-dimensional Hilbert spaces there exists a trace-preserving completely positive map R B→BC that satisfies (110) and is of the form on the support of ρ B , where W BC is a unitary on B ⊗ C. We start by proving the following preparatory lemma.
since C is assumed to be a finite-dimensional system and as such I(A : C|B) ρ < ∞.Hence for p < γ, inequality (118) is violated.Since by [10] for any p ∈ (0, 1] there exists a recovery map R B→BC of the form (112) that satisfies (117) we conclude that for sufficiently small p there exists a recovery map R B→BC of the form (112) that satisfies (117) and leaves ρ 0 ABC invariant.However for recovery maps that leave ρ 0 ABC invariant, (117) simplifies to (113) for all p.Thus, there exists a recovery map R B→BC of the form (112) satisfying (113) that leaves ρ 0 ABC invariant, i.e., R B→BC (ρ 0 AB ) = ρ 0 ABC .Since ρ 0 ABC := ρ A ⊗ ρ BC is a Markov chain with marginal ρ 0 BC = ρ BC , the condition R B→BC (ρ 0 AB ) = ρ 0 ABC implies that R B→BC (ρ B ) = ρ BC which proves the assertion.
Lemma 5.1 implies that there exists a recovery map R B→BC that satisfies (113) and fulfills Using the fact that R B→BC is trace preserving and the invariance of the trace under unitaries we find This implies that tr ρ B (id This simplifies (120) to V BC ρ BC V † BC = ρ BC , i.e., V BC and ρ BC commute, which implies that the mapping (112) can be written as with Step 2: Structure of a universal recovery map for fixed A system In this step we show that the recovery map satisfying (110) of the form (111), whose existence has been established in Step 1, can be made universal without sacrificing the (partial) knowledge about its structure.The idea is to apply Proposition 3.1 for the function family We therefore need to verify that the assumptions of Proposition 3.1 are fulfilled.This is done by the following lemma.We first note that since C is finite-dimensional this implies that ∆R (ρ) < ∞ for all ρ ∈ D(A ⊗ B ⊗ C).Proof.We start by showing that ∆R (•) satisfies Property 1.For ρ p ÂABC as defined in (15), we have The density operator R B→BC (ρ p ÂAB ) can be written as The relevant density operators thus satisfy the orthogonality conditions for equality in Lemma A.1, from which (124) follows.Furthermore, as explained in the proof of Lemma 3.2 we have Equations ( 124) and (126) imply that We next verify that ∆R (•) fulfills Property and, hence by the definition of ∆R (•) The function ρ → ∆R (ρ) is continuous which clearly implies Property 3. To see this, recall that by the Alicki-Fannes inequality ρ → I(A : C|B) ρ is continuous for a finite-dimensional C system [1].Furthermore, since ρ AB → R BC (ρ AB ) is continuous (see Lemma C.5), Lemma B.9 of [10] implies that ρ ABC → F (ρ ABC , R B→BC (ρ AB )) is continuous, which then establishes Property 3.
Finally it remains to show that ∆R (•) satisfies Property 4, which however follows directly by Lemma C.4.
Let $P \subseteq \mathrm{TPCP}(B, B \otimes C)$ be the convex hull of the set of trace-preserving completely positive mappings from the $B$ to the $B \otimes C$ system that are of the form (111). We note that the elements of $P$ are mappings of the form (10), since a convex combination of unitary mappings is unital and a convex combination of trace-preserving maps remains trace-preserving. Proposition 3.1, which is applicable as shown in Lemma 5.2, together with Step 1 therefore proves the assertion for a fixed $A$ system.
Step 3: Independence from the A system Let S be the set of all density operators on Ā ⊗ B ⊗ C with a fixed marginal ρ BC on B ⊗ C, where B and C are finite-dimensional Hilbert spaces and Ā is the infinite-dimensional Hilbert space ℓ 2 of square summable sequences.
We note that the set of trace-preserving completely positive maps of the form (10) on finite-dimensional systems is compact, which follows by Remark C.3 together with the fact that the intersection of a compact set and a closed set is compact. Hence, using Lemma 5.2 (in particular Properties 3 and 4) and the result from Step 2 above, the same argument as in Step 4 of Section 3 can be applied to conclude the existence of a recovery map $\mathcal{R}_{B\to BC}$ of the form (10) such that $\bar\Delta_{\mathcal{R}}(S) \ge 0$.
As every separable Hilbert space $A$ can be isometrically embedded into $\bar{A}$ [25, Theorem II.7] and since $\bar\Delta_{\mathcal{R}}$ is invariant under isometries applied on the extension space $A$, we can conclude that the recovery map $\mathcal{R}_{B\to BC}$ remains valid for any separable extension space $A$. This proves the statement of Corollary 2.4 for finite-dimensional $B$ and $C$ systems.

Discussion
Our main result is that for any density operator $\rho_{BC}$ on $B \otimes C$ there exists a recovery map $\mathcal{R}_{B\to BC}$ such that the distance between any extension $\rho_{ABC}$ of $\rho_{BC}$ acting on $A \otimes B \otimes C$ and $\mathcal{R}_{B\to BC}(\rho_{AB})$ is bounded from above by the conditional mutual information $I(A:C|B)_\rho$. It is natural to ask whether such a map can be described as a simple and explicit function of $\rho_{BC}$. In fact, it was conjectured in [19,4] that (2) holds for a very simple choice of map, namely

$$\mathcal{T}_{B\to BC} : X_B \mapsto \rho_{BC}^{1/2} \big( \rho_B^{-1/2} X_B \rho_B^{-1/2} \otimes \mathrm{id}_C \big) \rho_{BC}^{1/2} , \tag{130}$$

called the transpose map or Petz recovery map. This conjecture, if correct, would have important consequences in obtaining remainder terms for the monotonicity of the relative entropy [3]. As discussed in the introduction, if $\rho_{ABC}$ is a (perfect) quantum Markov chain or the $B$ system is classical, the claim of the conjecture is known to hold.

One possible approach to prove a result of this form would be to start from the result (2) for an unknown recovery map and then show that the transpose map $\mathcal{T}_{B\to BC}$ cannot be much worse than any other recovery map. In fact, a theorem of Barnum and Knill [2] directly implies that when $\rho_{ABC}$ is pure, we have

$$\sup_{\mathcal{R}_{B\to BC}} F\big(\rho_{ABC}, \mathcal{R}_{B\to BC}(\rho_{AB})\big)^2 \le F\big(\rho_{ABC}, \mathcal{T}_{B\to BC}(\rho_{AB})\big) . \tag{131}$$

This shows that, if $\rho_{ABC}$ is pure, an inequality of the form (2), with the fidelity replaced by its square root, holds for the transpose map. In order to generalize this to all states, one might hope that (131) also holds for mixed states $\rho_{ABC}$. However, this turns out to be wrong even when the state $\rho_{ABC}$ is completely classical (see Appendix F for an example).
Another interesting question is whether the lower bound in terms of the measured relative entropy (8) can be improved to a relative entropy.Such an inequality is known to be false if we restrict the recovery map to be the transpose map (130) [33], but it might be true when we optimize over all recovery maps.It is worth noting that in case such an inequality holds for any ρ ABC and a corresponding recovery map, then the argument presented in this work would imply that there exists a universal recovery map satisfying (8) with the relative entropy instead of the measured relative entropy.This can be seen by defining the function family ρ → ∆ R (ρ) := I(A : C|B) ρ − D(ρ ABC ||R B→BC (ρ AB )).Lemma B.2, the convexity of the relative entropy [22,Theorem 11.12] and the lower semicontinuity of the relative entropy [17,Example 7.22] imply that ∆ R (•) satisfies Properties 1-4.As a result, Proposition 3.1 is applicable which can be used to prove the existence of a universal recovery map.

Appendices

A General facts about the fidelity
The following lemma states a standard concavity property of the fidelity which is presented here for completeness and since we are interested in the case where equality holds.
Lemma A.1. For any density operators ρ, ρ′, σ, and σ′, and for any p ∈ [0, 1] we have
F(pρ + (1 − p)ρ′, pσ + (1 − p)σ′) ≥ p F(ρ, σ) + (1 − p) F(ρ′, σ′) ,
with equality if both of ρ and σ are orthogonal to both of ρ′ and σ′.
The following lemma generalizes the Fuchs-van de Graaf inequality, which has been proven for states, to non-negative operators. The result is standard and stated here for completeness.
Lemma A.2. For any two non-negative operators ρ and σ with tr(ρ) ≥ tr(σ), the trace norm of their difference is bounded from above by
‖ρ − σ‖1 ≤ 2 √(tr(ρ)² − F(ρ, σ)²) .
Proof. Let ω be a non-negative operator with tr(ω) = tr(ρ) − tr(σ), whose support is orthogonal to the support of both ρ and σ, and define σ′ = σ + ω. Then tr(ρ) = tr(σ′) and
‖ρ − σ‖1 ≤ ‖ρ − σ′‖1 and F(ρ, σ) = F(ρ, σ′) .
It therefore suffices to show that the claim holds for operators with tr(ρ) = tr(σ) = c ∈ R+ . Furthermore, for c > 0, defining ρ̄ = ρ/c and σ̄ = σ/c and noting that
‖ρ − σ‖1 = c ‖ρ̄ − σ̄‖1 and F(ρ, σ) = c F(ρ̄, σ̄) ,
it suffices to verify that the claim holds for tr(ρ) = tr(σ) = 1, which follows from the Fuchs-van de Graaf inequality [11].
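As a numerical sanity check of Lemma A.2, the following sketch samples random non-negative operators and compares both sides of the bound. It assumes that the bound reads ‖ρ − σ‖1 ≤ 2√(tr(ρ)² − F(ρ, σ)²) with F(ρ, σ) = ‖√ρ√σ‖1, which is the form obtained by rescaling the Fuchs-van de Graaf inequality for states. All function names are illustrative and not taken from the paper.

```python
import numpy as np

def psd_sqrt(m):
    """Square root of a Hermitian positive semidefinite matrix."""
    w, v = np.linalg.eigh(m)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T

def fidelity(rho, sigma):
    """Assumed convention: F(rho, sigma) = || sqrt(rho) sqrt(sigma) ||_1."""
    s = np.linalg.svd(psd_sqrt(rho) @ psd_sqrt(sigma), compute_uv=False)
    return float(s.sum())

def trace_norm(m):
    """Trace norm of a Hermitian matrix: sum of absolute eigenvalues."""
    return float(np.abs(np.linalg.eigvalsh(m)).sum())

def random_psd(dim, trace, rng):
    """Random non-negative operator with the prescribed trace."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    m = g @ g.conj().T
    return trace * m / np.trace(m).real

def check_lemma_a2(dim=4, samples=200, seed=0):
    """Return the smallest value of (right-hand side - left-hand side) seen."""
    rng = np.random.default_rng(seed)
    worst_gap = np.inf
    for _ in range(samples):
        t_rho = rng.uniform(0.5, 2.0)
        t_sigma = rng.uniform(0.1, t_rho)  # ensures tr(sigma) <= tr(rho)
        rho = random_psd(dim, t_rho, rng)
        sigma = random_psd(dim, t_sigma, rng)
        lhs = trace_norm(rho - sigma)
        f = fidelity(rho, sigma)
        rhs = 2.0 * np.sqrt(max(t_rho**2 - f**2, 0.0))
        worst_gap = min(worst_gap, rhs - lhs)
    return worst_gap

gap = check_lemma_a2()
```

A non-negative value of `gap` (up to rounding) is consistent with the lemma on the sampled operators; it is of course not a proof.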
B General facts about the measured relative entropy
where {|x⟩} is a finite set of orthonormal vectors.
Lemma B.2. Let ρ, ρ′, σ, and σ′ be density operators such that both ρ and σ are orthogonal to both ρ′ and σ′. For any p ∈ [0, 1] we have
D(pρ + (1 − p)ρ′ ‖ pσ + (1 − p)σ′) = p D(ρ‖σ) + (1 − p) D(ρ′‖σ′) .
Proof. By the orthogonality of ρ and ρ′ (respectively σ and σ′) we have log(pρ + (1 − p)ρ′) = log(pρ) + log((1 − p)ρ′) and ρ log ρ′ = 0. Thus by definition of the relative entropy we obtain the desired statement.
Lemma B.3. Let ρ, ρ′, σ, and σ′ be density operators such that both ρ and σ are orthogonal to both ρ′ and σ′. For any p ∈ [0, 1] we have
D M (pρ + (1 − p)ρ′ ‖ pσ + (1 − p)σ′) = p D M (ρ‖σ) + (1 − p) D M (ρ′‖σ′) .
Proof. Let M = {M x }, M′ = {M′ y } be measurements and define the POVM on N whose elements are given by {M x } x ∪ {M′ y } y . Then we can write As a result using Lemma B.2, As this inequality is valid for any measurements M and M′, taking the supremum over such measurements gives For the other direction, consider a measurement M = {M x }. We can write Combining this with the joint convexity of the relative entropy [22, Theorem 11.12], we get
Lemma B.4. For density operators ρ, σ, and σ′ and p ∈ [0, 1] the measured relative entropy satisfies
D M (ρ ‖ pσ + (1 − p)σ′) ≤ p D M (ρ‖σ) + (1 − p) D M (ρ‖σ′) .
Proof. For any measurement M, where the first inequality step uses the convexity of the relative entropy [22, Theorem 11.12]. Taking the supremum over M, we get the desired result.

C Basic topological facts
For completeness we state here some standard topological facts about density operators and trace-preserving completely positive maps.
Lemma C.
Remark C.3. Let E and G be two finite-dimensional Hilbert spaces. The space of trace-non-increasing (respectively trace-preserving) completely positive maps from E to G is compact. To see this, note that Lemma C.2 implies that the set F := {X ∈ Pos(E ⊗ G) : tr G (X) ≤ id E } is compact. By the Choi-Jamiołkowski representation, F is isomorphic to the set of all trace-non-increasing completely positive maps from E to G. The same argument applied to the set F := {X ∈ Pos(E ⊗ G) : tr G (X) = id E } shows that the set of trace-preserving completely positive maps from E to G is compact.
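The correspondence used in Remark C.3 can be illustrated numerically. The sketch below builds a random trace-preserving completely positive map from hypothetical Kraus operators, forms its Choi operator X on E ⊗ G, and checks that X is non-negative with tr G (X) = id E . Dropping Kraus operators gives a trace-non-increasing map with tr G (X) ≤ id E . The construction via a random isometry and all names are assumptions made for illustration only.

```python
import numpy as np

def random_kraus(d_in, d_out, n_kraus, rng):
    """Kraus operators K_a (d_out x d_in) with sum_a K_a^dagger K_a = id."""
    shape = (d_out * n_kraus, d_in)
    g = rng.normal(size=shape) + 1j * rng.normal(size=shape)
    v, _ = np.linalg.qr(g)  # orthonormal columns, i.e. an isometry
    return v.reshape(n_kraus, d_out, d_in)

def choi_operator(kraus):
    """X = sum_{i,j} |i><j| (x) R(|i><j|), stored with index order (E, G)."""
    _, d_out, d_in = kraus.shape
    x = np.einsum('agi,ahj->igjh', kraus, kraus.conj())
    return x.reshape(d_in * d_out, d_in * d_out)

def trace_out_g(x, d_in, d_out):
    """Partial trace over the output factor G."""
    return np.einsum('igjg->ij', x.reshape(d_in, d_out, d_in, d_out))

rng = np.random.default_rng(0)
d_e, d_g = 2, 3
kraus = random_kraus(d_e, d_g, n_kraus=2, rng=rng)

# Trace-preserving case: the marginal of the Choi operator on E is the identity.
x_tp = choi_operator(kraus)
marginal_tp = trace_out_g(x_tp, d_e, d_g)

# Trace-non-increasing case: keep only the first Kraus operator.
x_sub = choi_operator(kraus[:1])
marginal_sub = trace_out_g(x_sub, d_e, d_g)
slack = np.linalg.eigvalsh(np.eye(d_e) - marginal_sub)
```

Here `marginal_tp` should equal the identity on E, while all entries of `slack` should be non-negative, reflecting the two sets F considered in the remark.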
Lemma C.4. Let G and K be finite-dimensional Hilbert spaces and let Proof. This follows directly from the continuity of R → R G→GK (σ EG ) and the continuity of the fidelity (see, e.g., Lemma B.9 of [10]).
Lemma C.5. Let E, G, and K be separable Hilbert spaces and R ∈ TPCP(G, K). Then the mapping Proof. As the map is linear it suffices to show that it is bounded. For that we can decompose X = P − N with P and N orthogonal non-negative operators. Then we have Proof. Let |ψ⟩ be a purification of σ EG , then by Uhlmann's theorem [29] we find and
We next prove a basic statement about converging projectors that is used several times in the proof of Theorem 2.1.
Lemma E.2. Let E be a separable Hilbert space and let {Π e E } e∈N be a sequence of finite-rank projectors on E which converges to id E with respect to the weak operator topology. Then for any density operator σ E on E we have lim e→∞ tr(Π e E σ E ) = tr(σ E ).
Proof. By assumption the Hilbert space E is separable, which implies that any state σ E can be written as σ E = Σ i p i |x i ⟩⟨x i |, where p i ≥ 0, Σ i p i = 1 and {|x i ⟩} i is an orthonormal basis on E.

F The transpose map is not square-root optimal
As discussed in Section 6, for pure states ρ ABC it is known [2] that (164) holds for the transpose map T B→BC . In this appendix we show that (164) does not hold for all mixed states. Let dim A = dim B = dim C = 2 and consider the state The transpose map satisfies which shows that (164) cannot hold since F(ρ ABC , R B→BC (ρ AB )) ≤ F(A; C|B) ρ . This does not show that one cannot prove a non-trivial guarantee on the performance of the transpose map relative to the optimal recovery map, but it suggests that such a guarantee would have to be worse than the square root (and actually worse than the fourth root as well, using another example), or perhaps that it is more naturally expressed using a different distance measure (using similar examples, the trace distance does not seem to be a good candidate either). We further note that this example does not show that (2) is wrong for the transpose map.
i.e., qcq-states of the form ρ ABC = Σ b P B (b) |b⟩⟨b| ⊗ ρ AC,b , where P B is a probability distribution, {|b⟩} b an orthonormal basis on B and {ρ AC,b } b a family of states on A ⊗ C. As discussed in [10], for such states (2) holds for the recovery map defined by R B→BC (|b⟩⟨b|) = |b⟩⟨b| ⊗ ρ C,b for all b, where ρ C,b = tr A (ρ AC,b ).
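The classical-B case can be checked numerically with a short sketch. It assumes that inequality (2) reads I(A : C|B) ρ ≥ −2 log2 F(ρ ABC , R B→BC (ρ AB )) with F(ρ, σ) = ‖√ρ√σ‖1. For qcq-states both sides decompose over the classical register: the conditional mutual information is the P B -average of I(A : C) of the blocks ρ AC,b , and the fidelity with the recovered state is the P B -average of F(ρ AC,b , ρ A,b ⊗ ρ C,b ). The random blocks and all names below are illustrative choices.

```python
import numpy as np

def psd_sqrt(m):
    """Square root of a Hermitian positive semidefinite matrix."""
    w, v = np.linalg.eigh(m)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T

def fidelity(rho, sigma):
    """Assumed convention: F(rho, sigma) = || sqrt(rho) sqrt(sigma) ||_1."""
    s = np.linalg.svd(psd_sqrt(rho) @ psd_sqrt(sigma), compute_uv=False)
    return float(s.sum())

def entropy(rho):
    """Von Neumann entropy in bits."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-(w * np.log2(w)).sum())

def random_state(dim, rng):
    """Random full-rank density matrix."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    m = g @ g.conj().T
    return m / np.trace(m).real

def marginals(rho_ac, d_a, d_c):
    """Marginals on A and on C of a state on A (x) C."""
    t = rho_ac.reshape(d_a, d_c, d_a, d_c)
    return np.einsum('icjc->ij', t), np.einsum('iaib->ab', t)

def qcq_check(p_b, blocks, d_a, d_c):
    """Return (I(A:C|B), F(rho_ABC, R(rho_AB))) for a qcq-state given by blocks."""
    cmi, fid = 0.0, 0.0
    for p, rho_ac in zip(p_b, blocks):
        rho_a, rho_c = marginals(rho_ac, d_a, d_c)
        cmi += p * (entropy(rho_a) + entropy(rho_c) - entropy(rho_ac))
        # The recovery map replaces block b by rho_A,b (x) rho_C,b.
        fid += p * fidelity(rho_ac, np.kron(rho_a, rho_c))
    return cmi, fid

rng = np.random.default_rng(1)
p_b = np.array([0.3, 0.7])
blocks = [random_state(4, rng) for _ in p_b]
cmi, fid = qcq_check(p_b, blocks, 2, 2)
bound = -2.0 * np.log2(fid)
```

For the sampled state one expects `cmi >= bound`; when every block is a product state, the conditional mutual information vanishes and the fidelity equals one.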
parameterized by recovery maps R ∈ TPCP(B, B ⊗ C), where TPCP(B, B ⊗ C) denotes the set of trace-preserving completely positive maps from B to B ⊗ C and D(A ⊗ B ⊗ C) denotes the set of density operators on A ⊗ B ⊗ C. Subsequently in the proof, the function family ∆ R (•) will be constructed as the difference of the two terms in (8) (see Equation ( whose existence follows from the validity of Proposition 3.1 for sets of finite size (which we proved in Step 1).Since the set TPCP(B, B ⊗ C) is compact (see Remark C.3) there exists a decreasing sequence {ε n } n∈N and R ∈ TPCP(B, B ⊗ C) such that lim n→∞ ε n = 0 and R = lim n→∞ R εn .

of the form (49) satisfies the assumptions of the proposition. This is ensured by the following lemma.
Lemma 3.2. Let A be a separable and B and C finite-dimensional Hilbert spaces. The function family ∆R (•) defined by (49) satisfies Properties 1-4.

Lemma 4.1. For any ρ BC ∈ D(B ⊗ C) there exists a sequence of reals {ξ b,c } b,c∈N that satisfies lim c→∞ lim b→∞ ξ b,c = 0, such that for any R ∈ TPCP(B, B ⊗ C), any extension ρ ABC of ρ BC , and ρ b,c ABC as given in (61) we have
∆R (ρ b,c ) − ∆R (ρ) ≤ ξ b,c for all b, c ∈ N . (63)
Proof. We note that local projections applied to the subsystem C can only decrease the mutual information, i.e., tr(Π c C ρ C ) I(A : C|B) ρ c ≤ I(A : C|B) ρ .

2 ,
we find lim b→∞ tr(Π b B ⊗ Π c C ρ BC ) = tr(Π c C ρ C ) for all c ∈ N and hence lim b→∞ ε b,c = 0 for any c ∈ N. Furthermore, we have lim c→∞ tr(Π c C ρ C ) = 1 and lim c→∞ lim b→∞ tr(Π b B ⊗ Π c C ρ BC ) = 1, which implies that lim c→∞ lim b→∞ ξ b,c = 0. This proves the assertion.
By Lemma 4.1, using the notation defined at the beginning of Step 1, we find
lim sup c→∞ lim sup b→∞ ∆R b,c (S)
= lim sup c→∞ lim sup b→∞ inf ρ∈S ∆R b,c (ρ)
≥ lim sup c→∞ lim sup b→∞ inf ρ∈S ∆R b,c (ρ b,c ) − ξ b,c
= lim sup c→∞ lim sup b→∞ inf ρ b,c ∈S b,c ∆R b,c (ρ b,c ) − ξ b,c
= lim sup c→∞ lim sup b→∞ ∆R b,c (S b,c )
≥ 0 , (76)
where the second equality step is valid since all states in S have the same fixed marginal on B ⊗ C and since the sequence {ξ b,c } b,c∈N only depends on this marginal. The penultimate step uses that lim c→∞ lim b→∞ ξ b,c = 0. The final inequality follows by definition of R b,c B→BC . Inequality (76) implies that there exist sequences {b k } k∈N and {c k } k∈N such that lim sup k→∞ ∆R b k ,c k (S) ≥ 0. Setting R k B→BC = R b k ,c k B→BC then implies that there exists a sequence {R k B→BC } k∈N of recovery maps that satisfies lim sup k→∞ ∆R k (S) ≥ 0 .

Lemma 4.3. Let A, B, and C be separable Hilbert spaces. For any density operator ρ AB ∈ D(A ⊗ B) and any R ∈ TPCP(B, B ⊗ C) we have

Lemma 5.2. Let A be a separable and B and C finite-dimensional Hilbert spaces. The function family ∆R (•) defined by (123) satisfies Properties 1-4.
Recall that B and C are separable Hilbert spaces and that {Π m B } m∈N and {Π m B ⊗ Π m C } m∈N converge weakly to id B and id B ⊗ id C respectively. Lemma E.2 thus shows that lim m→∞ tr(ρ B Π m B ) = 1 and lim m→∞ tr(ρ BC

1. Let α ∈ R + . The space of non-negative operators on a finite-dimensional Hilbert space E with trace smaller than or equal to α (respectively equal to α) is compact.
Proof. Let D′(E) := {ρ ∈ Pos(E) : tr(ρ) ≤ α} denote the set of non-negative operators on E with trace not larger than α, where Pos(E) is the set of non-negative operators on E. Consider the ball B := {e ∈ E : ‖e‖ ≤ √α}, which is compact. The function B ∋ e → f(e) = ee† ∈ D′(E) is continuous and thus the set f(B) = {ee† : e ∈ E, ‖e‖ ≤ √α} is compact, as continuous functions map compact sets to compact sets. By the spectral theorem it follows that D′(E) = conv f(B). As the convex hull of every compact set is compact, this proves the assertion. The same argument (replacing the inequalities with equalities) proves that the set of non-negative operators on E with trace α is compact.

Lemma C.2. Let E, G be finite-dimensional Hilbert spaces and let σ G ∈ Pos(G). The space of non-negative operators on E ⊗ G with a marginal on G smaller than or equal to σ G (respectively equal to σ G ) is compact.
Proof. Let σ G ∈ Pos(G). By Lemma C.1, the set of non-negative operators on E ⊗ G with trace not larger than α ∈ R + is compact. The set {X ∈ E ⊗ G : tr E (X) ≤ σ G } is closed. The intersection of a compact set and a closed set is compact, which implies that {X ∈ Pos(E ⊗ G) : tr E

As the sequence {Π e E } e∈N weakly converges to id E , we find lim where the second step uses the dominated convergence theorem, which is applicable since |⟨x i |Π e E |x i ⟩| ≤ |⟨x i |id E |x i ⟩| for all e ∈ N.

Let E and G be separable Hilbert spaces and let S denote the set of bipartite density operators on E ⊗ G with a fixed marginal σ G on G.
Let {Π e E } e∈N be a sequence of projectors with rank e that weakly converge to id E and let S e be the set of bipartite states on E ⊗ G whose marginal on E is contained in the support of Π e E and whose marginal on G is identical to σ G .
Lemma E.3. For every σ EG ∈ S there exists a sequence {σ e EG } e∈N with σ e EG ∈ S e that converges to σ EG with respect to the trace norm.
Proof. For σ EG ∈ S, let
σ̃ e EG := (Π e E ⊗ id G ) σ EG (Π e E ⊗ id G ) / tr((Π e E ⊗ id G ) σ EG ) , (159)
which has the desired support on E; however, σ̃ e G ≠ σ G in general. This is fixed by considering
σ e EG := tr((Π e E ⊗ id G ) σ EG ) σ̃ e EG + |0⟩⟨0| E ⊗ tr E ((Π e⊥ E ⊗ id G ) σ EG (Π e⊥ E ⊗ id G )) , (160)
where |0⟩ E is a normalized state on E. Since the partial trace on E is cyclic on E we obtain
σ e G = tr E (σ e EG )
= tr E ((Π e E ⊗ id G ) σ EG (Π e E ⊗ id G )) + tr E ((Π e⊥ E ⊗ id G ) σ EG (Π e⊥ E ⊗ id G ))
= tr E ((Π e E ⊗ id G ) σ EG ) + tr E ((Π e⊥ E ⊗ id G ) σ EG )
= tr E (σ EG ) = σ G . (161)
By the multiplicativity of the trace norm under tensor products and since ‖A‖1 = tr(√(A†A)), the triangle inequality implies that
‖σ̃ e EG − σ e EG ‖1 ≤ 1 − tr((Π e E ⊗ id G ) σ EG ) + ‖tr E ((Π e⊥ E ⊗ id G ) σ EG (Π e⊥ E ⊗ id G ))‖1
= 1 − tr((Π e E ⊗ id G ) σ EG ) + tr((Π e⊥ E ⊗ id G ) σ EG )
= 2 (1 − tr(Π e E σ E )) . (162)
Lemma E.2 now implies that lim e→∞ tr(Π e E σ E ) = 1. We note that the sequence {σ̃ e EG } e∈N converges to σ EG in the trace norm since by the Fuchs-van de Graaf inequality [11], Lemma E.1 and Lemma E.2 Combining this with (162) and the triangle inequality proves that {σ e EG } e∈N converges to σ EG in the trace norm.
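The construction in (159) and (160) can be tried out in a finite-dimensional toy setting. The sketch below assumes that E has dimension d_e, that Π e E projects onto the first e standard basis vectors, and that |0⟩ E is the first basis vector; these are illustrative choices, as are all names. It builds σ e EG , checks that its marginal on G equals σ G as in (161), and records the trace distance to σ EG for increasing e.

```python
import numpy as np

def random_state(dim, rng):
    """Random full-rank density matrix."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    m = g @ g.conj().T
    return m / np.trace(m).real

def trace_out_e(sigma, d_e, d_g):
    """Partial trace over E for an operator on E (x) G."""
    return np.einsum('igih->gh', sigma.reshape(d_e, d_g, d_e, d_g))

def trace_norm(m):
    """Trace norm of a Hermitian matrix."""
    return float(np.abs(np.linalg.eigvalsh(m)).sum())

def truncated_state(sigma, e, d_e, d_g):
    """sigma^e_EG from (160), written without the intermediate normalization."""
    proj = np.zeros((d_e, d_e))
    proj[:e, :e] = np.eye(e)
    p = np.kron(proj, np.eye(d_g))
    q = np.kron(np.eye(d_e) - proj, np.eye(d_g))
    inside = p @ sigma @ p                      # tr(...) times the state in (159)
    leftover_g = trace_out_e(q @ sigma @ q, d_e, d_g)
    ket0 = np.zeros((d_e, d_e))
    ket0[0, 0] = 1.0                            # |0><0| on E, inside the support of proj
    return inside + np.kron(ket0, leftover_g)

rng = np.random.default_rng(2)
d_e, d_g = 6, 2
sigma = random_state(d_e * d_g, rng)
sigma_g = trace_out_e(sigma, d_e, d_g)

marginal_errors, distances = [], []
for e in range(1, d_e + 1):
    sigma_e = truncated_state(sigma, e, d_e, d_g)
    marginal_errors.append(trace_norm(trace_out_e(sigma_e, d_e, d_g) - sigma_g))
    distances.append(trace_norm(sigma_e - sigma))
```

In this toy setting `marginal_errors` should vanish for every e, and the last entry of `distances` should vanish because the projector of full rank is the identity on E.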