
# Interferometric visibility and coherence

## Abstract

Recently, the basic concept of quantum coherence (or superposition) has gained a lot of renewed attention, after Baumgratz et al. (Phys. Rev. Lett. 113, 140401 (doi:10.1103/PhysRevLett.113.140401)), following Åberg (http://arxiv.org/abs/quant-ph/0612146), proposed a resource theoretic approach to quantify it. This has resulted in a large number of papers and preprints exploring various coherence monotones, and debating possible forms for the resource theory. Here, we take the view that the operational foundation of coherence in a state, be it quantum or otherwise wave mechanical, lies in the observation of interference effects. Our approach here is to consider an idealized multi-path interferometer, with a suitable detector, in such a way that the visibility of the interference pattern provides a quantitative expression of the amount of coherence in a given probe state. We present a general framework for deriving coherence measures from visibility, and demonstrate it by analysing several concrete visibility parameters, recovering some known coherence measures and obtaining some new ones.

### 1. Introduction

The physics of constructive and destructive interference of waves, along with the concept of coherence, has been well understood since the nineteenth century. With the advent of quantum mechanics, these studies have assumed a fundamental quality as in quantum theory the superposition principle applies to everything, and the presence of quantum coherence is the basic hallmark of departure from classical physics. Recently, Baumgratz et al. [1], following Åberg’s earlier work [2], have launched a flurry of new activity on coherence by attempting to cast it as a resource theory and introducing a number of tasks and monotones [3,4].

It has, however, remained largely unclear what this resource of ‘coherence’ is about and how it relates to theories of asymmetry, among others [4]. To make contact with the operational foundations of coherence, we go back to its very definition, the observability of an interference pattern in a suitable experiment. Our present approach is to consider an idealized multi-path interferometer, which receives the state ρ under consideration at the input. The experimenter is at liberty to put phase plates into each of the paths, and to construct a detector (a general beam splitter with detection of the output beams). The interference pattern, i.e. the response of the fixed detector as a function of the multiple phases, is the signature of coherence: the more it fluctuates, intuitively, the more coherent is the state. The degree of fluctuation, aka visibility, quantifies the strength of interference.

The idea in this paper is that coherence is the potential of a state to yield visible fringes in a suitable experiment. Hence, we propose to optimize the visibility over all possible detectors, to obtain a measure of coherence of the original state. Indeed, we prove that, under mild assumptions, every visibility parameter yields a coherence measure in this way, strongly monotonic under a certain class of incoherence-preserving operations. We illustrate our theory with concrete examples of visibility parameters.

### 2. Interferometers and visibility

Consider a multi-path interferometer, in which a single particle can be in one of d paths, denoting the spatial variable by orthogonal vectors |j〉, j=1,…,d, spanning a d-dimensional Hilbert space $H$. For the moment, we will ignore any internal degrees of freedom of the particle, and any other spatial degrees, so that the entire Hilbert space describing the system is $H$, and a pure state inside the interferometer can be written as $|\psi\rangle=\sum_j c_j|j\rangle$, and a general mixed state as

$\rho=\sum_{j,k=1}^{d}\rho_{jk}\,|j\rangle\langle k|.$ (2.1)
An interferometric experiment (figure 1) has two distinct components. The first consists of local phase shifts $\alpha_j$ that can be inserted into the paths, implementing a diagonal phase unitary
$U(\alpha)=\sum_j e^{i\alpha_j}|j\rangle\langle j|,$ (2.2)
so that the state becomes
$\rho(\alpha)=U(\alpha)\rho U(\alpha)^\dagger=\sum_{j,k=1}^{d}e^{i(\alpha_j-\alpha_k)}\rho_{jk}\,|j\rangle\langle k|.$ (2.3)
The second is a detector at the output, often simply fixed as the combination of a symmetric beam splitter with a path measurement, but for us a general POVM $M=(M_\omega)$, with outcomes ω from a suitable space Ω.

The experimenter, having chosen $\alpha=(\alpha_1,\ldots,\alpha_d)$, will observe outcomes ω∈Ω sampled from the Born distribution (the ‘interference pattern’):

$P_{M|\rho}(\omega|\alpha)=\mathrm{Tr}\,U(\alpha)\rho U(\alpha)^\dagger M_\omega.$ (2.4)
The signature of interference in such an experiment, where ρ is given and fixed, is that the distribution $P=P_{M|\rho}$ can vary as a function of the phases $\alpha_j$. The degree of variability, intuitively, is the visibility of the interference pattern, calling for a visibility functional V=V[P] on conditional distributions P(ω | α).

While it is always dangerous to make a priori demands, we take it that such a functional has to capture the global property of P not being constant; i.e. it should be 0 for constant P(⋅ | α) and positive otherwise. It will also make sense to ask that it is invariant under permutations and shifts of the α, reflecting the obvious symmetries of the experimental set-up (figure 1). We call a visibility functional V[P] satisfying these requirements regular. In the discussion of interferometers, specifically in the rich literature on the complementarity between fringe visibility and which-path information ([5–9], to cite only the principal ones), the topic of visibility has been addressed repeatedly, and from increasingly general perspectives. In particular, the realization that, for d>2, no unique visibility functional seems to exist called for an axiomatic approach to put order into the many ad hoc parameters (cf. [10–12]). We wish to highlight especially Coles’ paper [13], which makes an eloquent case for an operational approach, where visibility (as well as which-path information) is expressed as a property of an observable probability distribution, a philosophy with which we feel very much in line.

In the simplest case of the well-known Mach–Zehnder interferometer (figure 2), i.e. d=2, we observe interference fringes w.r.t. a relative phase shift: consider the density matrix $\rho=\rho_{11}|1\rangle\langle1|+\rho_{22}|2\rangle\langle2|+\rho_{12}|1\rangle\langle2|+\rho_{21}|2\rangle\langle1|$, the diagonal phase unitary $U(\alpha)=e^{i\alpha_1}|1\rangle\langle1|+e^{i\alpha_2}|2\rangle\langle2|$, and a measurement with POVM elements |μ〉〈μ|, $|\mu\rangle=\mu_1|1\rangle+\mu_2|2\rangle$. Now, with $\alpha=\alpha_1-\alpha_2$, and writing $\rho_{12}=|\rho_{12}|e^{i\beta}$, $\bar\mu_1\mu_2=|\mu_1\mu_2|e^{i\gamma}$, the output probability is

$P_{M|\rho}(\mu|\alpha)=\rho_{11}|\mu_1|^2+\rho_{22}|\mu_2|^2+2|\rho_{12}\mu_1\mu_2|\cos(\alpha+\beta+\gamma),$ (2.5)
whose fluctuation is essentially characterized by the coefficient $|\rho_{12}\mu_1\mu_2|$, and so most analyses conclude this to be the visibility.
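As an illustrative numerical check (a minimal sketch, not part of the original analysis), the fringe formula (2.5) can be compared with a direct evaluation of the Born rule (2.4) for d=2. The function names, the example state `rho_example` and the detector vector `mu_example` are hypothetical choices made only for this illustration; we set $\alpha_1=\alpha$ and $\alpha_2=0$.

```python
import cmath
import math

def fringe_direct(rho, mu, alpha):
    """Born probability <mu| U(alpha) rho U(alpha)^dagger |mu> for two paths.

    rho: 2x2 nested list, mu: pair of amplitudes,
    alpha: relative phase alpha_1 - alpha_2 (here alpha_1 = alpha, alpha_2 = 0).
    """
    phases = [cmath.exp(1j * alpha), 1.0 + 0j]
    total = 0j
    for j in range(2):
        for k in range(2):
            rho_alpha_jk = phases[j] * rho[j][k] * phases[k].conjugate()
            total += complex(mu[j]).conjugate() * rho_alpha_jk * mu[k]
    return total.real

def fringe_formula(rho, mu, alpha):
    """Right-hand side of equation (2.5)."""
    beta = cmath.phase(rho[0][1])                         # rho_12 = |rho_12| e^{i beta}
    gamma = cmath.phase(complex(mu[0]).conjugate() * mu[1])
    amplitude = abs(rho[0][1] * mu[0] * mu[1])
    background = (complex(rho[0][0]).real * abs(mu[0]) ** 2
                  + complex(rho[1][1]).real * abs(mu[1]) ** 2)
    return background + 2 * amplitude * math.cos(alpha + beta + gamma)

# Hypothetical example: a mixed qubit state and a balanced detector vector.
rho_example = [[0.6, 0.2 + 0.1j], [0.2 - 0.1j, 0.4]]
mu_example = (1 / math.sqrt(2), 1j / math.sqrt(2))

max_deviation = max(
    abs(fringe_direct(rho_example, mu_example, a) - fringe_formula(rho_example, mu_example, a))
    for a in [0.1 * n for n in range(63)]
)
print("largest deviation between (2.4) and (2.5):", max_deviation)
```

In this sketch the two evaluations should agree up to floating-point rounding, and the fringe amplitude is the coefficient $2|\rho_{12}\mu_1\mu_2|$.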

### 3. Optimal visibility as a measure of coherence

If we want to treat the state ρ as a resource, i.e. as a given, of which we are supposed to make the best, it makes sense to optimize the visibility $V[P_{M|\rho}]$ over all possible measurements. This idea diverges somewhat from laboratory practice in interferometry and from many discussions of visibility versus which-path information duality, where a fixed measurement is used, usually one mixing the paths uniformly (a 50–50 beam splitter in the Mach–Zehnder case, and more generally a transformation acting as a Fourier transform, followed by a channel detector) [7,11,14,15].

However, it is clear that the most beautiful coherent superposition in the state may be rendered invisible by an unsuitable choice of measurements. For instance, consider the qutrit state $\rho=\frac13(|1\rangle+|2\rangle)(\langle1|+\langle2|)+\frac13|3\rangle\langle3|,$ under a measurement $M^{(0)}$ in the basis $\{|1\rangle,\frac{1}{\sqrt2}(|2\rangle\pm|3\rangle)\}$; evidently, the three outcomes all have probability $\frac13$, irrespective of the phases in

$\rho(\alpha)=\frac13(|1\rangle+e^{i\alpha}|2\rangle)(\langle1|+e^{-i\alpha}\langle2|)+\frac13|3\rangle\langle3|.$
Intuitively, we expect the best choice to bring out the coherence in ρ to be the projective measurement $M^{(1)}$ in the basis $\{\frac{1}{\sqrt2}(|1\rangle\pm|2\rangle),|3\rangle\}$, for which the detection probabilities are $(\frac23\cos^2(\alpha/2),\frac23\sin^2(\alpha/2),\frac13)$. On the other hand, the standard choice of a symmetric beam splitter results in the Fourier basis $\{\frac{1}{\sqrt3}(|1\rangle+\zeta^t|2\rangle+\zeta^{2t}|3\rangle):t=0,1,2\}$, where $\zeta=e^{2\pi i/3}$, with detection probabilities
$\left(\frac19+\frac49\cos^2\left(\frac{\alpha}{2}-\frac{t\pi}{3}\right):t=0,1,2\right),$
which has the same oscillation pattern as $M^{(1)}$, but smaller amplitude.
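The three detection statistics quoted for this qutrit example can be reproduced numerically. The following is a minimal sketch using only the Python standard library; the helper names (`qutrit_state`, `outcome_probs`) and the basis constants are introduced here purely for illustration, with paths 1, 2, 3 stored at list indices 0, 1, 2.

```python
import cmath
import math

def qutrit_state(alpha):
    """rho(alpha) = (1/3)(|1>+e^{i alpha}|2>)(<1|+e^{-i alpha}<2|) + (1/3)|3><3|."""
    v = [1.0 + 0j, cmath.exp(1j * alpha), 0j]
    rho = [[v[j] * v[k].conjugate() / 3 for k in range(3)] for j in range(3)]
    rho[2][2] += 1 / 3
    return rho

def outcome_probs(rho, basis):
    """Born probabilities <b|rho|b> for each vector b of an orthonormal basis."""
    probs = []
    for b in basis:
        b = [complex(x) for x in b]
        p = sum(b[j].conjugate() * rho[j][k] * b[k]
                for j in range(3) for k in range(3))
        probs.append(p.real)
    return probs

S2, S3 = 1 / math.sqrt(2), 1 / math.sqrt(3)
ZETA = cmath.exp(2j * math.pi / 3)

BASIS_M0 = [[1, 0, 0], [0, S2, S2], [0, S2, -S2]]      # hides the coherence
BASIS_M1 = [[S2, S2, 0], [S2, -S2, 0], [0, 0, 1]]      # expected best choice
BASIS_FOURIER = [[S3, S3 * ZETA ** t, S3 * ZETA ** (2 * t)] for t in range(3)]

alpha = 0.7  # arbitrary phase chosen for the illustration
for name, basis in [("M0", BASIS_M0), ("M1", BASIS_M1), ("Fourier", BASIS_FOURIER)]:
    print(name, outcome_probs(qutrit_state(alpha), basis))
```

Scanning `alpha` over a full period shows the contrast discussed above: the first two outcome probabilities of `BASIS_M1` swing between 0 and 2/3, whereas each Fourier outcome only ranges between 1/9 and 5/9.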

Thus, we are motivated, given a visibility functional V [P], to optimize the visibility over all measurements, to get the best out of ρ. This leads to a number that now depends only on the state,

$C_V(\rho):=\sup_{M=(M_\omega)}V[P_{M|\rho}].$ (3.1)

The hypothesis that we will explore in the rest of the paper is that this number, for a large class of visibility functionals, is a good indicator of coherence in ρ.

In an attempt to identify consistent quantifiers of coherent superposition, Baumgratz et al. [1], following Åberg [2], have created a resource theory of coherence, with carefully chosen resource-free states (diagonal density operators Δ) and free transformations (the so-called incoherent operations (IO)). A measure that is non-increasing under these operations is called a monotone. Mathematically, incoherent operations are completely positive and trace-preserving linear maps, built from incoherent Kraus operators, $T(\rho)=\sum_\lambda K_\lambda\rho K_\lambda^\dagger,$ where $K_\lambda|j\rangle\propto|k\rangle=|k(\lambda,j)\rangle$ for all computational basis elements |j〉. If both $K_\lambda$ and $K_\lambda^\dagger$ are incoherent, the Kraus operator is called strictly incoherent, and so is a map T built from strictly incoherent Kraus operators (a strictly incoherent operation, SIO). From this, it is straightforward to see that a Kraus operator is incoherent if and only if it has the form $K=\sum_j c_j|k(j)\rangle\langle j|$ with a function k(j) mapping basis states to basis states; it is strictly incoherent if and only if k is one-to-one.

A functional C(ρ)≥0 is a monotone if C(ρ)≥C(T(ρ)) under all IO (SIO) T. It is called a strong monotone if, for an incoherent Kraus decomposition, $K_\lambda\rho K_\lambda^\dagger=:q_\lambda\rho_\lambda$, $C(\rho)\ge\sum_\lambda q_\lambda C(\rho_\lambda)$. Well-known examples include the ℓ1-measure of coherence [1], and the relative entropy of coherence [1,2,16]:

$C_{\ell_1}(\rho)=\sum_{j\neq k}|\rho_{jk}|$ (3.2)
and
$C_r(\rho)=\min_{\sigma\in\Delta}D(\rho\,\|\,\sigma)=S(\Delta(\rho))-S(\rho),$ (3.3)
with the relative entropy $D(\rho\,\|\,\sigma)=\mathrm{Tr}\,\rho(\log\rho-\log\sigma)$ and the von Neumann entropy $S(\rho)=-\mathrm{Tr}\,\rho\log\rho$; $\Delta(\rho)=\sum_j|j\rangle\langle j|\rho|j\rangle\langle j|$ is the diagonal part of ρ.
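For orientation, the two measures (3.2) and (3.3) are easy to evaluate for a qubit. The sketch below is illustrative only: the function names are hypothetical, logarithms are taken to base 2 (an assumption, since the base is not fixed in the text above), and the eigenvalues of the 2×2 density matrix are obtained from its trace and determinant.

```python
import math

def l1_coherence(rho):
    """Sum of |rho_jk| over j != k, equation (3.2)."""
    d = len(rho)
    return sum(abs(rho[j][k]) for j in range(d) for k in range(d) if j != k)

def entropy_bits(probs):
    """Shannon entropy in bits of a probability vector (0 log 0 := 0)."""
    return sum(-p * math.log2(p) for p in probs if p > 1e-15)

def qubit_eigenvalues(rho):
    """Eigenvalues of a 2x2 Hermitian matrix via trace and determinant."""
    tr = complex(rho[0][0] + rho[1][1]).real
    det = complex(rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]).real
    disc = math.sqrt(max(tr * tr / 4 - det, 0.0))
    return [tr / 2 + disc, tr / 2 - disc]

def relative_entropy_coherence_qubit(rho):
    """S(Delta(rho)) - S(rho) for a qubit, equation (3.3), in bits."""
    diagonal = [complex(rho[0][0]).real, complex(rho[1][1]).real]
    return entropy_bits(diagonal) - entropy_bits(qubit_eigenvalues(rho))

plus_state = [[0.5, 0.5], [0.5, 0.5]]          # maximally coherent qubit state
dephased_state = [[0.5, 0.0], [0.0, 0.5]]      # its diagonal part
print(l1_coherence(plus_state), relative_entropy_coherence_qubit(plus_state))
print(l1_coherence(dephased_state), relative_entropy_coherence_qubit(dephased_state))
```

Both quantities vanish on the diagonal state and reach 1 on the maximally coherent qubit state, as expected for measures that are zero exactly on Δ.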

Our first result is a general link between visibility and coherence. We call a visibility functional weakly affine if, for distributions $P_i(\omega|\alpha)$, $\omega\in\Omega_i$ (assuming w.l.o.g. pairwise disjoint $\Omega_i$), and for a probability distribution $(q_i)$, we have $V[\bar P]=\sum_i q_i V[P_i]$, with the averaged distribution $\bar P=\sum_i q_i P_i$ on $\Omega=\bigcup_i\Omega_i$.

### Theorem 3.1

For any regular and weakly affine visibility functional V[P], $C_V$ is a coherence measure that is strongly monotonic under strictly incoherent operations (SIOs). If V is convex in P, then $C_V$ is convex in ρ.

### Proof.

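Let an SIO with Kraus operators $K_\lambda$ be given, acting on a state ρ, so that $q_\lambda\rho_\lambda=K_\lambda\rho K_\lambda^\dagger$ defines the probability of the event λ and the post-measurement state. Observe that, because $K_\lambda=\pi_\lambda D_\lambda$ can be written as a diagonal matrix $D_\lambda$ followed by a permutation $\pi_\lambda$,

$q_\lambda U(\alpha)\rho_\lambda U(\alpha)^\dagger=U(\alpha)K_\lambda\rho K_\lambda^\dagger U(\alpha)^\dagger=K_\lambda U(\beta)\rho U(\beta)^\dagger K_\lambda^\dagger,$
with $\beta_j=\alpha_{\pi_\lambda(j)}$. This shows that the probability of seeing outcome λ is $q_\lambda$ for all ρ(α).

Now choose measurements $M^{(\lambda)}$ for each $\rho_\lambda$, taking values ω in the disjoint sets $\Omega_\lambda$, subject to the probability law $P_\lambda=P_{M^{(\lambda)}|\rho_\lambda}$ given by

$P_\lambda(\omega|\alpha)=\mathrm{Tr}\,U(\alpha)\rho_\lambda U(\alpha)^\dagger M_\omega^{(\lambda)}=\frac{1}{q_\lambda}\mathrm{Tr}\,U(\alpha)K_\lambda\rho K_\lambda^\dagger U(\alpha)^\dagger M_\omega^{(\lambda)}=\frac{1}{q_\lambda}\mathrm{Tr}\,K_\lambda U(\beta)\rho U(\beta)^\dagger K_\lambda^\dagger M_\omega^{(\lambda)}=\frac{1}{q_\lambda}\mathrm{Tr}\,U(\beta)\rho U(\beta)^\dagger K_\lambda^\dagger M_\omega^{(\lambda)}K_\lambda.$
Introducing the POVM $\widetilde M=(K_\lambda^\dagger M_\omega^{(\lambda)}K_\lambda)_{\lambda,\omega}$ with outcomes (λ,ω), we can now invoke weak affinity:
$\sum_\lambda q_\lambda V[P_\lambda]=V\left[\sum_\lambda q_\lambda P_\lambda\right]=V[P_{\widetilde M|\rho}]\le C_V(\rho),$
because the measurement $\widetilde M$ is eligible for ρ but may be suboptimal. As the measurements $M^{(\lambda)}$ can be chosen to maximize the left-hand side, we obtain
$\sum_\lambda q_\lambda C_V(\rho_\lambda)\le C_V(\rho).$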

For the convexity statement, let $\rho=\sum_i p_i\sigma_i$ and choose any measurement M on ρ. Then,

$V[P_{M|\rho}]=V\left[\sum_i p_i P_{M|\sigma_i}\right]\le\sum_i p_i V[P_{M|\sigma_i}]\le\sum_i p_i C_V(\sigma_i),$
and because M may be chosen to maximize the left-hand side, we find $C_V(\sum_i p_i\sigma_i)\le\sum_i p_i C_V(\sigma_i),$ as claimed. □

A couple of remarks are in order: first, there do not seem to be easy conditions for CV to be a (strong) coherence monotone under IO, but of course that is something potentially checkable in individual cases. Second, one might wonder in case we merely want to detect coherence, whether there is a universal measurement M such that if ρ has coherences, then V [PM | ρ] is positive. The answer is yes, namely any tomographically complete measurement, as long as V [P] has the property that it is non-zero on every non-constant P.

### 4. Examples

We now show that the above theory is not just an abstract construction, by considering several concrete visibility parameters, for which we can evaluate the associated coherence measures, or at least considerably simplify the optimization.

#### (a) Largest difference of intensity

Perhaps the simplest and most intuitive parameter of visibility for two-outcome measurements $M=(M_0,M_1=𝟙-M_0)$ is the difference between the largest and the smallest value of $P_{M|\rho}(0|\alpha)=\mathrm{Tr}\,U(\alpha)\rho U(\alpha)^\dagger M_0$. To make it suitable for measurements with arbitrary outcome sets, we define

$V_{\max}[P]:=\sup_{\alpha,\beta}\frac12\|P(\cdot|\alpha)-P(\cdot|\beta)\|_1.$ (4.1)
Note that we do not normalize by the sum of the largest and smallest probability, as is customary in discussions of visibility in classical interferometry, where the basic observable quantities are intensities. There, this appears necessary to obtain a dimensionless visibility; here, however, we have the probabilities that are already dimensionless and have an absolute meaning.

Clearly, $V_{\max}$ is regular and weakly affine, so the corresponding coherence measure $C_{\max}$ is an SIO monotone. In fact, it is easy to evaluate it, and the result is

$C_{\max}(\rho)=\max_\alpha\frac12\|U(\alpha)\rho U(\alpha)^\dagger-\rho\|_1=\max_\alpha\frac12\|[\rho,U(\alpha)]\|_1=\max_{\alpha,M_0}\mathrm{Tr}\,U(\alpha)\rho U(\alpha)^\dagger M_0-\mathrm{Tr}\,\rho M_0,$ (4.2)
because we can always shift β to 0 by applying U(−β). In particular, the optimal measurement is a two-outcome POVM $(M_0,M_1=𝟙-M_0)$, and the value is the largest difference in response probability over POVM elements.

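We can compare the result with the trace distance measure of coherence, $C_{\rm Tr}(\rho)=\min_{\sigma\in\Delta}\frac12\|\rho-\sigma\|_1$, introduced in [1]: $C_{\rm Tr}(\rho)\le C_{\max}(\rho)\le2C_{\rm Tr}(\rho)$.

Namely, on the one hand, for σ∈Δ, we have $\|\rho-\sigma\|_1=\|U(\alpha)\rho U(\alpha)^\dagger-\sigma\|_1$, and hence by the triangle inequality,

$\|U(\alpha)\rho U(\alpha)^\dagger-\rho\|_1\le\|U(\alpha)\rho U(\alpha)^\dagger-\sigma\|_1+\|\rho-\sigma\|_1,$
which implies $C_{\max}(\rho)\le2C_{\rm Tr}(\rho)$. On the other hand,
$C_{\rm Tr}(\rho)\le\frac12\|\rho-\Delta(\rho)\|_1=\frac12\left\|\rho-\int d\alpha\,U(\alpha)\rho U(\alpha)^\dagger\right\|_1\le\int d\alpha\,\frac12\|U(\alpha)\rho U(\alpha)^\dagger-\rho\|_1\le C_{\max}(\rho).$

In the qubit case, it holds that $C_{\max}(\rho)=2|\rho_{12}|=C_{\ell_1}(\rho)=2C_{\rm Tr}(\rho)$ (see appendix A).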
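The qubit value can also be confirmed by brute force. The following minimal sketch (helper names are hypothetical, and a uniform grid over the relative phase stands in for the exact maximization in (4.2)) evaluates $\frac12\|U(\alpha)\rho U(\alpha)^\dagger-\rho\|_1$ for a 2×2 state and compares the largest value with $2|\rho_{12}|$.

```python
import cmath
import math

def trace_norm_hermitian_2x2(m):
    """Sum of the absolute eigenvalues of a 2x2 Hermitian matrix."""
    tr = complex(m[0][0] + m[1][1]).real
    det = complex(m[0][0] * m[1][1] - m[0][1] * m[1][0]).real
    disc = math.sqrt(max(tr * tr / 4 - det, 0.0))
    return abs(tr / 2 + disc) + abs(tr / 2 - disc)

def phase_shifted(rho, alpha):
    """U(alpha) rho U(alpha)^dagger with the relative phase alpha put on path 1."""
    u = [cmath.exp(1j * alpha), 1.0 + 0j]
    return [[u[j] * rho[j][k] * u[k].conjugate() for k in range(2)] for j in range(2)]

def c_max_qubit(rho, steps=3600):
    """Grid approximation of max_alpha (1/2) || U rho U^dagger - rho ||_1."""
    best = 0.0
    for n in range(steps):
        alpha = 2 * math.pi * n / steps
        shifted = phase_shifted(rho, alpha)
        diff = [[shifted[j][k] - rho[j][k] for k in range(2)] for j in range(2)]
        best = max(best, 0.5 * trace_norm_hermitian_2x2(diff))
    return best

rho_example = [[0.6, 0.2 + 0.1j], [0.2 - 0.1j, 0.4]]   # hypothetical test state
print(c_max_qubit(rho_example), 2 * abs(rho_example[0][1]))
```

Because the grid contains α=π whenever `steps` is even, the grid maximum coincides with the analytic value up to rounding in this example.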

#### (b) Estimating equidistributed phases

Inspired by the previous example, we are motivated to consider guessing problems of a more general kind, where we are trying to estimate the true setting of the phases among several alternatives, based on measurement outcomes. It turns out that a good candidate is the equidistributed set of d phases (2πj/d)(1,2,…,d), j=1,…,d, and its shifts and permutations:

$V_{\rm guess}[P]:=-\frac1d+\max_{\alpha_0,\ \pi\in S_d,\ \Omega_j\cap\Omega_k=\emptyset}\ \frac1d\sum_{j=1}^{d}P(\Omega_j\,|\,\alpha_0+jh_\pi),$ (4.3)
where $h_\pi=(2\pi/d)(\pi(1),\pi(2),\ldots,\pi(d))$ is a generating vector of uniformly accelerating phases (w.r.t. the permutation π of coordinates). This quantity is the bias (excess over 1/d) of the optimal strategy to guess the true value of j∈{1,…,d} that defines the phase settings. As defined, this visibility functional is regular and weakly affine, so the corresponding $C_{\rm guess}$ is a coherence monotone under SIO. As a matter of fact, it holds [17] that
$C_{\rm guess}(\rho)=\max_{(M_j)}\frac1d\sum_{j=1}^{d}\mathrm{Tr}\,\rho(\alpha_0+jh_\pi)M_j-\frac1d=\frac1d C_R(\rho)$ (4.4)
for any $\alpha_0$ and any permutation π. Here, $C_R$ denotes the robustness of coherence, defined via
$C_R(\rho)=\min\left\{s\ge0:\ \frac{\rho+s\tau}{1+s}\in\Delta\ \text{for some state}\ \tau\right\},$ (4.5)
which is known to be an IO monotone [17]. Interestingly, by maximizing the operational visibility proposed in [13], eqn (20), the same result is obtained (P. J. Coles 2017, personal communication).

In the qubit case, it is well known that the robustness of coherence equals the ℓ1-measure: $C_R(\rho)=2|\rho_{12}|=C_{\ell_1}(\rho)$ [17], and so $C_{\rm guess}(\rho)=|\rho_{12}|$ is just half of that.

#### (c) Largest sensitivity to phase changes

Looking back at example (a), we note that the points of largest and smallest value of the response probability $I(\alpha)=P_{M|\rho}(0|\alpha)=\mathrm{Tr}\,U(\alpha)\rho U(\alpha)^\dagger M_0$ to a POVM element $M_0$ may be quite far apart. By contrast, in many applications of interferometry it is a relatively small phase difference that we want to pick up [18], so we are interested in the largest magnitude of the derivative of I(α):

$V_\nabla[P]:=\max_{\alpha,h}\left|\frac{\partial I}{\partial h}(\alpha)\right|,$ (4.6)
where α ranges over all phases, and h over all direction vectors that are suitably norm bounded. To extend $V_\nabla$ to general measurements, we may include a maximization over all two-outcome coarse grainings. We can easily see that $V_\nabla[P]$ is regular and weakly affine because I(α) is a well-defined probability distribution over α.

Now, as $I(\alpha)=\mathrm{Tr}\,\rho\,U(\alpha)^\dagger M_0U(\alpha)$, its derivative at (w.l.o.g.) 0 in direction h is given by

$\frac{\partial I}{\partial h}(0)=-i\,\mathrm{Tr}\,[\rho,H]M_0=-i\,\mathrm{Tr}\,\rho[H,M_0],$ (4.7)
where H is the diagonal Hamiltonian with eigenvalues $h_j$, H=diag(h). Note that the derivative at any other point $\alpha_0$ is the same, up to conjugating the measurement by $U(\alpha_0)$. There are two natural limitations on h: geometrically, to obtain the largest gradient of I, we should consider unit vectors h, meaning $\|H\|_2^2=\mathrm{Tr}\,H^2=1$; or, taking motivation from the Hamiltonian, we should bound its energy range, meaning $\|H\|_\infty\le1$. We denote these two scenarios by p=2 and $p=\infty$, giving rise to two coherence measures $C_\nabla^{(p)}$. From equation (4.7), we directly get
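$C_\nabla^{(p)}(\rho)=\max_{\|H\|_p\le1}\ \max_{0\le M_0\le𝟙}\left(-i\,\mathrm{Tr}\,[\rho,H]M_0\right)=\max_{\|H\|_p\le1}\frac12\|[\rho,H]\|_1.$ (4.8)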
Inspecting this formula, we see that the optimization is convex in H, hence the maximum is attained on an extremal admissible Hamiltonian. For p=2, these have the form $H=\sum_j\epsilon_j\sqrt{t_j}\,|j\rangle\langle j|$, with $\epsilon_j=\pm1$ and $\sum_j t_j=1$. For $p=\infty$, the extremal H have entries ±1 along the diagonal, and so
$C_\nabla^{(\infty)}(\rho)=\max_{S_+\dot\cup S_-=[d]}2\|\Pi_+\rho\Pi_-\|_1,$ (4.9)
where the maximization is over partitions $S_+\dot\cup S_-=[d]$, with $\Pi_\bullet=\sum_{j\in S_\bullet}|j\rangle\langle j|$, •=±. In both cases, we obtain a strong SIO monotone, due to the evident weak affinity of $V_\nabla$. From equation (4.2), we see that $C_\nabla^{(\infty)}\le C_{\max}$, but equality does not seem to hold in general.
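To make the partition formula (4.9) concrete, the sketch below evaluates it for d=3. It relies on a simplification that is specific to this illustration: for three paths, one side of every nontrivial bipartition is a single path j, so $\Pi_+\rho\Pi_-$ has a single non-zero row (rank one) and its trace norm equals the Euclidean norm of that row. The function name and the example states are hypothetical.

```python
import math

def c_grad_inf_qutrit(rho):
    """Equation (4.9) for d = 3: max over bipartitions of 2 * ||Pi_+ rho Pi_-||_1.

    With S_+ = {j} and S_- the other two paths, Pi_+ rho Pi_- is the row
    (rho_jk), k != j, whose trace norm is its Euclidean length.
    """
    best = 0.0
    for j in range(3):
        row_norm = math.sqrt(sum(abs(rho[j][k]) ** 2 for k in range(3) if k != j))
        best = max(best, 2 * row_norm)
    return best

# The qutrit state of section 3 at alpha = 0 (coherence between paths 1 and 2 only).
rho_section3 = [[1 / 3, 1 / 3, 0.0],
                [1 / 3, 1 / 3, 0.0],
                [0.0, 0.0, 1 / 3]]
# Maximally coherent qutrit state: all matrix entries equal to 1/3.
rho_max_coherent = [[1 / 3] * 3 for _ in range(3)]

print(c_grad_inf_qutrit(rho_section3))
print(c_grad_inf_qutrit(rho_max_coherent))
```

For larger d the blocks $\Pi_+\rho\Pi_-$ are no longer rank one, so a general implementation would need singular values and an enumeration of all bipartitions.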

An alternate form of $C_\nabla$ can be obtained by using equation (4.7), and going to the more convenient variable $B=2M_0-𝟙$ in the above equations. After a few manipulations, we arrive at

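$C_\nabla^{(p)}(\rho)=\max_{X\in C_p}\mathrm{Tr}\,\rho X,$ (4.10)
where the maximization is over the set of Hermitian matrices
$C_p=\left\{\tfrac{i}{2}[B,H]\ :\ -𝟙\le B\le𝟙,\ H\ \text{diagonal},\ \|H\|_p\le1\right\}.$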
In this form, it is formally a convex optimization problem, because we may as well go to the convex hull of $C_p$. However, its characterization remains a beautiful open problem. Indeed, it is easy to see that the elements of $C_p$ have zero diagonal and satisfy $\|X\|_p=(\mathrm{Tr}\,|X|^p)^{1/p}\le1$, but there may be other constraints.

Once again, the qubit case is very simple (see appendix A): $C_\nabla^{(2)}(\rho)=\sqrt2\,|\rho_{12}|$ and $C_\nabla^{(\infty)}(\rho)=2|\rho_{12}|=C_{\ell_1}(\rho)$.

#### (d) Largest Fisher information

Considering further the previous example, we realize that finding the largest derivative of the probability P(0 | α), while strongly motivated by the intuition rooted in intensities, does not necessarily identify the point of strongest statistical sensitivity, which is asking for the largest Fisher information, the natural measure for probability distributions. Looking again at directional estimation of a one-dimensional subfamily α=th+α0, $t∈R$, the Fisher information is given by the expected squared logarithmic derivative of the probability distribution:

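$F_{\alpha_0}(h)=\sum_{\omega\in\Omega}P(\omega|\alpha_0)\left(\left.\frac{d\ln P(\omega|\alpha)}{dt}\right|_{t=0}\right)^2=\sum_{\omega\in\Omega}\frac{1}{P(\omega|\alpha_0)}\left(\left.\frac{dP(\omega|\alpha)}{dt}\right|_{t=0}\right)^2,$ (4.11)
so we are considering the visibility functional
$V_F[P]:=\max_{\alpha_0,h}F_{\alpha_0}(h),$ (4.12)
where $\alpha_0$ varies over the whole space of phases, and h over a suitably bounded set of directions. Clearly, $V_F$ is regular and weakly affine.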

The formula for the Fisher information, optimized over measurements (and $\alpha_0$, which w.l.o.g. is 0, by the same reasoning as in previous examples), for estimating t≈0 in $e^{-itH}\rho e^{itH}$ for a given diagonal Hamiltonian H=diag(h) and $\rho=\sum_j\lambda_j|e_j\rangle\langle e_j|$ is known [19,20] and given by

$F_{\rm opt}(h)=2\sum_{jk}\frac{(\lambda_j-\lambda_k)^2}{\lambda_j+\lambda_k}|\langle e_j|H|e_k\rangle|^2.$ (4.13)

As in the previous example on sensitivity, there are two natural domains of diagonal Hamiltonians H over which to optimize this: either $\|H\|_2\le1$ or $\|H\|_\infty\le1$, leading to two variants $C_F^{(2)}(\rho)$ and $C_F^{(\infty)}(\rho)$ of the coherence measure.

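In either case, the optimal choice of H is extremal subject to the convex constraint, because $F$ can easily be seen to be convex in H. Namely, each term $|\langle e_j|H|e_k\rangle|$ is convex, hence also its square, and the coefficient in front of it is manifestly non-negative. Thus, we obtain

$C_F^{(2)}(\rho)=\max_{\sum_m t_m=1,\ \epsilon_m=\pm1}\ \sum_{jk}\frac{2(\lambda_j-\lambda_k)^2}{\lambda_j+\lambda_k}\left|\langle e_j|\left(\sum_m\epsilon_m\sqrt{t_m}\,|m\rangle\langle m|\right)|e_k\rangle\right|^2$ (4.14)
and
$C_F^{(\infty)}(\rho)=\max_{S_+,S_-\subset[d]}\ \sum_{jk}\frac{2(\lambda_j-\lambda_k)^2}{\lambda_j+\lambda_k}|\langle e_j|(\Pi_+-\Pi_-)|e_k\rangle|^2,$ (4.15)
where the first maximization is over diagonal Hamiltonians with Hilbert-Schmidt norm 1; the second over partitions $S_+\dot\cup S_-=[d]$, with $\Pi_\bullet=\sum_{j\in S_\bullet}|j\rangle\langle j|$, •=±, so that $H=\Pi_+-\Pi_-$.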

For a qubit state ρ, it can be verified (see appendix A) that $C_F^{(2)}(\rho)=4|\rho_{12}|^2=C_{\ell_1}(\rho)^2$, $C_F^{(\infty)}(\rho)=2C_{\ell_1}(\rho)^2$.

#### (e) Largest differential Chernoff bound

We observe that the attainability of the Fisher information presupposes access to many copies of the state and independent measurements, in which setting the Fisher information gives the optimal scaling of the mean squared estimation error with the number of copies. If we allow general collective measurements and at the same time only want to distinguish pairs of nearby states optimally, we are led to the differential Chernoff bound [21]: while the Chernoff bound is defined as $\xi(\rho,\sigma)=\sup_{0\le s\le1}-\ln\mathrm{Tr}\,\rho^s\sigma^{1-s}$, for states and probability distributions alike [21,22], it is known that $(1/dt^2)\,d^2\xi(P(\cdot|\alpha_0),P(\cdot|\alpha_0+dt\,h))=:d_h\xi^2$ defines the line element of a Riemannian metric on the parameter space. Thus, we let

$V_{\partial\xi}[P]:=\max_{\alpha_0,h}d_h\xi^2.$ (4.16)
As $V_{\partial\xi}$ is regular and weakly affine, we will obtain a strong SIO monotone. Note that this would not work simply fixing a Hamiltonian, as shown in [23,24].

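The differential Chernoff bound, optimized over measurements, for distinguishing $e^{-itH}\rho e^{itH}$ for t≈0 from ρ in the many-copy regime, with a diagonal Hamiltonian H and $\rho=\sum_j\lambda_j|e_j\rangle\langle e_j|$ is again known [21], and given by $d_H\xi^2=(1/dt^2)\,d^2\xi(\rho,e^{-itH}\rho e^{itH})$, which evaluates to

$d_H\xi^2=\frac12\sum_{jk}\left(\sqrt{\lambda_j}-\sqrt{\lambda_k}\right)^2|\langle e_j|H|e_k\rangle|^2=\frac12\sum_{jk}\left(\lambda_j+\lambda_k-2\sqrt{\lambda_j\lambda_k}\right)|\langle e_j|H|e_k\rangle|^2=\mathrm{Tr}\,\rho H^2-\mathrm{Tr}\,\sqrt\rho H\sqrt\rho H=-\frac12\mathrm{Tr}\,[\sqrt\rho,H]^2,$ (4.17)
the latter equalling the Wigner–Yanase skew information, $I_{WY}(\rho,H)$ [25].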

As in the previous two examples, there are two natural domains of diagonal Hamiltonians H over which to optimize this: either $\|H\|_2\le1$ or $\|H\|_\infty\le1$, leading to two variants $C_{\partial\xi}^{(2)}(\rho)$ and $C_{\partial\xi}^{(\infty)}(\rho)$ of the coherence measure. Again, $d_H\xi^2$ is convex in H, owing to convexity of each term $|\langle e_j|H|e_k\rangle|^2$, and $(\sqrt{\lambda_j}-\sqrt{\lambda_k})^2\ge0$. Consequently, the optimal H is extremal under the convex norm constraint. For $p=\infty$, this means that the maximum is attained on a difference of two diagonal projectors, $H=\Pi_+-\Pi_-$. For p=2, however, we can say something even better, using Lieb’s concavity theorem [26], which says that for semidefinite H, the Wigner–Yanase skew information is convex in $H^2$, by writing $H=\sqrt{H^2}$. In general, we split $H=H_+-H_-$ into positive and negative parts, and find after some straightforward algebra that

$I_{WY}(\rho,H)=I_{WY}(\rho,H_+)+I_{WY}(\rho,H_-)-2\,\mathrm{Tr}\,\sqrt\rho H_+\sqrt\rho H_-,$
which by Lieb’s theorem [26] is jointly convex in $H_+^2$ and $H_-^2$. Thus, we find that the optimal $H_+$ and $H_-$ must be proportional to rank-one projectors, resulting in
$C_{\partial\xi}^{(2)}(\rho)=\max_{j,k,t}I_{WY}\left(\rho,\sqrt t\,|j\rangle\langle j|-\sqrt{1-t}\,|k\rangle\langle k|\right)$ (4.18)
and
$C_{\partial\xi}^{(\infty)}(\rho)=\max_{S_+,S_-\subset[d]}I_{WY}(\rho,\Pi_+-\Pi_-)=\max_{S_+\dot\cup S_-=[d]}4\,\mathrm{Tr}\,\sqrt\rho\,\Pi_+\sqrt\rho\,\Pi_-,$ (4.19)
where the first maximization is over distinct basis states j,k∈[d] and 0≤t≤1; the second over disjoint subsets $S_+$ and $S_-$ of [d], with $\Pi_\bullet=\sum_{j\in S_\bullet}|j\rangle\langle j|$, •=±.

For a qubit state ρ, we find (see appendix A) that $C_{\partial\xi}^{(2)}(\rho)=2|(\sqrt\rho)_{12}|^2$ and $C_{\partial\xi}^{(\infty)}(\rho)=4|(\sqrt\rho)_{12}|^2$.
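These qubit values can be checked numerically from the standard form of the Wigner–Yanase skew information, $I_{WY}(\rho,H)=\mathrm{Tr}\,\rho H^2-\mathrm{Tr}\,\sqrt\rho H\sqrt\rho H$. The sketch below is illustrative: the helper names are hypothetical, and the matrix square root uses the closed form $\sqrt A=(A+\sqrt{\det A}\,𝟙)/\sqrt{\mathrm{Tr}\,A+2\sqrt{\det A}}$, which holds for non-zero 2×2 positive semidefinite matrices and is an ingredient brought in only for this example.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[j][m] * b[m][k] for m in range(2)) for k in range(2)]
            for j in range(2)]

def trace(a):
    return a[0][0] + a[1][1]

def sqrt_psd_2x2(rho):
    """Square root of a non-zero 2x2 positive semidefinite matrix (closed form)."""
    det = max(complex(rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]).real, 0.0)
    s = math.sqrt(det)
    t = math.sqrt(complex(trace(rho)).real + 2 * s)
    return [[(rho[j][k] + (s if j == k else 0.0)) / t for k in range(2)]
            for j in range(2)]

def skew_information(rho, h):
    """I_WY(rho, H) = Tr rho H^2 - Tr sqrt(rho) H sqrt(rho) H."""
    root = sqrt_psd_2x2(rho)
    first = trace(matmul(rho, matmul(h, h)))
    second = trace(matmul(matmul(root, h), matmul(root, h)))
    return complex(first - second).real

rho_example = [[0.6, 0.2 + 0.1j], [0.2 - 0.1j, 0.4]]       # hypothetical test state
root_12 = sqrt_psd_2x2(rho_example)[0][1]

h_inf = [[1.0, 0.0], [0.0, -1.0]]                          # Pi_+ - Pi_-
h_two = [[math.sqrt(0.5), 0.0], [0.0, -math.sqrt(0.5)]]    # t = 1/2 in (4.18)

print(skew_information(rho_example, h_inf), 4 * abs(root_12) ** 2)
print(skew_information(rho_example, h_two), 2 * abs(root_12) ** 2)
```

In each printed pair the two numbers should coincide up to rounding, matching the stated qubit values of $C_{\partial\xi}^{(\infty)}$ and $C_{\partial\xi}^{(2)}$ for the Hamiltonians identified as optimal in appendix A.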

#### (f) Largest Shannon information

The previous examples should have prepared us for thinking of visibility as an expression of how much information about α the output distribution P(⋅|α) reveals. So why not take this to the logical conclusion? Noting that P is a channel from multi-phases α to outputs ω, in the Shannon theoretic sense, we are motivated to define visibility as the Shannon capacity of P:

$V_I[P]:=C(P)=\sup_\mu I(\alpha:\omega),$ (4.20)
where μ is a probability measure on the α, defining a joint distribution μ(α)P(ω|α) of channel inputs and outputs, and $I(X:Y)=D(P_{XY}\,\|\,P_X\times P_Y)$ is the mutual information of two random variables [27]. It can be checked that $V_I$ is regular and weakly affine. Operationally, $V_I[P]$ is the largest communication rate that can be transmitted by a sender, who may encode information into the phase settings $\alpha^{(1)},\ldots,\alpha^{(n)}$ of asymptotically many interferometers, to a receiver who decodes the correct message with high probability based on the observations $\omega_1,\ldots,\omega_n$ [27].

To obtain CI(ρ), we then only need to perform a maximization of the Shannon capacity over all measurements:

$C_I(\rho)=\sup_{(M_\omega)}C(P_{M|\rho})=\sup_\mu\sup_{(M_\omega)}I(\alpha:\omega)=\sup_\mu I_{\rm acc}(\{\mu(\alpha),\rho(\alpha)\}),$ (4.21)
where the latter quantity is known as the accessible information. These optimizations are by no means easy, and have been worked out only in a few cases. In any case, theorem 3.1 shows that $C_I$ is an SIO monotone. This might provide some motivation to try to evaluate $C_I$ in certain special cases.

However, due to the Holevo bound [28], and the Holevo–Schumacher–Westmoreland theorem [29,30] regarding the capacity of the cq-channel $\alpha\mapsto\rho(\alpha)$, we obtain the following:

$C_I(\rho)\le S(\Delta(\rho))-S(\rho)=C_r(\rho)=\sup_n\frac1n C_I(\rho^{\otimes n}).$ (4.22)

Namely, the Holevo bound [28] upper-bounds the accessible information,

$I_{\rm acc}(\{\mu(\alpha),\rho(\alpha)\})\le\chi(\{\mu(\alpha),\rho(\alpha)\}):=S\left(\int\mu(d\alpha)\rho(\alpha)\right)-\int\mu(d\alpha)S(\rho(\alpha)).$
Here, the second term is always S(ρ) because the ρ(α) are unitarily rotated versions of ρ, and the first term is maximized by the uniform distribution over all phases:
$C_I(\rho)\le S(\Delta(\rho))-S(\rho)=C_r(\rho),$ (4.23)
with the well-known relative entropy of coherence [1,2]. Note that the latter is known to be a monotone under IO, and even under the still larger class of maximally incoherent operations (MIOs) [3].

Invoking the Holevo–Schumacher–Westmoreland theorem [29,30] regarding the capacity of the cq-channel $\alpha\mapsto\rho(\alpha)$, we get furthermore $\sup_n(1/n)C_I(\rho^{\otimes n})=C_r(\rho)$.

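In the qubit case, the optimization (4.21) seems to be unknown, but we believe that the maximum is attained on the binary ensemble $\{(\frac12,\rho_0=\rho),(\frac12,\rho_1=\sigma_Z\rho\sigma_Z)\}$, and the measurement in the eigenbasis of $\rho_0-\rho_1$, which would yield $C_I(\rho)=1-H((1\pm2|\rho_{12}|)/2)\approx(2/\ln2)|\rho_{12}|^2$. On the other hand, $C_r(\rho)=H((1\pm\mathrm{Tr}\,\rho\sigma_Z)/2)-H((1\pm r)/2)$, where H is the binary entropy and r is the length of the Bloch vector of ρ.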
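The two qubit expressions can be compared numerically. The sketch below is only an illustration of the formulas just stated: `conjectured_c_i` implements the conjectured value $1-H((1\pm2|\rho_{12}|)/2)$, which is not a proven optimum, and `relative_entropy_coherence` implements $C_r$ with $r=\sqrt{(\rho_{11}-\rho_{22})^2+4|\rho_{12}|^2}$ the Bloch vector length. Function names are hypothetical and entropies are in bits.

```python
import math

def binary_entropy(p):
    """H(p) = -p log2 p - (1-p) log2 (1-p), with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def conjectured_c_i(rho):
    """Capacity of the binary symmetric channel with bias 2|rho_12| (conjectured C_I)."""
    x = min(2 * abs(rho[0][1]), 1.0)
    return 1.0 - binary_entropy((1 + x) / 2)

def relative_entropy_coherence(rho):
    """C_r = H((1 + Tr rho sigma_Z)/2) - H((1 + r)/2) for a qubit."""
    z = complex(rho[0][0] - rho[1][1]).real           # Tr rho sigma_Z
    r = min(math.sqrt(z * z + 4 * abs(rho[0][1]) ** 2), 1.0)
    return binary_entropy((1 + z) / 2) - binary_entropy((1 + r) / 2)

rho_example = [[0.6, 0.2 + 0.1j], [0.2 - 0.1j, 0.4]]   # hypothetical test state
print(conjectured_c_i(rho_example), relative_entropy_coherence(rho_example))

# Small-coherence regime: compare with the approximation (2 / ln 2) |rho_12|^2.
rho_weak = [[0.5, 0.01], [0.01, 0.5]]
print(conjectured_c_i(rho_weak), 2 / math.log(2) * abs(rho_weak[0][1]) ** 2)
```

For these example states the conjectured single-copy value stays below $C_r$, consistent with the bound (4.23), and the quadratic approximation is accurate when $|\rho_{12}|$ is small.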

### 5. Discussion

Using a simple model of multi-path interferometry and a broad approach to visibility of an experimental set-up of phase modulation and detection, we showed that the concept of coherence of a state can be obtained by optimizing the visibility over detection schemes. We illustrated our approach by analysing specific visibility functionals. The results are clearest in the two-level case, corresponding to Mach–Zehnder interferometers, where we find that the single off-diagonal density matrix element governs almost all visibility and coherence effects. In settings with more paths, as should be expected, there are different inequivalent ways of quantifying visibility and correspondingly many different, incomparable coherence measures.

Our discussion shows that it is possible to link coherence theory, a priori quite an abstract enterprise, to operational notions in the physics of interferometers. We hope that our present approach will be fruitful in the future to develop a firm physical foundation of the resource theory of coherence. As an example of this kind of impact, we highlight theorem 3.1, which shows that visibility-based coherence measures are naturally monotone under strictly incoherent operations (SIOs), while it is an open question whether this holds also under the originally proposed incoherent operations (IOs); this might be construed as favouring SIOs over IOs as the ‘correct’ class of operations. See also Yadin et al. [31], where it is shown that SIOs are obtained precisely as the class of cptp maps that can be dilated in a specific incoherent way onto an extended system $H\otimes S$, where the ‘internal’ or ‘spin’ degrees of freedom of the particle are thought of as having no incoherence structure, so that in Åberg’s framework [2] the incoherent subspaces are $|j\rangle\otimes S$. Namely, a cptp map is strictly incoherent if and only if it can be decomposed into attaching an ancillary state of $S$, followed by an incoherent unitary on the tensor product space, i.e. one mapping the subspaces $|j\rangle\otimes S$ into each other, followed by a destructive measurement of $S$ with outcomes λ.

Unlike other investigations that have tried to build a similar link between visibility and coherence, we start from visibility parameters as a feature of experimentally accessible data, rather than declaring known coherence measures as ‘visibility’ [32,33]. Because of this, we think of our approach as operational, in contrast to the cited works whose approach could be characterized as axiomatic. In this respect, we believe that our present work goes some way towards answering the call for an operational justification of coherence as a visibility parameter [13], sec. VII. It is tempting to conjecture that all the coherence parameters derived from ‘reasonable’ visibility functionals satisfy duality relations with suitable path information measures such as in the mentioned works. Whether visibility as conceptualized by us is always dual to a path information or some other parameter is a question we have to leave open at this point. In any case, our analysis of some concrete examples of visibility functionals on interference fringes has bolstered this connection to coherence, resulting in coherence measures that can be related to, and in some cases identified with, previously considered measures.

We also think that the present treatment gives some insight into the relationship between coherence and the resource theory of asymmetry (or reference frames) for the group of time translations (cf. [4]). Namely, looking at examples (c), (d) and (e), each of the resulting coherence measures is obtained by maximizing, over a bounded set of diagonal Hamiltonians, a function given in equations (4.8), (4.13) and (4.17), respectively. It is known that, for a fixed Hamiltonian H, each of them is a monotone in the resource theory of time asymmetry corresponding to energy conservation; see, for instance, [23] for the latter quantity. Hence, we are led to think of coherence theory as asymmetry theory with a Hamiltonian that has fixed eigenvectors but ‘undetermined’ eigenvalues. This may go some way towards explaining the characteristic similarities and differences between (time) asymmetry and coherence.

In the analysis, we encountered some interesting mathematical problems, too, among them the characterization of the set of all commutators of norm-bounded diagonal and general Hermitian matrices. Furthermore, we would like to know whether, among the established coherence monotones, we can recover the ℓ1-measure $C_{\ell_1}$ [1], or the coherence of formation $C_f$ [2,16], directly via visibilities. In the light of [32,33], the former would be especially interesting.

Finally, going beyond the single-particle interference of our above theory, the present study suggests multi-particle interference as a natural extension. This will not only provide a framework for the compositions of systems (cf. Åberg [2]), but also bring out the unique quantum features of interference, as opposed to the mere wave-mechanical ones in the single-particle case.

### Authors' contributions

All authors have contributed equally and crucially to the conception of the present paper, the execution of the scientific research and its writing. All authors gave their final approval for publication.

### Competing interests

We declare we have no competing interests.

### Funding

T.B. is supported by IISER Kolkata and acknowledges the hospitality of the Quantum Information Group (GIQ) at UAB during June–July 2016, when the present work was initiated. M.G.D. is supported by a doctoral studies fellowship of the Fundación ‘la Caixa’. A.W. is supported by the European Commission (STREP ‘RAQUEL’) and the ERC (Advanced Grant ‘IRQUAT’). The authors acknowledge furthermore funding by the Spanish MINECO (grant FIS2013-40627-P), with the support of FEDER funds, and by the Generalitat de Catalunya CIRIT, project 2014-SGR-966.

## Acknowledgements

It is our pleasure to thank Emili Bagan, Manabendra Bera, John Calsamiglia and Chang-Shui Yu for discussions on interferometers and visibility, and Patrick Coles for illuminating remarks on an earlier version of the manuscript.

## Appendix A. Qubit examples

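(a) $C_{\max}$

As only the relative phase $\alpha=\alpha_1-\alpha_2$ matters, we see

$\rho-U(\alpha)\rho U(\alpha)^\dagger=\begin{bmatrix}0&(1-e^{i\alpha})\rho_{12}\\(1-e^{-i\alpha})\rho_{21}&0\end{bmatrix}.$
Its trace norm clearly is maximized at α=π, showing $C_{\max}(\rho)=|\rho_{12}|+|\rho_{21}|=C_{\ell_1}(\rho)$, which for qubits is known to equal $2C_{\rm Tr}(\rho)$.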

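(b) $C_\nabla$

For $p=\infty$, the only non-trivial choice is $\Pi_+=|1\rangle\langle1|$ and $\Pi_-=|2\rangle\langle2|$, directly resulting in $C_\nabla^{(\infty)}=2\,\|\,|1\rangle\langle1|\rho|2\rangle\langle2|\,\|_1=2|\rho_{12}|$.

For p=2, we have to consider the Hamiltonian $H=\sqrt t\,|1\rangle\langle1|\pm\sqrt{1-t}\,|2\rangle\langle2|$, yielding

$[\rho,H]=\begin{bmatrix}0&(-\sqrt t\pm\sqrt{1-t})\rho_{12}\\(\sqrt t\mp\sqrt{1-t})\rho_{21}&0\end{bmatrix}.$
Its trace norm is maximized for the negative sign choice and at $t=\frac12$, and so $C_\nabla^{(2)}=\sqrt2\,|\rho_{12}|$.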

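(c) $C_F$

The formula for the coherence measure reduces to

$C_F^{(p)}(\rho)=\max\frac{2(\lambda_1-\lambda_2)^2}{\lambda_1+\lambda_2}|\langle e_1|H|e_2\rangle|^2,$
where the maximization is over H∈span{𝟙,σZ} such that $\|H\|_p\le1$. Note that $\lambda_1+\lambda_2=1$ and $|\langle e_1|H|e_2\rangle|^2=\mathrm{Tr}\,H|e_1\rangle\langle e_1|H|e_2\rangle\langle e_2|$.

This calculation is conveniently done in the Bloch picture, writing $\rho=\frac12(𝟙+\vec r\cdot\vec\sigma)$, with a vector $\vec r=r\,\vec r^{\,0}$ that we decompose as a product of its length $r=|\vec r|$ and a unit vector $\vec r^{\,0}$ (with components $r_x^0$, $r_y^0$ and $r_z^0$). In this way, the eigenprojectors of ρ become $|e_{1,2}\rangle\langle e_{1,2}|=\frac12(𝟙\pm\vec r^{\,0}\cdot\vec\sigma)$. In the above maximization, this allows us to identify $\lambda_1-\lambda_2=r$ and $r^2=2\,\mathrm{Tr}\,\rho^2-1$.

For $p=\infty$, we already know that $H=\sigma_Z$ is optimal, so

$C_F^{(\infty)}(\rho)=2r^2\,\mathrm{Tr}\,\sigma_Z|e_1\rangle\langle e_1|\sigma_Z|e_2\rangle\langle e_2|=2r^2\,\frac14\mathrm{Tr}\,(𝟙-r_x^0\sigma_X-r_y^0\sigma_Y+r_z^0\sigma_Z)(𝟙-r_x^0\sigma_X-r_y^0\sigma_Y-r_z^0\sigma_Z)=r^2\left(1+(r_x^0)^2+(r_y^0)^2-(r_z^0)^2\right)=4\left(\mathrm{Tr}\,\rho^2-\mathrm{Tr}\,\Delta(\rho)^2\right)=8|\rho_{12}|^2=2C_{\ell_1}(\rho)^2.$

For p=2, the maximization reduces to that of $2r^2\,\mathrm{Tr}\,H|e_1\rangle\langle e_1|H|e_2\rangle\langle e_2|$, with $H=\alpha𝟙+\beta\sigma_Z$ and $2\alpha^2+2\beta^2\le1$. The trace decomposes into four terms; however, the three that contain an α𝟙 evaluate to 0, leaving $2\beta^2r^2\,\mathrm{Tr}\,\sigma_Z|e_1\rangle\langle e_1|\sigma_Z|e_2\rangle\langle e_2|$, which yields (using the optimal choice $2\beta^2=1$) $C_F^{(2)}(\rho)=2\left(\mathrm{Tr}\,\rho^2-\mathrm{Tr}\,\Delta(\rho)^2\right)=C_{\ell_1}(\rho)^2$.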

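(d) $C_{\partial\xi}$

For $p=\infty$, the only non-trivial choice is $\Pi_+=|1\rangle\langle1|$ and $\Pi_-=|2\rangle\langle2|$, directly resulting in $C_{\partial\xi}^{(\infty)}=4\,\mathrm{Tr}\,|1\rangle\langle1|\sqrt\rho|2\rangle\langle2|\sqrt\rho=4|(\sqrt\rho)_{12}|^2$.

For p=2, we have to consider the Hamiltonian $H=\sqrt t\,|1\rangle\langle1|-\sqrt{1-t}\,|2\rangle\langle2|$, yielding

$[\sqrt\rho,H]=\begin{bmatrix}0&(-\sqrt t-\sqrt{1-t})(\sqrt\rho)_{12}\\(\sqrt t+\sqrt{1-t})(\sqrt\rho)_{21}&0\end{bmatrix}.$
Thus, $I_{WY}(\rho,H)=(\sqrt t+\sqrt{1-t})^2|(\sqrt\rho)_{12}|^2$, which is maximized at $t=\frac12$, hence $C_{\partial\xi}^{(2)}=2|(\sqrt\rho)_{12}|^2$.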