A no-go theorem for theories that decohere to quantum mechanics

To date, there has been no experimental evidence that invalidates quantum theory. Yet it may only be an effective description of the world, in the same way that classical physics is an effective description of the quantum world. We ask whether there exists an operationally defined theory superseding quantum theory, but which reduces to it via a decoherence-like mechanism. We prove that no such post-quantum theory exists if it is demanded that it satisfy two natural physical principles: causality and purification. Causality formalizes the statement that information propagates from present to future, and purification that each state of incomplete information arises in an essentially unique way due to lack of information about an environment. Hence, our result can be viewed either as evidence that the fundamental theory of Nature is quantum or as showing in a rigorous manner that any post-quantum theory must abandon causality, purification or both.


Introduction
In 1903, Michelson wrote 'The more important fundamental laws and facts of physical science have all been discovered, and these are so firmly established that the possibility of their ever being supplanted in consequence of new discoveries is exceedingly remote' [1]. Within 2 years, Einstein had explained the photoelectric effect in terms of light quanta [2] and within 30 years quantum theory was an established field of scientific research. This new science revolutionized our understanding of the physical world and brought with it a host of classically counterintuitive features such as superposition, entanglement and fundamental uncertainty.
Today, quantum theory has been verified to extremely high precision and forms the basis of a vast array of new technologies. Yet, just as for Michelson, it may turn out to be the case that quantum theory is only an effective description of our world. There may be some more fundamental theory yet to be discovered that is as radical a departure from quantum theory as quantum was from classical. If such a theory exists, there should be some mechanism by which effects of this theory are suppressed, explaining why quantum theory is a good effective description of Nature. This would be analogous to decoherence, which both suppresses quantum effects and gives rise to the classical world [3-5]. As such, this mechanism is called hyperdecoherence. To the best of the authors' knowledge, the notion of hyperdecoherence was first discussed in [6] and has commonly been considered as a mechanism to explain why we do not observe post-quantum effects, such as in [7], and, in particular, in the context of higher-order interference [8-19].
We formalize such a hyperdecoherence mechanism within a broad framework of operationally defined physical theories by generalizing the key features of quantum to classical decoherence. Using this, we prove a no-go result: there is no operationally defined theory that satisfies two natural physical principles, causality and purification, and which reduces to quantum theory via a hyperdecoherence mechanism. Here, causality formalizes the statement that information propagates from present to future, and purification that each state of incomplete information arises in an essentially unique way due to a lack of information about some larger environment system. In a sense, purification can be thought of as a statement of 'information conservation'; any missing information about the state of a given system can always be accounted for by considering it as part of a larger system. Our result can be viewed either as a justification of why the fundamental theory of Nature is quantum or as highlighting the ways in which any post-quantum theory must radically depart from a quantum description of the world.

Decoherence
One of the standard descriptions of the quantum to classical transition is environment-induced decoherence [4]. In this description, a quantum system interacts deterministically with some environment system, after which the environment is discarded, leading to a loss of information. This procedure formalizes the idea of a quantum system irretrievably losing information to an environment, leading to an effective classical description of the decohered system. The decoherence process can be viewed as inducing a completely positive trace preserving map on the original quantum system, which is termed the decoherence map.
A concrete example serves to illustrate the key features of this map. Consider the following reversible interaction with an environment: $U = \sum_i |i\rangle\langle i| \otimes \pi_i$, where $\{|i\rangle\}$ is the computational basis and $\pi_i$ is a unitary which acts on the environment system as $\pi_i|0\rangle = |i\rangle$, $\forall i$. Switching to the density matrix formalism, the decoherence map arising from the above interaction corresponds to
$$D(\rho) = \mathrm{Tr}_E\left[U\left(\rho \otimes |0\rangle\langle 0|\right)U^\dagger\right] = \sum_i \langle i|\rho|i\rangle\, |i\rangle\langle i|,$$
where $\rho$ is the input state. Hence, in this concrete setting, the decoherence map $D$ is a de-phasing map.
It is clear that $D(\rho)$ will always be diagonal in the $\{|i\rangle\}$ basis, regardless of the input. Hence, as they have no coherences between distinct elements of $\{|i\rangle\}$, the states $D(\rho)$ correspond to classical probability distributions. In fact, the entirety of classical probability theory (probability distributions over classical outcomes, stochastic maps acting on said distributions and measurements allowing one to infer the probabilities of different possible outcomes) can be seen to arise from quantum theory by applying $D$ to density matrices $\rho$ as $D(\rho)$, to completely positive trace preserving maps $E$ as $D(E(D(\_)))$, and to positive operator valued measurement (POVM) elements $M$ as $\mathrm{Tr}(M D(\_))$. In this manner, one can consider classical probability theory to be a sub-theory of quantum theory, where $D$ is the map restricting quantum theory to the classical sub-theory. In particular, applying stochastic maps to probability distributions results in probability distributions: as a sub-theory is itself a theory, it must be closed under composition.
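As an illustration of how the de-phasing map arises from the interaction with the environment, the following minimal sketch builds $U$ for a small system, applies it to the system together with an environment prepared in the basis state labelled 0, and traces out the environment. The helper names (`kron`, `shift`, `interaction`, `decohere`) and the choice of cyclic shifts for the unitaries $\pi_i$ are our own assumptions for the sake of the example; any unitaries sending the environment state 0 to state $i$ would do.

```python
def kron(a, b):
    """Kronecker product of two matrices given as nested lists."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


def matmul(a, b):
    """Matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def dagger(a):
    """Conjugate transpose."""
    return [[complex(a[j][i]).conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]


def unit(d, r, c):
    """Matrix unit with a single 1 in row r, column c (dimension d)."""
    return [[1 if (x, y) == (r, c) else 0 for y in range(d)] for x in range(d)]


def shift(d, s):
    """Assumed choice of pi_s: the cyclic shift sending basis state k to (k + s) mod d."""
    return [[1 if x == (y + s) % d else 0 for y in range(d)] for x in range(d)]


def interaction(d):
    """U = sum_i P_i (x) pi_i, with P_i the projector onto basis state i."""
    u = [[0] * (d * d) for _ in range(d * d)]
    for i in range(d):
        term = kron(unit(d, i, i), shift(d, i))
        u = [[u[x][y] + term[x][y] for y in range(d * d)] for x in range(d * d)]
    return u


def trace_out_environment(m, d):
    """Partial trace over the second tensor factor."""
    return [[sum(m[i * d + k][j * d + k] for k in range(d)) for j in range(d)]
            for i in range(d)]


def decohere(rho):
    """D(rho): interact with an environment in basis state 0, then discard it."""
    d = len(rho)
    u = interaction(d)
    joint = matmul(matmul(u, kron(rho, unit(d, 0, 0))), dagger(u))
    return trace_out_environment(joint, d)


plus = [[0.5, 0.5], [0.5, 0.5]]  # equal superposition: maximal coherence
decohered = decohere(plus)
distribution = [decohered[i][i].real for i in range(len(plus))]
```

Here `decohered` has vanishing off-diagonal entries, and `distribution` is the classical probability distribution read off from its diagonal.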
There are three key features of the decoherence map that we will use to define our hyperdecoherence map in §4:
(i) It is trace preserving, corresponding to the fact that it is a deterministic process.
(ii) It is idempotent: $D(D(\rho)) = D(\rho)$ for all $\rho$. This corresponds to the intuitive fact that classical systems have no more coherence 'to lose' and, moreover, once states have lost their coherence they are left invariant by further decoherence.
(iii) Finally, we observe that decoherence arises from an irretrievable loss of information to an environment, and so:
(a) If $D(\rho)$ is a pure classical state, i.e. $D(\rho) = |i\rangle\langle i|$ for some $i$, then $\rho$ is clearly also a pure quantum state. That is, if the state that results from this loss of information is a state of maximal information, then no information can have been lost to the environment.
(b) The maximally mixed classical state, $\frac{1}{d}\sum_i |i\rangle\langle i|$, is clearly also a maximally mixed quantum state. That is, if the decohered state is maximally ignorant regarding the classical state, then it should be maximally ignorant about the quantum state.
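The three features listed above can be checked numerically for the de-phasing map on a single example state. The sketch below is only a sanity check under our own choices: the test state `rho` and the helper names are illustrative, and the purity comparison uses the standard fact that, for a Hermitian matrix, $\mathrm{Tr}(\rho^2)$ equals the sum of the squared moduli of its entries, so discarding off-diagonal entries can never increase it.

```python
def dephase(rho):
    """De-phasing map: keep the diagonal, set all coherences to zero."""
    d = len(rho)
    return [[rho[i][j] if i == j else 0 for j in range(d)] for i in range(d)]


def trace(rho):
    return sum(rho[i][i] for i in range(len(rho)))


def purity(rho):
    """Tr(rho^2) for Hermitian rho, computed as the sum of |rho_ij|^2."""
    return sum(abs(x) ** 2 for row in rho for x in row)


# An illustrative mixed qubit state with non-zero coherences (trace 1, positive).
rho = [[0.75, 0.25j], [-0.25j, 0.25]]
once = dephase(rho)
twice = dephase(once)

trace_preserved = trace(once) == trace(rho)          # feature (i)
idempotent = twice == once                           # feature (ii)
purity_not_increased = purity(once) <= purity(rho)   # behind feature (iii)(a)

maximally_mixed = [[0.5, 0], [0, 0.5]]
mixed_invariant = dephase(maximally_mixed) == maximally_mixed  # feature (iii)(b)
```

Since purity cannot increase, a decohered state with purity 1 can only come from an input with purity 1, which is the content of (iii)(a).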

Generalized theories
To make progress on the question raised at the start of this paper, we need to be able to describe theories other than quantum and classical in a consistent manner. This calls for a broad framework that can describe any conceivable, yet well-defined, physical theory. The framework we present here is based on [22-25] and takes the view that, ultimately, any physical theory is going to be explored by experiments, and so must have an operational description in terms of these experiments.
Note that operationalism as a philosophical viewpoint, in which one asserts that there is no reality beyond laboratory device settings and outcomes, is not being espoused here. One should merely view the approach taken here as an operational methodology aimed at gaining insight into certain structural properties of physical theories. This operational approach is similar in spirit to that taken in device-independent quantum information processing, a field that has led to many practical applications [26,27].
A theory in this framework can be described as a collection of processes, each of which corresponds to a particular outcome occurring in a single use of a piece of laboratory equipment in some experiment. In general, each process has some number of inputs and outputs. These inputs and outputs are collectively called systems. These systems are labelled by different types, denoted $A, B, \ldots$. Given two systems of type $A$ and $B$, we can form a composite system of type $AB$. Operationally, a process with input system of type $AB$ corresponds to a single use of a piece of laboratory equipment with an input system of type $A$ and a distinct input system of type $B$.
In finite-dimensional quantum theory, systems correspond to complex Hilbert spaces, with the type given by the dimension of the Hilbert space. Hence a type $A$ in quantum theory is just a natural number, that is, $A \in \mathbb{N}$. Consider a qubit, which in our language corresponds to a quantum system of type 2. Physically, a qubit can be realized in many different ways, such as by a spin-1/2 system or an ion in a trap with two distinct energy levels. Although these physical set-ups might differ, they are operationally equivalent. Hence, while the intuitive picture of a system corresponding to a particle 'passing from input to output port of a laboratory device' is appealing, one should take care that this intuitive idea does not lead to ambiguities.
Processes with no inputs are known as states (corresponding to density matrices in quantum theory); those with no outputs as effects (corresponding to POVM elements in quantum theory); and those having both inputs and outputs as transformations (corresponding to completely positive trace non-increasing maps in quantum theory).
The key feature of a theory in this framework is in how these processes compose. There are two equivalent ways to define this, diagrammatically or algebraically. Diagrammatically, we denote processes as labelled boxes and systems as labelled wires. Processes can then be wired together to form diagrams [diagram omitted]. This wiring together of processes must satisfy two conditions: firstly, system types must match, and, secondly, no cycles can be created. The relevant data for a particular diagram are just the connectivity: which outputs are connected to which inputs and the ordering of the free inputs and outputs. Any circuits formed in this way must also correspond to a valid process in the theory. That is, for all theories in this framework, processes must be closed under this composition. Hence the above diagram must correspond to a process in the theory, in this case one with input system of type $A$ and output system of type $B$. One can think of the above diagram formed by connecting different processes as akin to circuits drawn in the field of quantum computation.
The equivalent algebraic statement formally corresponds to the fact that these systems and processes form the objects and morphisms of a strict symmetric monoidal category; see [22,23] for more information on the formal mathematical underpinnings of this. However, more intuitively, we can think of building the above diagrams out of two fundamental forms of composition: sequential and parallel. If $e$ is a process from a system of type $A$ to a system of type $B$ and $u$ is a process from a system of type $B$ to a system of type $C$, then their sequential composition is a process from a system of type $A$ to a system of type $C$, denoted $u \bullet e$. Note that, to sequentially compose two processes, the type of the output system of the first process must match the type of the input system of the second. Similarly, if $e$ is a process from a system of type $A$ to a system of type $B$ and $u$ is a process from a system of type $C$ to a system of type $D$, then their parallel composition is a process from the composite system of type $AC$ to the composite system of type $BD$, denoted $u \otimes e$. Note that the symbol $\otimes$, which schematically denotes parallel composition, may not correspond to the standard vector space tensor product.
The definition of a strict symmetric monoidal category is then merely a statement that these two forms of composition interact in the way that one would expect (e.g. [22,23,29]),
$$(u \otimes e) \bullet (f \otimes k) = (u \bullet f) \otimes (e \bullet k),$$
for every process $u$, $e$, $f$, $k$ with the property that the type of the output system of $f$ (respectively, $k$) matches the type of the input system of $u$ (respectively, $e$). Note that this is exactly what happens in quantum theory.
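To make the interplay between sequential and parallel composition concrete, the following sketch checks it in classical probability theory, where (as an assumption of this example) processes are represented by column-stochastic matrices, sequential composition by matrix multiplication and parallel composition by the Kronecker product. The particular matrices `u`, `e`, `f`, `k` are arbitrary illustrative choices, and exact rational arithmetic is used so that the two sides can be compared for equality.

```python
from fractions import Fraction as F


def compose(u, e):
    """Sequential composition: apply e first, then u (matrix product)."""
    return [[sum(u[i][m] * e[m][j] for m in range(len(e))) for j in range(len(e[0]))]
            for i in range(len(u))]


def parallel(u, e):
    """Parallel composition of classical processes (Kronecker product)."""
    return [[u[i][j] * e[m][n] for j in range(len(u[0])) for n in range(len(e[0]))]
            for i in range(len(u)) for m in range(len(e))]


# Four illustrative column-stochastic maps on a two-outcome classical system.
u = [[F(1, 2), F(1, 3)], [F(1, 2), F(2, 3)]]
e = [[F(1, 4), F(0)], [F(3, 4), F(1)]]
f = [[F(1), F(1, 5)], [F(0), F(4, 5)]]
k = [[F(2, 3), F(1, 2)], [F(1, 3), F(1, 2)]]

# Compose in parallel then in sequence, and the other way round.
lhs = compose(parallel(u, e), parallel(f, k))
rhs = parallel(compose(u, f), compose(e, k))
interchange_holds = lhs == rhs

# The composite is again a valid classical process: each column sums to 1.
column_sums = [sum(row[j] for row in lhs) for j in range(len(lhs[0]))]
```

Both orders of composition yield the same stochastic matrix, as closure under composition requires.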
To illustrate the connection between the algebraic and diagrammatic representation, consider the above diagram translated into algebraic notation in equation (3.1) [equation omitted], where, on the right, $\mathbb{1}$ corresponds to the identity process and $\otimes$ and $\bullet$ denote the parallel and sequential composition of processes, respectively. In what follows, the $\bullet$ will generally be suppressed. Algebraically, a process $d$ from a system of type $A$ to a system of type $B$ is denoted $d_A^B$. If the output system is of the same type as the input system, then the indices will be suppressed to a subscript, rather than a subscript and a superscript. If there is no input/output system, the corresponding superscript/subscript is left blank. Note that, in the right-hand algebraic equation, a dummy index on the repeated type $A$ had to be introduced as a book-keeping measure, despite the fact that $A_1$ and $A_2$ are the exact same type. Note the diagrammatic notation was able to deal with this issue without the need for a dummy index.
The following concrete example illustrates potential issues that can arise if one forgets that the dummy index is merely a book-keeping measure. Consider the quantum Bell state $\frac{1}{d}\sum_{ij}|ii\rangle\langle jj|$ in $d^2$ dimensions. As this is a maximally entangled two-qudit state, the type of each system is the same, namely the natural number $d$. However, in order to prevent ambiguity when marginalizing over one of the qudit systems, we introduce a dummy index on the type, as follows:
$$\mathrm{Tr}_{A_1}\left(\frac{1}{d}\sum_{ij}|i\rangle\langle j|_{A_1} \otimes |i\rangle\langle j|_{A_2}\right) = \frac{1}{d}\sum_i |i\rangle\langle i|_{A_2},$$
where in the above $\otimes$ is the standard vector space tensor product. Clearly, marginalizing over the other system results in
$$\mathrm{Tr}_{A_2}\left(\frac{1}{d}\sum_{ij}|i\rangle\langle j|_{A_1} \otimes |i\rangle\langle j|_{A_2}\right) = \frac{1}{d}\sum_i |i\rangle\langle i|_{A_1}.$$
That is, each marginalized state is the same, despite the fact that these systems can be space-like separated. It is the mathematical assignment of a state to each system that is the same, not the physical set-up. We saw above that, in order to marginalize correctly using algebraic notation, a dummy index had to be introduced to specify the system on which to apply the partial trace. However, once this procedure is completed, it is crucial to drop the dummy index.
When the circuit representing the connections of processes in an experiment has no free inputs or outputs, we associate it with the probability that all of these processes occur when the experiment is run.
There are two primitive experimental notions one would expect to be faithfully represented in any operationally defined theory. The first is tomography: if two processes give the same probabilities in all experiments, then they are the same process. That is,
$$\Pr[X(f)] = \Pr[X(g)] \ \ \forall X \quad \Longrightarrow \quad f = g, \tag{3.5}$$
where $X$ is any diagram which, when composed with $f$ or $g$, has no free inputs and outputs. Both quantum and classical theory actually satisfy the stronger notion of local tomography where, rather than quantifying over all $X$, we need only consider $X$ which are local state preparations and local effects. Note that local tomography is not assumed for the theories considered here.
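The marginals of the Bell state can be computed explicitly. In the sketch below, which is only an illustration with our own helper names, the two partial traces play the role of the dummy index: they differ only in which tensor factor is summed over, and both return the same matrix, namely the maximally mixed state of a single qudit.

```python
from fractions import Fraction


def bell_state(d):
    """Density matrix of the maximally entangled two-qudit state:
    entry 1/d at row (i, i), column (j, j), and zero elsewhere."""
    rho = [[Fraction(0)] * (d * d) for _ in range(d * d)]
    for i in range(d):
        for j in range(d):
            rho[i * d + i][j * d + j] = Fraction(1, d)
    return rho


def trace_out_first(rho, d):
    """Marginal on the second system (partial trace over the first factor)."""
    return [[sum(rho[k * d + a][k * d + b] for k in range(d)) for b in range(d)]
            for a in range(d)]


def trace_out_second(rho, d):
    """Marginal on the first system (partial trace over the second factor)."""
    return [[sum(rho[a * d + k][b * d + k] for k in range(d)) for b in range(d)]
            for a in range(d)]


d = 3
bell = bell_state(d)
marginal_second = trace_out_first(bell, d)
marginal_first = trace_out_second(bell, d)
maximally_mixed = [[Fraction(1, d) if a == b else Fraction(0) for b in range(d)]
                   for a in range(d)]
```

Once the partial trace has been taken, nothing in the resulting matrix records which of the two systems it came from.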
The second is convexity: given a collection of processes with the same inputs and outputs, experimentally it is possible to implement a probabilistic mixture of these, by applying one conditioned on the outcome of a coin toss, for example. Hence a process corresponding to an arbitrary convex combination of processes $f_i$,
$$\sum_i p_i f_i, \tag{3.6}$$
where $p_i$ defines a probability distribution (i.e. $p_i \in \mathbb{R}^+$ and $\sum_i p_i = 1$), should exist in the theory. Convexity allows us to define purity of states. A state is pure if it is not a convex combination of other distinct states. A state is mixed if it can be written as a convex combination of distinct states. From the above requirements, it can be shown that the set of states, effects and transformations generate real vector spaces, with the effects and transformations acting linearly on the vector space of states [23].
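As a small illustration of convexity, the sketch below forms a probabilistic mixture of two classical processes, as if choosing between them with a biased coin. The representation of processes as column-stochastic matrices and the name `mix` are assumptions of this example rather than part of the framework.

```python
from fractions import Fraction


def mix(weights, processes):
    """Convex combination sum_i p_i f_i of equally shaped matrices f_i."""
    if sum(weights) != 1 or any(p < 0 for p in weights):
        raise ValueError("weights must form a probability distribution")
    rows, cols = len(processes[0]), len(processes[0][0])
    return [[sum(p * f[r][c] for p, f in zip(weights, processes)) for c in range(cols)]
            for r in range(rows)]


identity = [[1, 0], [0, 1]]   # leave the classical bit untouched
bit_flip = [[0, 1], [1, 0]]   # swap the two outcomes

# Apply the identity with probability 3/4 and the bit flip with probability 1/4.
noisy = mix([Fraction(3, 4), Fraction(1, 4)], [identity, bit_flip])
column_sums = [sum(row[c] for row in noisy) for c in range(2)]

# A point distribution is pure; mixing two distinct ones gives a mixed state.
mixed_state = mix([Fraction(1, 2), Fraction(1, 2)], [[[1], [0]], [[0], [1]]])
```

The mixture `noisy` is again a stochastic matrix, so it is itself a valid process of the theory, and `mixed_state` is the uniform distribution obtained from two pure states.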

Definition 3.1 (Operational theory).
A generalized theory consists of a collection of systems closed under parallel composition and processes closed under parallel and sequential composition, such that closed circuits formed from composing processes correspond to probability distributions. Moreover, these processes satisfy tomography and convexity as defined in equation (3.5) and equation (3.6), respectively.
In what follows, we will require our post-quantum theory to satisfy two natural physical principles, causality and purification, which were first introduced in [23]. A process is deterministic if the piece of laboratory equipment it corresponds to has only one possible outcome.

Definition 3.2 (Causality [23]). For each system of type $A$, there exists a unique deterministic effect, denoted algebraically as $u_A[\_]$ and diagrammatically as [diagram omitted].
This may seem like a somewhat odd definition for causality; however, it can be shown to be equivalent to the statement that future measurement choices do not affect current experiments [23]. It also implies the no superluminal signalling principle [30] and provides a unique definition of marginalization for multi-system states. A process $f$ from a system of type $A$ to a system of type $B$ is said to be terminal if $u_B f = u_A$. In quantum theory, the unique deterministic effect is provided by the (partial) trace; that is, in quantum theory the terminal processes are the trace preserving ones. It can be shown for general theories [31] that both reversible and deterministic transformations are terminal.

Definition 3.3 (Purification [23]).
For every state $\rho$ on a given system of type $A$, there exists a pure bipartite state $\psi$ on some composite system of type $AB$, such that the original state arises as a marginalization of this pure bipartite state,
$$(\mathbb{1}_A \otimes u_B)\,\psi = \rho.$$
Here, $\psi$ is said to purify $\rho$. Moreover, any two pure states $\psi$ and $\psi'$ on the same system which purify the same state are connected by a reversible transformation $R$ on the purifying system,
$$\psi' = (\mathbb{1}_A \otimes R)\,\psi.$$
If one considers a pure state to be a state of maximal information, the purification principle formalizes the statement that each state of incomplete information arises in an essentially unique way due to a lack of information about an environment. In a sense, purification can be thought of as a statement of 'information conservation'; any missing information about the state of a system can always be traced back to lack of information of some environment system. Or, more succinctly: information can only be discarded, not destroyed [32].
The purification principle, in conjunction with other natural principles, implies many quantum information processing [23] and computational primitives [11]. Examples include teleportation, no information without disturbance, no bit commitment [23,33] and the existence of reversible controlled transformations [11]. Moreover, purification also leads to a well-defined notion of thermodynamics [31,34,35].
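In quantum theory, a purification of a diagonal mixed state can be written down directly: each basis state of the system is paired with a copy on a purifying system, with amplitude given by the square root of its probability. The following sketch, with illustrative names and an arbitrary example distribution, builds such a vector and confirms that discarding the purifying system returns the original mixed state. Real amplitudes are assumed, so no complex conjugation is needed.

```python
import math


def purify(probs):
    """Amplitudes of the vector sum_i sqrt(p_i) |i>|i> on system (x) purifier."""
    d = len(probs)
    psi = [0.0] * (d * d)
    for i, p in enumerate(probs):
        psi[i * d + i] = math.sqrt(p)
    return psi


def marginal_on_system(psi, d):
    """Partial trace of the rank-one projector onto psi over the purifier."""
    return [[sum(psi[a * d + k] * psi[b * d + k] for k in range(d)) for b in range(d)]
            for a in range(d)]


probs = [0.25, 0.75]              # an illustrative mixed state diag(1/4, 3/4)
psi = purify(probs)
rho = marginal_on_system(psi, len(probs))
norm_squared = sum(x * x for x in psi)
```

The marginal `rho` is diagonal with entries `probs` (up to floating-point rounding), while the joint vector `psi` is normalized and hence describes a pure state.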
Some concrete examples of theories in this framework serve to illustrate the terminology introduced in this section. As mentioned at different points above, finite-dimensional quantum theory belongs to our framework. Systems are given by complex Hilbert spaces, with the type of each system corresponding to the dimension of the corresponding Hilbert space, which in our case will always be a natural number. States are density matrices, that is, positive semi-definite Hermitian operators of unit trace acting on the underlying Hilbert space; transformations are completely positive trace non-increasing maps (trace preserving in the deterministic case); and effects are elements of POVMs. The real vector space generated by the set of density matrices is given by the real vector space of Hermitian operators, spanned by the density matrices. Parallel composition of states in quantum theory takes a particularly neat form: a joint state of a composite system is a positive operator acting on the standard vector space tensor product of the Hilbert spaces associated with the individual systems. In particular, bipartite quantum states can always be written as a real linear combination of product states.
Quantum theory satisfies both causality and purification. Indeed, to illustrate purification, it is enough to note that every mixed state on a finite-dimensional system $\sum_i p_i |i\rangle\langle i|$ can be purified to a state $|\psi\rangle\langle\psi|$, where $|\psi\rangle = \sum_i \sqrt{p_i}\,|i\rangle|i\rangle$, by the introduction of a suitable extra system. Moreover, any other purification $|\varphi\rangle$ must satisfy $|\psi\rangle = (I \otimes U)|\varphi\rangle$ with $U$ a unitary transformation. Purification is standardly referred to by mathematicians as the Gelfand-Naimark-Segal construction [36].
The classical theory of finite-dimensional probability distributions and stochastic processes is also an example of a specific theory in this framework. A system is associated with a real vector space with the type corresponding to the dimension of said vector space, which can be thought of as the number of discrete outcomes of some test on that system. In this work, when 'classical theory' is mentioned, this is what we mean.
Other interesting examples of generalized theories are the Spekkens toy model [37]; theories in which the set of states of a single system correspond to Euclidean hyperballs of dimension $n$ [38,39] (the $n = 3$ case of such theories corresponds to the Bloch ball of quantum theory); the theory colloquially known as 'Boxworld' [40] containing states that exhibit Popescu-Rohrlich correlations which maximally violate the Clauser-Horne-Shimony-Holt inequality, while respecting the no superluminal signalling principle [41]; and a class of theories which each have the same pure states and reversible transformations as quantum theory, but with different Born rules, mixed states and measurements [42]. The existence of such alternative theories allows for an investigation of the structural and information-theoretic properties of theories where different physical principles may hold. Indeed, the information processing and computational power of these alternative theories can be studied in a systematic way [33,43-46].
The ambition of such investigations is to provide a deep understanding of the connections between physical principles and information-theoretic advantages in a theory-independent manner, and to perhaps shed light on the infamous quantum computational 'speed-up' [47].
One might wonder whether quantum field theory provides an example of a theory in this framework. Indeed, this remains a subject of ongoing investigation; see, in particular, [48][49][50]. This issue is mathematical rather than conceptual; indeed, many authors take an operational point of view when working with quantum field theory, especially in the emerging field of relativistic quantum information [51,52].

Hyperdecoherence
In §2, the quantum to classical transition was modelled by a decoherence map restricting quantum systems to classical ones. We can analogously model a post-quantum to quantum transition with a hyperdecoherence map, represented algebraically as $D$ and diagrammatically by [diagram omitted], which restricts post-quantum systems (described by a generalized theory, definition 3.1 from §3) to quantum ones. We now adapt the three key features of decoherence outlined at the end of §2 to this general setting, ending this section with a formal definition of a post-quantum theory.
(i) As in the quantum to classical transition, we think of this hyperdecoherence map as arising via some deterministic interaction with an environment system, after which the environment is discarded by marginalizing with the unique deterministic effect. Hence, as with standard decoherence, hyperdecoherence can be thought of as an irretrievable loss of information to an environment. As deterministic processes are terminal, the hyperdecoherence map should be terminal:
$$u_A D_A = u_A.$$
This is the analogue of point (i) from the end of §2.
(ii) Moreover, hyperdecohering twice should be the same as hyperdecohering once, as the hyperdecohered system has no more 'post-quantum coherence' to 'lose'. Hence this map should be idempotent:
$$D_A D_A = D_A.$$
This is the analogue of point (ii) from the end of §2, where idempotence immediately followed from the fact that the decoherence map sends off-diagonal terms in the density matrix to zero, losing all quantum coherences in the process. A natural extension of quantum theory that has been considered is the theory of density cubes [55], where states are rank-3 tensors satisfying some positivity conditions, rather than rank-2 density matrices. In this case, one can identify the 'post-quantum coherences' as the elements with three distinct indices. Hyperdecoherence would then correspond to sending these terms to zero, resulting in standard density matrices [9]. Such a procedure would again clearly be idempotent.
(iii)(a) One can define a notion of purity relative to the sub-theory constructed via the above procedure. A state from the sub-theory is pure in the sub-theory if it cannot be written as a convex combination of other states from the sub-theory. Note that a state which is pure in the sub-theory may not be pure in the full post-quantum theory, as a state that cannot be written as a convex combination of states from the sub-theory may turn out to be decomposable as a convex combination of states lying outside the sub-theory. As hyperdecoherence arises from an irretrievable loss of information to an environment, if a state resulting from this process is a state of maximal information, then no information can have been lost to the environment. We formalize this by demanding that pure states in the sub-theory are pure in the post-quantum theory. This is the analogue of point (iii)(a) from the end of §2.
(iii)(b) Similarly, we can define the notion of a maximally mixed state relative to the sub-theory. A state from the sub-theory is maximally mixed in the sub-theory if any state from the sub-theory appears in some convex decomposition of it. Note that a state which is maximally mixed in the sub-theory may not be maximally mixed in the full theory. However, this would describe a rather odd situation where, under hyperdecoherence, the maximally mixed state from the full theory is mapped to a state containing more information. This is clearly in conflict with the idea that hyperdecoherence represents a loss of information to the environment. Hence, we demand that the state that is maximally mixed in the sub-theory is maximally mixed in the full theory. This is the analogue of point (iii)(b) from the end of §2.
As was the case for classical theory in §2, one can construct the entirety of quantum theory as a sub-theory of the post-quantum theory by appropriately applying $D$ to states, transformations and effects from the post-quantum theory. That is, density matrices, completely positive trace non-increasing maps and POVM elements correspond to $D\rho$, $D E D$ and $e D$, respectively, where $\rho$, $E$ and $e$ are states, transformations and effects of the post-quantum theory.
Hence, as $D$ is idempotent, quantum states, transformations and effects are those left invariant by the hyperdecoherence map. Note that, as in the quantum to classical case, a sub-theory is itself a theory and so it must be closed under both sequential and parallel composition.
Point (iii)(a) will play an important role in our proof, so it is worth discussing in more detail here. Firstly, note that we need some assumption in addition to terminality and idempotence in order to capture a sensible notion of hyperdecoherence. Indeed, even to adequately capture the standard notion of decoherence, one needs constraints beyond terminality and idempotence. To see this, consider the following example. Consider a system in classical probability theory of type $C$. Define systems in a 'post-classical theory' by tensoring two systems of type $C$ together to form a composite system of type $C' := C \otimes C$, with the decoherence map given by tracing out one of the systems and preparing a mixed classical state $q = \sum_i p_i |i\rangle\langle i|$, such that $p_i > 0$ for at least two distinct values of $i$, in its place. That is, here, $D_{C'} := \mathbb{1}_C \otimes (q \bullet \mathrm{Tr}_C(\_))$, or, diagrammatically, [diagram omitted]. It is easy to see that this decoherence map is trace preserving (i.e. terminal), idempotent and recovers all states of the original $C$ system, albeit tensored with a fixed mixed state. However, this does not properly capture the standard notion of decoherence as the 'post-classical theory' is nothing but classical theory itself. Moreover, we can do a similar thing for quantum theory by having a quantum system of type $Q$ that 'hyperdecoheres' from the quantum composite system of type $Q \otimes Q$, such that the 'post-quantum theory' is nothing but quantum theory itself.
Note that these examples are ruled out by our assumption that pure states in the decohered sub-theory are pure in the full theory. Indeed, applying $D$ to the pure classical state $a \otimes b$ results in $a \otimes q = \sum_i p_i\, a \otimes |i\rangle\langle i|$, but $a \otimes |i\rangle\langle i|$ is not a state in the decohered sub-theory for any $i$. Hence in the sub-theory $a \otimes q$ is pure, but in the full theory it is not.
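The classical counterexample above is simple enough to check by direct computation. In the sketch below, joint distributions on the composite system are stored as nested lists of exact rationals, and the map traces out the second system and prepares a fixed mixed state in its place. The particular choice of the fixed state (`FIXED`, with weights 1/3 and 2/3) and all helper names are illustrative assumptions.

```python
from fractions import Fraction

# Fixed mixed state q prepared on the second system (an illustrative choice).
FIXED = [Fraction(1, 3), Fraction(2, 3)]


def decohere(joint):
    """Trace out the second system and prepare FIXED in its place.
    joint[a][b] is the probability of outcome a on the first system and b on the second."""
    return [[sum(row) * q for q in FIXED] for row in joint]


def total(joint):
    return sum(sum(row) for row in joint)


def product(a, b):
    """Product distribution of two single-system distributions."""
    return [[x * y for y in b] for x in a]


def mix(p, s, t):
    """Convex combination p*s + (1 - p)*t of two joint distributions."""
    return [[p * x + (1 - p) * y for x, y in zip(rs, rt)] for rs, rt in zip(s, t)]


point0 = [Fraction(1), Fraction(0)]
point1 = [Fraction(0), Fraction(1)]

pure_input = product(point0, point1)      # a pure state of the full theory
image = decohere(pure_input)              # first point state together with FIXED

terminal = total(image) == total(pure_input)
idempotent = decohere(image) == image

# In the full theory the image is a non-trivial mixture of two pure states...
components = [product(point0, point0), product(point0, point1)]
decomposes = image == mix(FIXED[0], components[0], components[1])
# ...neither of which is left invariant by the map, so neither lies in the sub-theory.
outside_subtheory = all(decohere(c) != c for c in components)
```

The map is terminal and idempotent, yet its image of a pure state is mixed in the full theory, which is exactly the behaviour excluded by requiring pure states of the sub-theory to be pure in the full theory.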
One might ask whether requiring that pure decohered states are pure in the full theory is the minimal assumption needed to rule out these examples. Indeed, demanding the seemingly weaker constraint that the pre- and post-decohered systems have the same dimension also rules them out. Phrased in operational terms, preserving the dimension corresponds to the hyperdecoherence map preserving the number of perfectly distinguishable states. This requirement rules out the above example. Indeed, if the decohered system has $n$ distinguishable states then the original system has $n^2$. However, we prove in appendix B that, given a strengthened version of purification, one can derive the requirement that pure quantum states are pure post-quantum states from the assumption that hyperdecoherence preserves the number of perfectly distinguishable states. This, in conjunction with the fact that pure classical states are always pure quantum states, leads us to propose the requirement that pure quantum states are pure as a defining feature of hyperdecoherence. There is, however, a suggestion arising from insights into quantum gravity [56] that on a fundamental level pure quantum states may not actually be pure. We therefore see the need for this assumption as a feature rather than a bug as it lends further evidence to this assertion. See §6 for a further rumination on this point.
A final requirement of hyperdecoherence is that the original theory is not the same theory as the decohered theory, that is, one of the hyperdecoherence maps must be non-trivial. We say a hyperdecoherence map is trivial if it is equal to the identity transformation, $D_A = \mathbb{1}_A$.
To summarize all of the above, we now formally define a post-quantum theory.

Definition 4.1 (Post-quantum theory).
An operational theory (definition 3.1) is a post-quantum theory if, for each system of type $A$, there exists a hyperdecoherence map $D_A$ satisfying the following:
(1) $D_A$ is terminal: $u_A D_A = u_A$.
(2) $D_A$ is idempotent: $D_A D_A = D_A$.
(3) (a) Pure states in the sub-theory are pure states.
(b) The maximally mixed state in the sub-theory is maximally mixed in the full theory.
Moreover, the collection $\{D_A\}$ defines a sub-theory which corresponds to quantum theory, and at least one of the hyperdecoherence maps must be non-trivial.

Main theorem. There is no post-quantum theory (definition 4.1) satisfying both causality (definition 3.2) and purification (definition 3.3).
Before we present the proof, we give an intuitive sketch of how it will proceed. We prove that, in any post-quantum theory satisfying causality and purification, the hyperdecoherence map must be trivial for all systems. The main idea of the proof is to show that any post-quantum state can be steered to by performing a suitable post-quantum measurement on the quantum Bell state and post-selecting on a suitable post-quantum effect. As quantum states are left invariant by the hyperdecoherence map (even locally, as we show below), all post-quantum states are left invariant as well, due to the fact that they can be steered to using a quantum state. Hence, for each system, the hyperdecoherence map must be the identity, a contradiction.
We will now present a purely diagrammatic proof of the main theorem. However, for readers unfamiliar with diagrammatic notation, we will also provide a proof using standard algebraic notation in appendix A.
Proof. For convenience, we denote quantum states with a subscript q. As discussed at the end of §3, given a bipartite quantum state $\psi_q$, it can be written as

[…]

The fact that this holds even when representing quantum theory as a sub-theory of the post-quantum theory follows immediately from (i) the definition of a sub-theory and (ii) linearity of transformations. Idempotence of the hyperdecoherence map (point (2) of definition 4.1) then gives

[…] (5.1)

Next, consider the maximally mixed quantum state, $\mu_q := \mathbb{1}/d$, of a d-dimensional system, and note that, from point (3b) of definition 4.1, this is also maximally mixed for the post-quantum theory; hence for any pure state $\phi$ we can write

[…] (5.2)

that is, any pure state from the post-quantum theory arises in a decomposition of the quantum maximally mixed state.

Recall that every (non-trivial) quantum system of type A has at least two perfectly distinguishable states: $\{0_q := |0\rangle\langle 0|,\ 1_q := |1\rangle\langle 1|\}$. Given the decomposition of equation (5.2), convexity (equation (3.6)) implies the following is a state in the post-quantum theory:

[…]

Consider a purification of this state, denoted $S_\phi$, and note that it has the following properties:

1. […]
2. […]
3. […]

where the effect $0_q$ is the quantum effect $\mathrm{Tr}(|0\rangle\langle 0|\,\_)$, which gives probability 1 for state $0_q$ and probability 0 for $1_q$.
We will denote the Bell state $(1/d)\sum_{ij}|ii\rangle\langle jj|$ for a d-dimensional system diagrammatically as

[…]

Recall that this has the maximally mixed state as its marginals, that is,

[…]

Then, as the parallel composition, i.e. tensor product, of two pure quantum states is a pure quantum state, and the definition of hyperdecoherence ensures pure quantum states are pure post-quantum states (point (3) of definition 4.1), the following is another purification of $\mu_q$ with the same purifying system of type AP as $S_\phi$:

[…]

where $\chi_q$ is a pure quantum state. The purification principle implies that these two purifications are connected by a reversible transformation $R_\phi$,

[…]

Using point (3) […] (equation (3.6)), implies that, for all system types A,

[…]

Since there exists a post-classical theory, namely quantum theory, which satisfies causality and purification and decoheres to classical theory, one might wonder at what stage our proof breaks down when analysing this situation. The main reason is that the maximally correlated state in classical probability theory is mixed, so the classical analogue of the state (A 4) is not a purification and equation (A 5) is no longer valid. Hence, the reason why quantum theory cannot be extended in the manner proposed here is the existence of pure entangled states.
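The contrast drawn in the last paragraph can be made concrete with a short numerical sketch. It assumes the usual density-matrix representation, with the classical maximally correlated state embedded as a diagonal matrix; the helper names are illustrative. Both states have maximally mixed marginals, but only the Bell state is pure, so only it can serve as a purification.

```python
import numpy as np

def marginal_first(rho, d):
    """Marginal on the first of two d-level systems (trace out the second)."""
    return np.einsum('ajbj->ab', rho.reshape(d, d, d, d))

def marginal_second(rho, d):
    """Marginal on the second of two d-level systems (trace out the first)."""
    return np.einsum('jajb->ab', rho.reshape(d, d, d, d))

def purity(rho):
    """Tr(rho^2); equals 1 exactly when the state is pure."""
    return float(np.real(np.trace(rho @ rho)))

d = 2
mu = np.eye(d) / d

# Quantum Bell state (1/d) sum_ij |ii><jj|.
phi = np.eye(d).reshape(d * d) / np.sqrt(d)
bell = np.outer(phi, phi)

# Classical maximally correlated state (1/d) sum_i |ii><ii|.
classical = np.diag(np.eye(d).reshape(d * d)) / d

for name, state in [("bell", bell), ("classical", classical)]:
    same_marginals = (np.allclose(marginal_first(state, d), mu)
                      and np.allclose(marginal_second(state, d), mu))
    print(name, "marginals maximally mixed:", same_marginals,
          "purity:", purity(state))
```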

Discussion
From the famous theorems of Bell [57] and Kochen & Specker [58] to more recent results by Colbeck & Renner [59] and Pusey et al. [60], no-go theorems have a long history in the foundations of quantum theory. Most previous no-go theorems have been concerned with ruling out certain classes of hidden variable models from some set of natural assumptions. Hidden variables, or their contemporary incarnation as ontological models [61], aim to provide quantum theory with an underlying classical description, where non-classical quantum features arise due to the fact that this description is 'hidden' from us.
Unlike these approaches, our result rules out certain classes of operationally defined physical theories which can supersede quantum theory, yet reduce to it via a suitable process. To the best of our knowledge, our no-go theorem is the first of its kind. This may seem surprising given that it is an obvious question to ask. However, to even begin posing such questions in a rigorous manner requires a consistent way to define operational theories beyond quantum and classical theory. The mathematical underpinnings of such a framework have only recently been developed and investigated in the field of quantum foundations.
As with all no-go theorems, our result is only as strong as the assumptions which underlie it. We now critically examine each of our assumptions, outlining for each one the sense in which it can be considered 'natural', yet also suggesting ways in which a hypothetical post-quantum theory could violate it and hence escape the conclusion of our theorem.
Our first assumption is purification. As noted in §3, the purification principle provides a way of formalizing the natural idea that information can only be discarded [32], and any lack of information about the state of a given system arises in an essentially unique way due to a lack of information about some larger environment system. However, proposals for constructing theories in which information can be fundamentally destroyed have been suggested and investigated [62][63][64]. Such proposals take their inspiration from the black hole information loss paradox. Our result can, therefore, be thought of as providing another manner in which the fundamental status of information conservation can be challenged.
Our second assumption is causality. This principle allows one to uniquely define a notion of 'past' and 'future' for a given process in a diagram, and is equivalent to the statement that future measurement choices do not affect current experimental outcomes. As such, this principle appears to be fundamental to the scientific method. Despite this, recent work has shown how one can relax this principle to arrive at a notion of 'indefinite' causality [65][66][67][68]. In this case, there may be no matter of fact about whether a given process causally precedes another. The indefinite causal order between two processes has even been shown to be a resource which can be exploited to outperform theories satisfying the causality principle in certain information-theoretic tasks [69,70]. Moreover, it has been suggested that any theory of quantum gravity must exhibit indefinite causal order [71,72]. Hence, as in the previous paragraph, our result provides further motivation for discarding the notion of definite causal order in the search for theories superseding quantum theory.
As purification seems to require a unique way to marginalize multipartite states, one might wonder whether one can define a notion of purification without the causality principle. Indeed, recent work [67] has shown how one can formalize a purification principle in the absence of causality. Araújo et al. [73] show how an alternative notion of purification can be defined for process matrices allowing for indefinite causal order, and work of one of the authors discusses a 'time-symmetric' notion of purification satisfied by quantum, classical and hybrid quantum-classical systems [74].
Another assumption in our theorem was the manner in which our hyperdecoherence map, the mechanism by which the post-quantum theory reduces to quantum theory, was formalized. It may not be the case that post-quantum physics gives rise to quantum physics via such a mechanism. Indeed, alternative proposals for how some hypothetical post-quantum theory reduces to quantum theory have been made [75]. Moreover, there is some evidence from research in quantum gravity that quantum pure states may become mixed at short length scales [56]. This suggests that quantum pure states may not be fundamentally pure in a full theory of quantum gravity. However, we see the necessity of the requirement that quantum pure states are pure in a potential post-quantum theory (point (3) from definition 4.1) in our derivation as a feature rather than a bug. Indeed, it lends evidence to the assertion that to supersede quantum theory one must give up the requirement that states which appear pure within quantum theory are fundamentally pure. Despite this, our understanding of the quantum to classical transition in terms of decoherence suggests hyperdecoherence as the natural mechanism by which this should occur. Moreover, as discussed in §4 and shown in appendix B, one can derive that pure quantum states are pure post-quantum states from more primitive notions.
The last assumption underlying our no-go theorem is the generalized framework itself, introduced in §3. While the operational methodology and assumptions underlying this framework seem to be relatively mild, it may not be the case that the correct way to formalize this methodology is by asserting that pieces of laboratory equipment can be composed together to result in experiments, as described in §3. Indeed, it may be the case that the standard manner in which elements of a theory are composed together-resulting in other elements-needs to be revised in order to go beyond the quantum formalism. Work in this direction has already begun [76]. Alternatively, one could take a more radical position and avoid this no-go result by accepting that a more fundamental theory of nature will not have an operational description at all, and that this framework and the operational methodology should be abandoned in their totality.
Our result can be viewed either as demonstrating that the fundamental theory of Nature is quantum mechanical or as showing in a rigorous manner that any post-quantum theory must radically depart from a quantum description of the world by abandoning the principle of causality, the principle of purification or both.
Data accessibility. This paper has no additional data.
Authors' contributions. Both authors contributed equally to the current work.
Appendix A

Proof. For convenience, we denote quantum states with a superscript q. As discussed at the end of §3, given a bipartite quantum state $\psi^q$, it can always be written as

[…]

The fact that this holds even when representing quantum theory as a sub-theory of the post-quantum theory follows immediately from (i) the definition of a sub-theory and (ii) linearity of transformations. Idempotence of the hyperdecoherence map (item (2) from definition 4.1) then gives

[…] (A 1)

Next, consider the maximally mixed quantum state, $\mu^q := \mathbb{1}/d$, of a d-dimensional system. By (3b) of definition 4.1 this is maximally mixed for the post-quantum theory; hence for any pure state $\phi$, there is a state $\sigma$ such that

$\mu^q = p\,\phi + (1-p)\,\sigma$ for some $0 < p \le 1$, (A 2)

that is, any pure state from the post-quantum theory arises in a decomposition of the quantum maximally mixed state.

Recall that every (non-trivial) quantum system of type A has at least two perfectly distinguishable states, denoted here as $\{0^q := |0\rangle\langle 0|,\ 1^q := |1\rangle\langle 1|\}$. Given the decomposition of equation (A 2), convexity (equation (3.6)) implies the following is a state in the post-quantum theory:

[…]

Consider a purification of this state, denoted $S^{\phi}_{A_1 A_2 P}$, and note that it has the following properties:

[…]

where the effect $e^q_0$ is the quantum effect $\mathrm{Tr}(|0\rangle\langle 0|\,\_)$, which gives probability 1 for state $0^q$ and probability 0 for $1^q$.

Now, let us denote the Bell state $(1/d)\sum_{ij}|ii\rangle\langle jj|$ for a d-dimensional system of type A as $B^q_{A_1 A_2}$, where $A_1$ and $A_2$ are the same type of system, but with a dummy index to allow us to keep track of their ordering in algebraic notation. As the hyperdecoherence map is terminal (point (1) from definition 4.1), marginalization in the post-quantum theory is the same as in quantum theory. Hence, as shown in equation (3.4) from §3, both of the marginals of the above Bell state are equal to the maximally mixed quantum state,

[…]

As mentioned in §3, the only relevant data regarding a process are the types and orderings of the inputs and outputs, where the ordering is kept track of with an additional dummy index on the types when ambiguity could arise. In this case, however, each of the resulting states has only a single output. Hence there is no ambiguity and we can drop this additional dummy index and write

[…]

As the parallel composition, i.e. tensor product, of two pure quantum states is a pure quantum state, and the definition of hyperdecoherence ensures pure quantum states are pure post-quantum states (point (3) of definition 4.1), the following is another purification of $\mu^q$ with the same purifying system of type AP as $S^{\phi}$:

$B^q_{A_1 A_2} \otimes \chi^q_P$,

where $\chi^q$ is a pure quantum state.
The purification principle implies that these two purifications are connected by a reversible transformation $R^{\phi}_{A_2 P}$,

[…]

Using point (3) above, it then follows that there is an effect $e^{A}_{\phi}$, defined as

[…]

which steers the Bell state to $\phi_A$, that is,

[…]

Hence for every pure state $\phi$ in the theory, there exists an effect, denoted $e_\phi$, that steers to it,

[…]

where $d_{A_1}$ is the dimension of the system of type $A_1$. Using this steering result (equation (A 5)) as well as equation (A 1), we can immediately see that the hyperdecoherence map must act as the identity on all states,

[…]

where the overline numbers refer to the equations used to obtain each equality. This can easily be extended to the case when D is acting on an arbitrary (and possibly composite) system of a composite state, by using the steering result for a Bell state of a composite system. This result, in conjunction with tomography (equation (3.5)) and convexity (equation (3.6)), implies that, for all systems of type A,

$D_A = \mathbb{1}_A$.

Appendix B. Proof that pure quantum states are pure

In §4, we discussed how one of the key features of quantum to classical decoherence was that pure classical states are also pure when considered within quantum theory. We took a generalization of this as a defining feature of hyperdecoherence to prove our main theorem. In particular, we noted how this seemed essential to ruling out particular cases, such as a bit 'decohering' from a pair of bits, which satisfy terminality and idempotence, but fail to adequately capture the spirit of decoherence. However, as noted previously, these examples are also ruled out by a seemingly weaker condition: (hyper)decoherences preserve the information dimension of the system.

Before we present the definition of the information dimension, recall that two states $\rho_1$ and $\rho_2$ are perfectly distinguishable if there exists a measurement $\{e_1, e_2\}$ such that $e_i[\rho_j] = \delta_{ij}$.

Definition B.1 (Information dimension [77]).
The information dimension of a system is the number of states in a maximal set that are all pairwise perfectly distinguishable.
Note that, for a quantum or classical d-level system, the information dimension is d [77]; hence, standard quantum to classical decoherence preserves the information dimension. However, in the example presented in §4, this is not the case. For instance, if two bits 'decohere' to a bit, then the information dimension goes from 4 to 2. Hence, in place of point (3) from definition 4.1, we could have stipulated that the hyperdecoherence map preserves the information dimension.
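The two-bit example can be checked with a few lines of linear algebra. The §4 construction is not reproduced in this section, so the sketch below assumes one concrete instance of it: the map keeps the first bit and replaces the second by a uniformly random bit. The matrix `M` and all variable names are illustrative. Under that assumption the map is terminal and idempotent, yet its image contains only two perfectly distinguishable states, and a pure state of the image is mixed in the full theory.

```python
import numpy as np

# Probability vectors over two bits, ordered as (00, 01, 10, 11).
uniform_reset = np.full((2, 2), 0.5)    # replaces a bit by a uniformly random bit
M = np.kron(np.eye(2), uniform_reset)   # keep bit 1, re-randomize bit 2 (assumed map)

# Terminal: every column sums to 1, so normalization is preserved.
terminal = bool(np.allclose(np.ones(4) @ M, np.ones(4)))
# Idempotent: applying the map twice equals applying it once.
idempotent = bool(np.allclose(M @ M, M))

# The four deterministic two-bit states are perfectly distinguishable,
# so the information dimension before the map is 4.
pure_states = np.eye(4)
dim_before = pure_states.shape[1]

# Their images collapse onto two distributions with disjoint supports.
images = M @ pure_states
distinct_images = np.unique(images.T, axis=0)
dim_after = len(distinct_images)
disjoint = bool(np.allclose(distinct_images[0] * distinct_images[1], 0.0))

# Each image is a non-deterministic distribution, i.e. mixed in the full theory.
images_are_mixed = bool(np.all(distinct_images.max(axis=1) < 1.0))

print(terminal, idempotent, dim_before, dim_after, disjoint, images_are_mixed)
```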
We will now show that, from (i) preservation of the information dimension and (ii) a common strengthening of the notion of purification [23], one can derive the previously postulated requirement that pure quantum states be pure in the post-quantum theory.

Definition B.2 (Strong purification).
1. Every mixed state of a system of type A can be purified to a state of a system of type AA satisfying definition 3.3.
2. If a state $\rho$ of a system of type A is pure, then it has trivial purifications on all systems. That is, it has a purification $\psi$ on a system of type AB which factorizes as $\psi = \rho \otimes \chi$, where $\chi$ is a state of B, for all system types B.
We now provide an outline of the proof before providing the formal argument below.
Outline. Recall that every quantum pure state is an element of a maximal set of pairwise perfectly distinguishable quantum states. Assume towards contradiction that at least one quantum state is mixed in the post-quantum theory, and decompose it as a convex combination of post-quantum states. Every post-quantum state in this decomposition is perfectly distinguishable from any state the original quantum state is distinguishable from. Using strong purification, we show that there must be a pair of perfectly distinguishable post-quantum states in this decomposition. Hence, we have at least an information dimension of $d_Q + 1$, where $d_Q$ is the quantum information dimension. Therefore, as we are assuming that the information dimension is preserved, we must reject the assumption that any pure quantum state is mixed in the full theory.
Proof. Consider a set of pure and perfectly distinguishable quantum states $\{i_q := |i\rangle\langle i|\}_{i=0}^{d_Q-1}$, where $d_Q$ is the information dimension of the quantum system. Recall that every quantum pure state is an element of a maximal set of pairwise perfectly distinguishable quantum states. Assume towards contradiction that at least one of the above set of quantum pure states is mixed in the full theory; without loss of generality we take this to be $0_q$. We can, therefore, write it as

$0_q = p\,s + (1-p)\,\sigma$,

where $0 < p < 1$. We pick a decomposition such that $p$ is maximized over all possible pure states $s$. Note that compactness of the set of states (which follows from purification [23,29]) ensures that such a maximum exists. In particular, the maximality of $p$ on $s$ means that

[…]

Note that, due to the purity of $0_q$ in quantum theory, $s$ and $\sigma$ must both hyperdecohere to $0_q$. That is, as $0_q = D[0_q] = p\,D[s] + (1-p)\,D[\sigma]$ is a pure quantum state and $D[s]$ and $D[\sigma]$ are quantum states, we must have $D[s] = 0_q = D[\sigma]$ to ensure quantum purity of $0_q$.
To proceed, we need the following lemmas, which shall be proved later in this appendix. Before we state our first lemma, recall that the states appearing in a convex decomposition of a mixed state are said to refine it.

[…]

We have the following consequence of the conjunction of transitivity and the above lemma; see section D of [34] for the proof.
Corollary B.6. For any pure state $a$, the maximally mixed post-quantum state can be decomposed as

[…]

where $p^*$ is the maximal possible probability for any pure state in a decomposition of $\mu_{PQ}$.

Now, from point (3b) of definition 4.1 and the form of the quantum maximally mixed state, we can write

[…]

which, given our decomposition of $0_q$, can be written

[…]

Take a purification of $\nu$,

[…]

Corollary 12 of [23] states that if the marginal of a bipartite pure state on a given subsystem can be decomposed as a convex combination of perfectly distinguishable states, then the marginal on the opposite subsystem can also be decomposed as a convex combination of the same number of perfectly distinguishable states. As $\nu$ is a convex combination of pure and perfectly distinguishable states, we thus have

[…] (B 5)

where $i = 0, \ldots, d_Q - 1$ and the $\rho_i$ are perfectly distinguishable. This marginal state must again be a maximally mixed state, i.e. every other state refines it, otherwise there would be a state that could be perfectly distinguished from it, violating preservation of the information dimension.

It was proved in corollary 8 of [23] that, in any theory satisfying purification and in which the parallel composition of pure states is pure, every bipartite pure state is steering for its marginals. That is, any state which refines the marginal of a pure bipartite state can be steered to by applying an effect to the opposite system. Moreover, corollary 12 of [23] ensures that the set of effects which, when applied to $\psi$, steer to $s$ and $\{i_q\}_{i \neq 0}$ correspond to the effects which perfectly distinguish among the states $\{\rho_i\}$. In particular, the effect $e_{\rho_0}$ which picks out $\rho_0$ is the effect that steers to $s$. Additionally, the effects which distinguish the states $s$ and $\{i_q\}_{i \neq 0}$ are the ones which steer to the states $\{\rho_i\}$ when applied to $\psi$.
To complete the proof, we need one more ingredient: the states-transformations isomorphism, or generalized Choi theorem [23, theorem 17]. This theorem implies that if the marginals of a bipartite pure state are both maximally mixed, then any effect which steers to a pure state on either system must be pure. The generalized Choi theorem holds in any theory satisfying purification, in which the parallel composition of pure states is pure, and in which the product of maximally mixed states is maximally mixed (which holds for us due to point (3b) of definition 4.1). In particular, this implies that the effect $e_{\rho_0}$, which steers to $s$, must be pure.
Combining this with equation (B 4), and the discussion after equation (B 5), we have that

[…]

Hence, from equation (B 4) it follows that $e_{\rho_0}[\beta] = 1$, where $e_{\rho_0}$ is the pure effect that steers to $s$ and $\beta$ is a pure state. Transitivity then implies that for any pure state there is some pure effect which picks it out with probability 1.
Proof of lemma B.5. This result was proved in proposition 11 of [31] and lemma 30 of [29]. The conjunction of lemma B.4, the generalized Choi theorem (see the proof of lemma B.4 for a brief discussion of this point) and steering (again see the proof of lemma B.4 for a brief discussion) is all that is needed for these proofs to go through; hence, lemma B.5 follows.