An algorithm to explore entanglement in small systems

A quantum state’s entanglement across a bipartite cut can be quantified with entanglement entropy or, more generally, Schmidt norms. Using only Schmidt decompositions, we present a simple iterative algorithm to maximize Schmidt norms. Depending on the choice of norm, the optimizing states maximize or minimize entanglement, possibly across several bipartite cuts at the same time and possibly only among states in a specified subspace. Recognizing that convergence but not success is certain, we use the algorithm to explore topics ranging from fermionic reduced density matrices and varieties of pure quantum states to absolutely maximally entangled states and minimal output entropy of channels.


Introduction
As a consequence of the singular-value decomposition, a normalized state |ψ in a bipartite Hilbert space H A ⊗ H B with finite dimensions d A , d B and n s := min(d A , d B ) can be written as with Schmidt coefficients λ ψ 1 ≥ · · · ≥ λ ψ n s ≥ 0 satisfying i (λ ψ i ) 2 = 1 and Schmidt vectors |ψ i A ∈ H A , |ψ i B ∈ H B . The sequence of Schmidt coefficients and its related entanglement spectrum ξ i = − log(λ 2 i ) characterize entanglement [1] and aid the study of condensed matter systems [2]. Often, such systems satisfy symmetry constraints that are imprinted on the Schmidt coefficients-think of the Haldane phase of the S = 1 spin chain [3] and symmetries for bosons and fermions.  The goal of this paper is to point out that the structure of the Schmidt decomposition allows for an easy iteration with which we can explore some of the constraints imposed by symmetries. Concretely, the procedure attempts to maximize norms (see [4] for p = 2), which we refer to as Schmidt norms, (1.2) over pure states |ψ in a specified subspace U ⊂ H A ⊗ H B . Though they may seem rather abstract, these norms are relevant to a wide range of problems. We believe this can most effectively be demonstrated with examples and shall do so in §5.
The first sections of this paper are fully general: §2 states the algorithm and its goal in the most general terms; §3 motivates the relevance of the Schmidt norms by linking them to entanglement; §4 contains the basic properties of the algorithm. We then move on to specific examples in §5. These demonstrate the scope of the algorithm and motivate why it is often interesting to maximize the norms (1.2): §5a looks at fermionic antisymmetry; later sections discuss more mathematical applications such as absolutely maximally entangled (AME) states and perfect tensors ( §5b), varieties of pure quantum states ( §5c) and minimal output entropy of channels ( §5d). We hope that the reader will select one or more of these topics to see what the algorithm can do in a concrete setting-we refer the reader to the conclusion ( §6) for a summary.

Set-up and algorithm
This section states the algorithm and its goal. Note that this is not a convex optimization problem because we are maximizing, rather than minimizing, a (convex) norm. We refer to these norms as Schmidt norms; see §3a for basic properties. The vector norm . equals . 2,n s .

(b) The algorithm
Let P be the orthogonal projection onto the subspace U, i.e. P 2 = P, P † = P and Im(P) = U.
In words, the idea is to use repeated Schmidt decompositions while prioritizing the Schmidt coefficients according to the power p and forcing the state to be in the desired subspace.
In §4a, we prove that |ψ p,k increases with each step and that it converges to a fixed point that may or may not be the global maximum. Section 4b describes a generalization to several bipartite cuts. The other subsections of §4 discuss basic properties and can mostly be read independently. It is also possible to skip ahead to the applications in §5.

Schmidt norms and entanglement
This section provides some background on the Schmidt norms and explains their relation with entanglement.

(a) Schmidt norms
The norms (1.2) for p = 2 were first studied in quantum information theory in [4]. More generally, a Schmidt norm (1.2) is an operator norm if |ψ in (1.1) is regarded as a map It is the Schatten p-norm if k = n s and the Ky Fan k-norm if p = 1 [5]. Of course, we can use well-known properties and results for these norms, such as the one below. Proof. It will be convenient to prove the following variational characterization of the Schmidt coefficients. For a decreasing sequence c 1 ≥ · · · ≥ c k ≥ 0, we claim that Inserting the Schmidt vectors of |ψ , the l.h.s. is clearly lesser than or equal to the r.h.s. To prove the opposite inequality, let |u 1 , . . . , |u k ∈ H A orthonormal and |v 1 , by the Schmidt decomposition (1.1) and the Cauchy-Schwarz inequality. This is a product of two similar quantities, one of which is where P span{|u 1 ,...,|u i } is the orthogonal projection onto the span of |u 1 , . . . , |u i and we have defined c k+1 := 0. As c i − c i+1 ≥ 0, we see that the maximum is attained for |u i = |ψ i A , and (3.2) implies To prove the statement, the only non-trivial thing to check is the triangle inequality. Ignoring normalization, suppose (1.1) is the Schmidt decomposition of |ψ := |ψ 1 + |ψ 2 . We then have By the variational expression (3.1) with c i = (λ ψ i ) p−1 and Hölder's inequality 1 , we find which implies the triangle inequality.

(b) Entanglement entropies
The reduction of |ψ in (1.1) to system A is and similar for the reduced density matrix on B. This expression is one reason that the Schmidt norms (1.2) are useful. For example, (3.8) so that the α-Rényi entropy [6], defined for a density matrix ρ (for α ≥ 0 and α = 1) as The von Neumann entropy is and S(ρ ψ A ) is the entanglement entropy of |ψ across the bipartition. Note that we can approximately minimize this entropy by applying the algorithm to α ↓ 1, whereas α ↑ 1 gives an approximate maximization. Both optimizers are pure states by convexity of the underlying norm.

Basic results
This section discusses basic properties of the algorithm, such as convergence and generalizations.
(a) Convergence and fixed point equation where the orthogonal projection P was placed in with P † = P and P |ψ = |ψ . Consequently, by Cauchy-Schwarz, At the same time, by (2.2) and (3.1), and finally, by Hölder's inequality, Combining (4.2)-(4.4), we see that |ψ p,k increases under the iteration. It is also bounded because λ ψ i ≤ 1 and hence |ψ p p,k ≤ |ψ 2 2,n s = 1 in all dimensions for p ≥ 2, or alternatively |ψ p p,k ≤ n s in finite dimensions for p < 2. A bounded, increasing sequence converges.
Note that this theorem does not say anything about convergence of the vector. The best we can do in this respect is to use (2.2), (4.1), (4.3) and (4.4) and note and hence Even though the r.h.s. converges upon iteration, this is not quite enough for convergence of the vector because we could in theory move between vectors that all have a large norm |ψ p,k . Although a better understanding of the iteration would be desirable, in practical applications we can terminate the iteration whenever the increments in . p,k drop below a set tolerance and the resulting vector can then be used as an output vector with a large norm.
Note that (4.6) implies that a global maximizing vector must be a fixed point of the iteration because the quantity |ψ p,k cannot increase further in that case. Any other fixed points satisfy the following equation.
and the global maximum is such a fixed point. For general P, it is not at all clear how many states satisfy the fixed point equation (4.7) and how often the process converges to fixed points that are not the global maximum. Luckily, there are many things we can do without running into this problem; see §5.

(b) Generalizations: coefficients and several cuts
The iteration (2.2) owes its efficacy to the structure of the Schmidt decomposition, and it is natural to ask whether the idea can be generalized. We needed p ≥ 1 for Hölder's inequality in the last line of the proof of theorem 4.1-indeed the result does not hold for p < 1. Nevertheless, there are two generalizations that do work.
First, we could simply include some coefficients in the norm (2.1). It is essential that they are positive and ordered in size.

Corollary 4.3.
For an integer k, real numbers c 1 ≥ · · · ≥ c k ≥ 0 and p ≥ 1, the optimization problem allows for a version of theorem 4.1 with an iteration (2.2) that is adapted identically.
Second, we can try to simultaneously maximize across different cuts. This requires some notational caution: so far we have assumed that we were only studying one cut that was implicit in the definition of the Schmidt norm, so we will avoid talking about norms for now. Assume We can maximize sums involving both at the same time: the following statement demonstrates this, and is easy to generalize.
As shown in §5, this can work very well, but there is a practical objection to considering too many cuts. We mentioned the risk of converging to a fixed point that is not the global maximum when we derived the fixed point equation in corollary 4.2. The adapted algorithm has a similar equation, but this time it contains many more 'variables' and it is therefore likely to have more solutions. Hence, at least heuristically, the more cuts, the larger is the risk of ending up at unwanted fixed points-something that, for example, becomes evident in the application in §5b. It would be very interesting to find a solution to this problem, for instance, by adding a simulated annealing component to the algorithm.
In some cases, it can be hard to generate P in an efficient way, so that might also be expensive.  Table 1. The following data are included to give an indication of the number of iterations required and the prevalence of the fixed point issue, partly because it is hard to make general statements about this. The first column in the table lists an application discussed in §5 for which the global maximum is either known or conjectured. We run the algorithm ten times to try to find it. The second column lists the average number of iterations needed for the norm to get within 10 −5 of the conjectured or known answer if the procedure does not get stuck at other fixed points. The third column lists the number of times this does not happen and the procedure is successful. One can also ask how many iterations are required to reach convergence, but in practice this depends heavily on the application; see, for example, table 1.

(d) Comparison with an iteration suggested by Shor
The algorithm shares some features with an idea attributed to Shor by Datta & Ruskai [7]. Adapting its purpose from minimizing entanglement entropy to the current goal (2.1) with k = n s and p = 2α ≥ 2, that is Tr 2α,n s with α ≥ 1, it goes as follows.
The norm now increases in a similar way upon iteration (we use that |ψ and |φ are eigenvectors of P and Hölder's inequality).
The algorithm in §2b only works with vectors, whereas this one uses density matrices. That has two consequences: I. its scope is restricted to p = 2α ≥ 2 and traces (k = n s ) and II. its computational cost is higher because it requires multiplication of square matrices of size d A d B , rather than just acting with such a matrix on a vector.

(e) The p = 2 case
The optimization problem (2.1) for p = 2 and general k has been studied before [4,8]. It can be written as an operator norm of the projection P defined in terms of the vector norm . 2,k . This quantity appears on the r.h.s. of the lemma below. Note that the condition |ψ 2,k = 1 simply says that the Schmidt rank-the number of non-zero Schmidt coefficients-is k or less.
Proof. We use estimates from the proof of theorem 4.1. Recall that |ψ ∈ U is arbitrary. We can see that the l.h.s. in (4.13) is lesser than or equal to the r.h.s. by dividing (4.2) with p = 2 by The opposite inequality is found by making the same division in (4.3) and (4.4). An algorithm based on semi-definite programming was suggested to compute lower bounds to this quantity [9]. Another approach [10] is included in the Matlab package QETLAB [11] that we used for tests, but comparisons suggest that it is much slower and more likely to get stuck at unwanted fixed points than the one discussed here. In many respects it also has a narrower scope, with the big exception that it can handle general operators rather than just orthogonal projections.

Applications and examples
In this section, we discuss several ways in which the algorithm can be applied. In each case, we simply ask how the algorithm can help to explore a certain topic. In some cases, we include the success rate and the number of iterations needed to get close to the maximum (table 1) to give the reader an idea of how the algorithm performs in practice.
(a) N-representability and fermionic particle entanglement The Hilbert space of N fermions-identical particles-is the antisymmetric product ∧ N H of N copies of the one-particle space H (see definitions 5.1 and 5.2 below). States in this Hilbert space have Schmidt decompositions (1.1) with H A = ∧ K H and H B = ∧ N−K H and associated reduced density matrices (3.7). The overall antisymmetry places strong constraints on both, and people have been trying to find out what these are since 1959 [12,13]. 2 What motivated them was quantum chemistry: physical interactions like the Coulomb repulsion between electrons usually act pairwise. This means that interaction operators take the form i<j V ij , where each term acts as a fixed two-body interaction V 12 on particles i and j and as the identity on all others. The expectation value of such an operator for an N-fermion state |ψ then becomes where ρ ψ 2 is the reduction (3.7) of |ψ to two particles (note that all reductions are equivalent because of the antisymmetry).
Say that we wanted to minimize a quantity like (5.1) by varying over all states |ψ . Computationally, we would favour the r.h.s. of the equation because it involves a two-particle space only, but there is one obvious problem: we do not know what constraints the antisymmetry imposes on the reduction ρ ψ 2 , or equivalently the Schmidt vectors and coefficients in (1.1). This means that we do not know which ρ ψ 2 to include in the minimization of the r.h.s. of (5.1). Enticed by the computational gains that such knowledge would bring, people actively thought about this problem in the 1960s. It goes by the name of N-representability [15] and is a specific example of the quantum marginal problem [16]. It is mostly strongly associated with Coulson, who raised the challenge at a conference in 1959 [12,13], and Coleman, who studied the problem throughout his career (see, e.g. the papers [15,17], book [14] and summary [18]).
These days, entanglement provides a new motivation to look at it. The entanglement between K fermions and the remaining fermions in the system-as associated to a Schmidt decomposition (1.1) with H A = ∧ K H, H B = ∧ N−K H-is known as particle entanglement [19]. It is sometimes confused with mode entanglement, which describes the entanglement between restrictions of the state to complementary regions of (Hilbert) space, or even dismissed as irrelevant [20,21], but that is certainly not the case in the context of condensed matter systems [19,22] and quantum chemistry [13].
So where do we stand on N-representability today? We know that inferring the full implications of antisymmetry is a QMA-hard problem-not even efficiently solvable by a quantum computer [23]. That should not discourage us from trying to extract basic mathematical information: the full constraints imposed by antisymmetry on the Schmidt decomposition with K = 1 were recently computed in systems with small dimensions and particle numbers [24]. Interestingly, they go beyond the original Pauli principle that says that no two fermions can occupy the same state. Our algorithm cannot shed new light on this progress, and we restrict the discussion to N-representability questions that it can handle.
To summarize the above, it has long been recognized that it is important to infer the consequences of antisymmetry on the Schmidt decomposition (1.1) and the related reduced density matrices (3.7) because that allows for a more efficient computation of (5.1). In this section, we numerically verify a number of conjectures regarding the effect of antisymmetry on the Schmidt coefficients only. We also formulate a new one (conjecture 5.9).
The following basic definitions will be used below.

Definition 5.1 (Hilbert space of N fermions).
We define ∧ N H-the antisymmetric tensor product of N copies of a Hilbert space H-as the subspace of ⊗ N H whose elements |ψ are completely antisymmetric upon exchange of two particles. This means that, for 1 ≤ i 1 < i 2 ≤ N and U (i 1 i 2 ) the unitary implementing the permutation (i 1 i 2 ) on ⊗ N H, The orthogonal projection onto this space is where S N is the group of permutations of N elements.
Definition 5.2 (Antisymmetric tensor product). Let K 1 , K 2 ∈ N, |Ψ ∈ ∧ K 1 H and |Φ ∈ ∧ K 2 H. Let P be the orthogonal projection onto the antisymmetric subspace ∧ K 1 +K 2 H. We define the antisymmetric tensor product It is easy to check that this tensor product is associative (α ∧ β) ∧ γ = α ∧ (β ∧ γ ) and hence we will simply denote such products as α ∧ β ∧ γ . Note that it is not linear.

Definition 5.3 (Slater determinant
). An antisymmetric product of N orthonormal vectors |φ 1 , . . . , |φ N ∈ H, is known as a Slater determinant. For H = C d with orthonormal basis |φ 1 , . . . , |φ d , the d N Nfermion Slater determinants that we can build from this basis are an orthonormal basis of ∧ N C d .
We first discuss some known facts. The following result is essentially the Pauli principle.
The two-particle case is much harder, and the only clear-cut analytical result was obtained by Yang.
Any optimizer |Ψ (N) can be built from an orthonormal basis |ψ i by defining |Ψ (2) where P is the orthogonal projection onto ∧ N C d . Such states are known as Yang states.
No other results like this are known, but these are exactly the kind of questions that we can investigate with the algorithm. General statements of this form lead to interesting mathematical inequalities (see [26,27]), interesting states (the Yang states are examples of the BCS states that appear in superconducting systems [25]), and they allow us to investigate the particle entanglement properties of fermions. These are some of the reasons that there appears to be a renewed interest in these questions: for example, the theorem above was recently rediscovered in the context of hard-core bosons [28]; a weaker version of the well-known bosonic analogue [29,30] of lemma 5.7 was presented as a new result in [31]; part of conjecture 5.10 was stated as a fact in [19]; known N-representability constraints were put to use in quantum algorithms in [32].
The conjectures below are all confirmed by the algorithm for up to 10 fermions. 3 We hope that their discussion will serve as a brief review of some interesting open questions and some useful past results.
The first conjecture generalizes theorem 5.5 to higher-order density matrices.
Conjecture 5.6 (Yang [25], see [26] for exact constants). Let N, d and K ≤ N be even. For H A = ∧ K C d and H B = ∧ N−K C d , the Yang state |Ψ (N) is a maximizer of . 2 2,1 among the normalized vectors of ∧ N C d . For d → ∞, the limiting value is N/2 K/2 / N K .
Yang himself [33] proved a version of this conjecture that follows easily from a very useful antisymmetrization formula obtained by Sasaki [15,34], but, as explained in [26], his result amounts to a watered-down version of the full conjecture. Finding a complete proof is profoundly more difficult than it is in the K = 2 case, for which the following lemma is essential. Remark 5.8. Similarly, the analogue for bosonic states [30] in ⊗ 2 s H (effectively Takagi factorization [29]) says that the Schmidt decomposition can always be written as a sum of terms λ i |ψ i ⊗ |ψ i .
For N even, note that ∧ N H ⊂ ⊗ 2 s (∧ N/2 H) if N/2 is even and ∧ N H ⊂ ∧ 2 (∧ N/2 H) if it is odd. Therefore, the Schmidt decomposition of a state in ∧ N H with H A = H B = ∧ N/2 H has a structure similar to one of those mentioned above, depending on whether N/2 is even or odd. This is one of the few cases in which the consequences of antisymmetry are partly evident.
Returning to the topic of theorem 5.5, lemma 5.7 provides us with a pairing structure for one of the Schmidt vectors, which can then be used to find the optimizer (5.9). For general even K such pairing is generically absent, and proving that the maximizer possesses it is truly the biggest challenge in overcoming Conjecture 5.6: if we assume pairing, arguments like [36] get close to the full answer. Note that there is a strong intuition for pairing: it turns the fermions into hardcore bosons (in a particular basis), and these can be condensed in the same state as long as we spread out their wave functions (see [28]). The Yang state also has an interesting symmetry (see theorem 5.11 below), and the fixed point equation (4.7) takes on a particularly simple formperhaps these can shed new light on this problem.
Moving on to two Schmidt coefficients, the following conjecture is easily obtained with the algorithm.

Conjecture 5.9. Let N and d be even. For H
Consider a basis consisting of |φ 1 , |φ 2 and |ψ i with 1 ≤ i ≤ d − 2, and a Yang state |Ψ (N−2) (5.9) built from the |ψ i . An optimizer is then given by where P is the orthogonal projection onto ∧ N C d .
Surprisingly, the limiting value for d → ∞ is (N − 1) −1 : exactly the same as the maximum of . 2 2,1 in that limit. This shows that eigenvalues of fermionic reduced density matrices are coupled very strongly. Coleman [17] had already predicted a (d-independent) deviation of O(N −2 ) from O((N − 1) −1 ), and studied a general class of states containing (5.5), (5.9) and (5.12) called (generalized) antisymmetric geminal powers (AGP, [14,17]) that demonstrate similar coupling between eigenvalues. These states are seemingly very important for these kinds of problems, but the exact role they play in the fermionic subspace is still unknown.
To study entanglement, we move on to problems involving all Schmidt coefficients.
That is, both entropies are minimized by reduced density matrices of Slater determinants (5.5).
This says that Slaters have the lowest particle entanglement among all the fermionic states. That may seem like a fairly safe claim, but the mathematics behind it is not at all trivial. For instance, the 2-Rényi entropy conjecture is equivalent to (5.14) For K small compared to N, this is O(N −K ), so that we see that a fermionic K-particle reduced density matrix can have at most O(N a ) eigenvalues of O(N −(K+a)/2 ), which gives a highly nontrivial bound for 0 ≤ a ≤ K. This fits with theorem 5.5 and conjecture 5.6 for a = 0, and it provides another example of the constraints that antisymmetry puts on the Schmidt coefficients. Our current understanding of these effects is rather limited, but they are fundamental to the structure of fermionic states, and hence to quantum chemistry and condensed matter. We hope that this section helps to renew interest in a challenging topic that has remained challenging despite the attention it rightfully received in the 1960s.

(b) Absolutely maximally entangled states
An N-partite state |ψ ∈ ⊗ N C d is called AME or a perfect tensor if all possible reductions to N/2 parties are maximally mixed [38,39]. These states have attracted some attention in the context of quantum error-correcting codes, but we ignore questions about their relevance and focus on their existence for given N and d instead. It turns out this greatly depends on these parameters: for d = 2, AME states only exist for N = 2, 3, 5, 6; for d = 3, they exist for N = 2, 3, 4, 5, 6, 7, 9, 10 and possibly 11 and 15 [40].
How can the algorithm help to decide on existence? The reduction (3.7) to N/2 parties is related to the Schmidt decomposition (1.1) with H A = ⊗ N/2 C d and H B = ⊗ N/2 C d . If |ψ is maximally mixed, all its n s = d N/2 Schmidt coefficients (λ ψ i ) 2 equal 1/n s . It is easy to see that the Cauchy-Schwarz bound is satisfied, and only satisfied in that case. In other words, to obtain an AME state, we need to maximize |ψ 1,n s across N N/2 cuts at the same time, and we can use the general form of corollary 4.4 with P = 1, p = 1 and k = l = n s . This is particularly interesting in the cases d = 6, N = 4 and d = 4, N = 7 where existence of AME states is undecided [40], but unfortunately these are out of reach-seemingly not because of computational power, but because of the dominance of other fixed points (see e.g. the behaviour for d = 5, N = 4 in table 1). In any case, the algorithm quickly finds the known AME states for d = 2; the ones with N ≤ 7 for d = 3; with N ≤ 4 for d = 4, which is a good example of its ability both to maximize entanglement and to treat several cuts at the same time. The method can also be applied to similar concepts, such as locally maximally entangled states [41], and mixed-dimensional AME states [42]. 4

(c) Dimensions of varieties of pure states
How large can a subspace U ⊂ H A ⊗ H B be if we do not want it to contain states that have ≤ r nonzero Schmidt coefficients, or in other words, Schmidt rank ≤ r? The answer is (d A − r)(d B − r), as shown by Cubitt, Montanaro and Winter [43]. It is possible to heuristically reach this prediction: the (affine algebraic) variety of states with Schmidt rank ≤ r has dimension r(d A + d B − r) (see point 1. below) and hence intuitively the largest dimension of a subspace that does not intersect the variety in any point other than the origin-the answer to the question-is indeed r). Such intuition can be made rigorous with standard results in algebraic geometry as both the subspace and this variety are essentially projective varieties; see [43][44][45][46].
It may seem like an unusual application, but we could have answered this question numerically. The states with Schmidt rank ≤ r are maximizers for the problem (2.1) with k = r and p = 2. Therefore, if these states are present in a given subspace U ⊂  Table 2. Consider the space in the first column. The third column shows the complex dimension of the variety of states specified in the second column, based on parameter counting. The fourth column shows the maximal complex dimension that a subspace can have if it does not contain the states in the second column, as predicted by the algorithm. As expected, this is the dimension of the space minus the dimension in the third column, possibly up to 1/2 complex, or 1 real dimension. .
. should find them. We now generate random U of increasing dimension, and expect that they generically avoid the rank ≤ r states if they can. The largest dimension D for which the random subspaces do not contain any rank ≤ r states is the predicted answer to the question. Could there not be a special subspace with dimension D + 1 that also avoids the states with rank ≤ r? The relation between D and the dimension of the variety makes that seem improbable: if D + 1 can be attained, the variety's dimension is at most d A d B − (D + 1), but then a random (D + 1)-dimensional subspace is unlikely to contain rank ≤ r states-crudely think of a plane and a line that both intersect the origin in R 3 .
Of course, these are mere heuristics, and we tested the idea on different optimizers of problems (2.1) for which the (rigorous) dimension-argument holds, namely Schmidt rank ≤ r states (maximize . 2,r ), maximally entangled states (maximize . 1,n s as shown in the previous section), bosonic condensate states (maximize . 2,1 if we split off one particle), and special fermionic states such as Slater determinants (5.5) (maximize . 2,1 for K = 1) and Yang states (5.9) (maximize . 2,1 for K = 2). Table 2 lists the results obtained with the algorithm and compares them to the dimensions of the varieties as obtained with parameter counting. The former should be the dimension of the space minus the latter (possibly up to 1/2 complex, or one real dimension; see below), which is indeed the case.
We now justify the dimensions listed in the third column of table 2. Keep in mind that the Grassmannian G(r, d) is the space of r-planes in a d-dimensional space [45,46], and that it has dimension r(d − r).
(i) Schmidt rank ≤ r. A rank r matrix can be represented by an isomorphism on an r-dimensional space together with its d A − r dimensional kernel and the (d B − r) dimensional complement of the image. The last two are elements of Grassmannians, and we conclude that the total dimension is r 2 + r(d A − r) + r(d B − r). In the case of (anti)symmetric matrices, the r × r matrix is (anti)symmetric, and the Grassmannian dimension only comes in once: r+1  (iii) Bosonic condensates. A condensate state |ψ ⊗ · · · ⊗ ψ is completely defined by |ψ ∈ C d , so the dimension is d. (iv) Fermionic states. A Slater determinant (5.5) is simply an element of the Grassmannian, but we have to add an extra dimension for phase and norm. A Yang state (5.9) is the fermionic equivalent of a bosonic condensate and it is fully defined by a two-particle maximally entangled fermionic state, so we again find 1 2 d 2 .
Theorem 5.11 (Symmetries of the Yang state). The two-particle Yang state (5.9) satisfies U ⊗ U |Ψ (2) = |Ψ (2) for U ∈ U(d) that, in the basis in which the pairing of |Ψ (2) is defined, Proof. It is well known that a maximally entangled state is invariant under U ⊗Ū |Φ = |Φ , where U acts on the |u 's and |v 's as defined by the index i.
Relabelling |v 2i−1 = |u 2i and |v 2i = − |u 2i−1 to obtain |Ψ (2) , we demand At the same time, only unitaries that act on the two indistinguishable particles in the same way are allowed:Ū This gives the equations. We now determine the real dimension of the group and then divide by two to arrive at the complex dimension 1 2 d+1 2 . Start with any d × d matrix. This has 2d 2 real variables. The matrix is unitary if its rows form an orthonormal basis, and we should count the number of independent equations this gives. We note, however, that the rows of our desired matrices have a special, paired structure by the equations we just derived. For example, row 2i will automatically be normalized if row 2i − 1 is. This leaves just d/2 normalization equations instead of the usual d. Similarly, the equations imply that rows 2i − 1 and 2i are orthogonal, and for i = j, that orthogonality of row 2i − 1 with rows 2j − 1 and 2j implies orthogonality of row 2i with rows 2j − 1 and 2j, so that the number of orthogonality equations is just 2 d/2 2 instead of the usual d 2 . Note that orthogonality gives two independent real equations, but that normalization only counts for one. This gives real dimension 2d 2  Given that the heuristics work well, it could be interesting to try states with more complicated structures, such as the fermionic examples in §5a. After all, the dimension of a variety says something about the structure of its states, and this is a way to numerically determine what it is.
To give one more example, the dimension of the so-called subspace variety is well known and easy to compute [47], but it can also be derived with numerics and Corollary 4.4. Because it immediately leads to a generalization of the main results of [43,44], we add a proof of this well-known fact.
Lemma 5.12. Consider the space C d 1 ⊗ · · · ⊗ C d n . The dimension of the subspace variety consisting of states with Schmidt ranks ≤ r i for decompositions (1.1) Proof. First assume that min(r i , Π j =i r j ) = r i for all k. For each i, a state |ψ can be regarded as a map C d i → ⊗ j =i C d i (known as a flattening [47]) with a kernel Λ i that has dimension at least d i − r i . To calculate the dimension of the variety, we can assume that it is indeed d i − r i because states with bigger kernels form a lower-dimensional variety. Note that the kernels can be controlled independently by unitaries acting on the relevant Hilbert space, so that we can describe |ψ by (|φ , Λ 1 , . . . , Λ n ), with |φ ∈ C r 1 ⊗ · · · ⊗ C r n and C r i isomorphic to C d i /Λ i . The resulting dimension is the sum of the dimensions of the Grassmannians G(r i , d i ) and the dimension of C r 1 ⊗ · · · ⊗ C r n , namely (5.19) (see also the incidence correspondence arguments used in [45]).
In reality, the number of Schmidt vectors involved in each decomposition isr i := min(r i , Π j =i r j ), so that the general result isr 1r2 · · ·r n + n i=1r i (d i −r i ). (5.20) Note that only one of ther i 's can be different from r i . Assumer 1 = r 2 · · · r n . It is then easy to see that (5.20) equals (5.19).
This fact can be used to start solving a problem raised in [43]: what can we say about states with rank restrictions across several cuts? It is of course much more difficult to say something if the kernels above are no longer independent: the simplest case of C d 1 ⊗ C d 2 ⊗ C d 3 ⊗ C d 4 was studied in [48], but a lot is still unknown. 5 The algorithm and the idea discussed here may be able to help.

(d) Minimal output Rényi entropy for channels
Suppose that we want to use copies of a quantum channel to transmit information. It it then useful to know whether entanglement can help, that is, whether the capacity to send classical information increases if we use entangled input states. Famously, an additivity conjecture said that this is not the case, but it was shown to be false in [49]. Without discussing the details, we want to review why our algorithm can be applied to a related question: the maximal p-norm multiplicativity conjecture [50]. Although this conjecture has been disproved for p > 4.79 in [51] and for p > 1 in [52], there is a desire to find explicit counterexamples for 1 < p ≤ 2, and that is where the algorithm might be able to help.
Note that the p in 'p-norm multiplicativity conjecture' will be α in what follows to avoid confusion with the p used in the Schmidt norms (1.2).
As we will be working with density matrices ρ, we define, for α ≥ 1, By their variational characterization, it is easy to see that the norms are convex on the set of density matrices [53], so that we can restrict the maximization to pure states. Also note that the l.h.s. in (5.22) has to be bigger than or equal to the r.h.s. We need channels E, F with a strict inequality for a counterexample. So how can the maximization problems in (5.22) be tackled numerically? Consider E's Kraus operators V i : H S → H A [54], with so that The object is an orthogonal projection onto the image of V in H A ⊗ H B , and this can be used in the algorithm. By convexity, the first problem on the r.h.s. of (5.22) is which we can try to maximize. The other two optimizations can be tackled in the same way. Note that the results directly translate to minimal output Rényi entropies with the definition (3.9). The link between output entropy of channels and subspaces was exploited in [52] to find counterexamples to (5.22), but the construction uses random channels. There is an interest to find more explicit counterexamples. For α > 2, the fermionic subspace ∧ 2 C d with d → ∞ is an easy one [55]; although the algorithm can verify it, it would be much more useful to test examples where a proof is hard to obtain. Note that fixed points prevent the algorithm from serving as a rigorous proof technique, but it can still help to develop an intuition about good counterexample candidates. For instance, an analysis of the entangled subspaces discussed in [56] suggests that these are unlikely to work.

Conclusion
We propose a conceptually simple algorithm to find pure states that maximize Schmidt norms within a specified subspace. It can be generalized to treat several bipartite cuts at the same time. Our approach uses only pure states and not density matrices, which limits computational costs. It also allows us to distinguish between 1 < p < 2, which favours large entanglement for k = n s , and p > 2, which favours small entanglement for k = n s .
We prove convergence of the Schmidt norm under iteration, but like in other approaches we note that there are fixed points other than the global maximum where the algorithm can get stuck. In many cases, this does not hamper investigations (see, e.g. table 1); in others, it would be good to find a way to deal with this problem, for example by adding a simulated annealing component. This is currently the best way in which the algorithm can be improved.
To motivate the reader to try the iteration in their own research, we included several possible applications. Numerical tests were done with the MATLAB package QETLAB [11], and conclusions are listed below.
(i) Physical symmetries imprint themselves on the Schmidt decomposition, and hence on entanglement properties of states. The exact consequences are often not easy to infer, but we illustrated that the algorithm can sometimes help by numerically verifying a number of long-standing conjectures for fermionic states with up to 10 fermions, and also formulating a new one. For larger systems, we are restricted by computational capacity, although efficient programming can probably offer great improvements. This allows us to search small systems for AME states or perfect tensors [38,39], or, for example, for locally maximally entangled states [41]. Unfortunately, unwanted fixed points pose a greater risk for larger numbers of cuts, which means that we cannot currently discover unknown AME states. We do find new 'mixed-dimensional AME states' [42]. (iii) Inspired by results of Parthasarathy [44] and Cubitt et al. [43] (effectively) about dimensions of certain varieties of pure quantum states, we show that these can sometimes be determined by applying the algorithm to random subspaces, and point out that a wellknown fact in algebraic geometry (lemma 5.12) leads to a simple generalization of the main results of [43,44]. A similar approach may help to uncover relations between ranks across all bipartite cuts. (iv) For a given quantum channel, we explain how to numerically determine the minimum output α-Rényi entropy with α > 1. This means that the algorithm can investigate additivity questions involving such entropies, but progress seems to require both a good intuition about which projections to consider and a targeted numerical effort.
We encourage the reader to give the algorithm a try: it is easy to implement, the risk of fixed points is often surprisingly small and it is likely to be faster than approaches that involve density matrices. It would be particularly interesting to see if it can help to explore entanglement properties in spin systems and how it connects with the techniques used in the context of matrix product states.