# Efficient quantum circuits for dense circulant and circulant like operators

## Abstract

Circulant matrices are an important family of operators, which have a wide range of applications in science and engineering-related fields. They are, in general, non-sparse and non-unitary. In this paper, we present efficient quantum circuits to implement circulant operators using fewer resources and with lower complexity than existing methods. Moreover, our quantum circuits can be readily extended to the implementation of Toeplitz, Hankel and block circulant matrices. Efficient quantum algorithms to implement the inverses and products of circulant operators are also provided, and an example application in solving the equation of motion for cyclic systems is discussed.

### 1. Introduction

Quantum computation exploits the intrinsic nature of quantum systems in a way that promises to solve problems otherwise intractable on conventional computers. As the most widely used model of quantum computation, a quantum circuit provides a complete description of a specified quantum algorithm, whose computational complexity is determined by the number of quantum gates required. In general, the number of two-level gates (i.e. unitary matrices acting non-trivially on two-dimensional subspaces, which are universal for computation) needed to decompose an arbitrary unitary in *N* dimensions scales as *O*(*N*^{2}). There are many known *N*-dimensional matrices that cannot be decomposed as a product of fewer than *N*−1 two-level gates [1], and thus cannot be implemented efficiently on a quantum computer. An essential research focus in quantum computation is to explore which kinds of linear operations (either unitary or non-unitary) can be efficiently implemented using $O(\mathrm{poly}(\mathrm{log}N))$ number of elementary quantum gates (i.e. one- or two-level unitary matrices) and measurements.

Significant breakthroughs in the area include the development of efficient quantum algorithms for Hamiltonian simulation, which is central to the studies of chemical and biological processes [2–8]. Recently, Berry, Childs and Kothari presented an algorithm for sparse Hamiltonian simulation achieving near-linear scaling with the sparsity and sublogarithmic scaling with the inverse of the error [8]. Additionally, using the Hamiltonian simulation algorithm as an essential ingredient, Harrow *et al.* [9] showed that for a sparse and well-conditioned matrix *A*, there is an efficient algorithm (known as the HHL algorithm) that provides a quantum state proportional to the solution $\sum _{j}{x}_{j}|j\u27e9$ of the linear system of equations *A*** x**=

**.**

*b*However, as proven by Childs & Kothari [10], it is impossible to perform a generic simulation of an arbitrary dense Hamiltonian *H* in ${\mathbb{C}}^{N\times N}$ in time $O(\mathrm{poly}(\parallel H\parallel ,\mathrm{log}N))$, where ∥*H*∥ is the spectral norm, but possible for certain non-trivial classes of Hamiltonians. It is then natural to ask under what conditions we can extend the sparse Hamiltonian simulation algorithm and the HHL algorithm to the realm of dense matrices. In this paper, we use the ‘unitary decomposition’ approach developed by Berry *et al.* [7] to implement dense circulant Hamiltonians in time $O(\mathrm{poly}(\parallel H\parallel ,\mathrm{log}N))$. Combining this with the HHL algorithm, we can also efficiently implement the inverse of dense circulant matrices and thus solve systems of circulant matrix linear equations.

Furthermore, we provide an efficient algorithm to implement circulant matrices *C* directly, by decomposing them into a linear combination of unitary matrices. We then apply the same technique to implement block circulant matrices, Toeplitz and Hankel matrices, which have significant applications in physics, mathematics and engineering [11–21]. For example, we can simulate classical random walks on circulant, Toeplitz and Hankel graphs [22,23]. In fact, any arbitrary matrix can be decomposed into a product of Toeplitz matrices [24]. If the number of Toeplitz matrices required is in the order of $O(\mathrm{poly}(\mathrm{log}N))$, we can have an efficient quantum circuit.

This paper is organized as follows. In §2, we present an algorithm to implement circulant matrices, followed by discussions on block circulant matrices, Toeplitz and Hankel matrices in §3. In §§4 and 5, we provide efficient methods to simulate circulant Hamiltonians and to implement the inverse of circulant matrices. In §6, we describe a technique to efficiently implement products of circulant matrices. In the last section, we provide an example application in solving the equation of motion for vibrating systems with cyclic symmetry.

### 2. Implementation of circulant matrices

A circulant matrix has each row right-rotated by one element with respect to the previous row, defined as

*N*-dimensional vector

**=(**

*c**c*

_{0}

*c*

_{1}⋯

*c*

_{N−1}) [25]. In this paper, we will assume

*c*

_{j}to be non-negative for all

*j*, which is often the case in practical applications. We also assume that the spectral norm (the largest eigenvalue) $\parallel C\parallel =\sum _{j=0}^{N-1}{c}_{j}$ of the circulant matrix

*C*is equal to 1 for simplicity.

Note that *C* can be decomposed into a linear combination of efficiently realizable unitary matrices as follows:

*et al.*[7]. For completeness, we restate their method as lemma 2.1 given below.

### Lemma 2.1.

*Let* $M=\sum _{j=0}^{J}{\alpha}_{j}{W}_{j}$ *be a linear combination of unitaries W*_{j} *with α*_{j}≥0 *for all j and* $\sum _{j=0}^{J}{\alpha}_{j}=1$. *Let O*_{α} *be any operator that satisfies* ${O}_{\alpha}|{0}^{m}\u27e9=\sum _{j=0}^{J}\sqrt{{\alpha}_{j}}|j\u27e9$, *where m is the number of qubits used to represent* |*j*〉, *and* $\mathrm{select}(W)=\sum _{j=0}^{J}|j\u27e9\u27e8j|\otimes {W}_{j}$. *Then*

*where*(|0

^{m}〉〈0

^{m}|⊗

*I*)|

*Ψ*

^{⊥}〉=0.

Unless stated otherwise, we assume that *N*=2^{L}, where *L* is an integer. If *N* is not a power of two, we will need to embed the system into a larger Hilbert space whose dimension is a power of two. On the other hand, it is also convenient to simply discretize practical problems using powers of two. Lemma 2.1 can be directly applied to implement the circulant matrix *C*, as shown in figure 1, by taking *M*=*C*, *α*_{j}=*c*_{j}, *W*_{j}=*V* _{j}, *J*=2^{L} and *m*=*L*. Since select(*V*)|*j*〉|*k*〉=|*j*〉|(*k*−*j*) mod *N*〉, it can be implemented using quantum adders [26,27], which requires $O(\mathrm{log}N)$ one- or two-qubit gates. Note that when *N* is not a power of two, it may take additional $O(\mathrm{log}N)$ ancillary qubits to implement the ‘*mod* *N*’ operation in select(*V*), for example, by first subtracting *N* from *k*−*j* and then using the sign qubit to control the ‘*mod* *N*’ operation.

A measurement result of |0^{L}〉 in the first register generates the required state *C*|*ψ*〉 in the second register. The probability of this measurement outcome is *O*(∥*C*|*ψ*〉∥^{2}). With the help of amplitude amplification [28], this can be further improved, requiring only *O*(1/∥*C*|*ψ*〉∥) rounds of application of $({O}_{c}^{\u2020}\otimes I)\hspace{0.17em}\mathrm{select}(V)({O}_{c}\otimes I)$. The amplitude amplification procedure also requires the same number of applications of *O*_{ψ}, where *O*_{ψ}|0^{L}〉=|*ψ*〉, and its inverse in order to reflect quantum states about the initial state |0^{L}〉|*ψ*〉. If *O*_{ψ} is unknown, amplitude amplification is not applicable and we will need to repeat the measuring process in figure 1 *O*(1/∥*C*|*ψ*〉∥^{2}) times, during which *O*(1/∥*C*|*ψ*〉∥^{2}) copies of |*ψ*〉 are required. It is worth noting that with the assumption *c*_{j}≥0, *C* is unitary if and only if *C*=*V* _{j}. In other words, a non-trivial circulant matrix is non-unitary and therefore, the oblivious amplitude amplification procedure [29] cannot be applied.

Provided with the oracle *O*_{c} satisfying ${O}_{c}|{0}^{L}\u27e9=\sum _{j=0}^{N-1}\sqrt{{c}_{j}}|j\u27e9$, theorem 2.2 follows directly from the above discussions. *O*_{c} can be efficiently implemented for certain efficiently computable vectors ** c** [30–32]. Another way to construct states like $\sum _{j=0}^{N-1}\sqrt{{c}_{j}}|j\u27e9$ is via qRAM, which uses

*O*(

*N*) hardware resources but only $O(\mathrm{log}N)$ operations to access them [33,34].

### Theorem 2.2 (Implementation of circulant matrices).

*There exists an algorithm creating the quantum state C*|*ψ*〉 *for an arbitrary quantum state* $|\psi \u27e9=\sum _{k=0}^{N-1}{\psi}_{k}|k\u27e9$, *using O*(1/∥*C*|*ψ*〉∥) *calls of O*_{c}*, O*_{ψ} *and their inverses, as well as* $O(\mathrm{log}N/\parallel C|\psi \u27e9\parallel )$ *additional one- or two-qubit gates.*

The complexity in theorem 2.2 is inversely proportional to the square root of *p*=∥*C*|*ψ*〉∥^{2}, which depends on the quantum state to be acted upon. Specifically, |*C*|*ψ*〉|^{2}=〈*ψ*∥*C*^{†}*C*∥*ψ*〉=〈*ψ*|*FΛ*^{†}*F*^{†}*FΛF*^{†}|*ψ*〉=〈*ψ*|*FΛ*^{†}*ΛF*^{†}|*ψ*〉. Here we use the diagonal form of *C* [25], *C*=*FΛF*^{†}, where *F* is the Fourier matrix with ${F}_{kj}={\mathrm{e}}^{2\pi ijk/N}/\sqrt{N}$ and *Λ* is a diagonal matrix of eigenvalues given by ${\mathit{\Lambda}}_{k}=\sum _{j=0}^{N-1}{c}_{j}\hspace{0.17em}{\mathrm{e}}^{2\pi ijk/N}$. Since the spectral norm ∥*C*∥ of the circulant matrix *C* is equal to one, we have *p*=〈*ψ*|*FΛ*^{†}*ΛF*^{†}|*ψ*〉≥1/*κ*^{2}, where *κ* is the condition number, defined as the ratio between *C*’s largest and smallest (absolute value of) eigenvalues [9]. Therefore, our algorithm is bound to perform well when $\kappa =O(\mathrm{poly}(\mathrm{log}N))$. In the ideal case where *κ*=1 and *p*=1, the vector ** c** is a unit basis in which only one element is equal to one and the others are zero.

Even when *κ* is large, our algorithm is still efficient when the input quantum state after Fourier transform is in the subspace whose corresponding eigenvalues are large. For example, when ${\mathit{\Lambda}}_{k}=\mathrm{cos}(2\pi k/N)$ we have $\kappa =\mathrm{\infty}$ when *N*>2. $p=\u27e8\varphi |{\mathit{\Lambda}}^{\u2020}\mathit{\Lambda}|\varphi \u27e9=\sum _{k=0}^{N-1}\mathrm{cos}{(2\pi k/N)}^{2}|{\varphi}_{k}{|}^{2}\ge \sum _{k\notin [N/8,3N/8]\cup [5N/8,7N/8]}\frac{1}{2}|{\varphi}_{k}{|}^{2}$ in which $|\varphi \u27e9:={F}^{\u2020}|\psi \u27e9=\sum _{k=0}^{N-1}{\varphi}_{k}|k\u27e9$. The success rate is therefore lower-bounded by a constant as long as the input quantum state is restricted in a subspace such that *ϕ*_{k}=0 when *k*∈[*N*/8,3*N*/8]∪[5*N*/8,7*N*/8].

### 3. Circulant-like matrices

#### 3.1. Block circulant matrices

Some block circulant matrices with special structures can also be implemented efficiently in a similar manner. We assume the blocks are *N*′-dimensional matrices and ${L}^{\prime}=\mathrm{log}{N}^{\prime}$ in the following discussions.

Firstly, when each block is a unitary operator up to a constant factor (i.e. ${\mathbf{C}}_{j}={c}_{j}{\mathbf{U}}_{j}$), we have a unitary block (UB) matrix:

*C*

_{UB}using the same algorithm discussed in §2 as illustrated in figure 2

*a*.

Specifically, when the set of blocks ${\{{\mathbf{U}}_{j}\}}_{j=0}^{N-1}$ are one-dimensional, we can implement complex-valued circulant matrices with efficiently computable phase. For example, for ${\mathbf{U}}_{j}=({e}^{i\theta j})$, *j*=0,1,…,*N*−1, circulant matrices with the parameter vector ** c**=(

*c*

_{0},

*e*

^{iθ}

*c*

_{1},…, e

^{i(N−1)θ}

*c*

_{N−1}) can be implemented efficiently. Moreover, if

*θ*=

*π*,

**=(**

*c**c*

_{0},−

*c*

_{1},…,(−1)

^{N−1}

*c*

_{N−1}) corresponding to the circulant matrix with negative elements on odd-numbered sites is efficiently implementable.

Another important family is block circulant matrices with circulant blocks (CB), which has found a wide range of applications in algorithms, mathematics, etc. [18–21]. It is defined as follows:

*N*′-dimensional vector

*c*_{j}=(

*c*

_{j0}

*c*

_{j1}⋯

*c*

_{j(N′−1)}).

*C*

_{CB}is a

*N*×

*N*′-dimensional matrix determined by

*N*×

*N*′ parameters {

*c*

_{jj′}}

_{j=0,…,N−1 j′=0,…,N′−1}. It can be decomposed as follows:

*O*

_{c′}satisfying ${O}_{{c}^{\prime}}|{0}^{L+{L}^{\prime}}\u27e9=\sum _{j=0}^{N-1}\sum _{j=0}^{{N}^{\prime}-1}{c}_{j{j}^{\prime}}|j\u27e9|{j}^{\prime}\u27e9$, we can implement

*C*

_{CB}using the quantum circuit shown in figure 2

*b*, which adopts a combination of two quantum subtractors.

#### 3.2. Toeplitz and Hankel matrices

A Toeplitz matrix is a matrix in which each descending diagonal from left to right is constant, which can be written explicitly as

*N*−1 parameters. We focus on the situation where

*t*

_{j}≥0 for all

*j*as in §2. Clearly, when

*t*

_{−(N−i)}=

*t*

_{i}for all

*i*,

*T*is a circulant matrix. Although a Toeplitz matrix is not circulant in general, any Toeplitz matrix

*T*can be embedded in a circulant matrix [15,35], defined by

*B*

_{T}is another Toeplitz matrix defined by

Therefore, by implementing *C*_{T}, we obtain a quantum state proportional to |0〉*T*|*ψ*〉+|1〉*B*_{T}|*ψ*〉. Then we do a quantum measurement on the single qubit (in the second register in figure 3) to obtain the quantum state *T*|*ψ*〉. The success rate is ∥*T*|*ψ*〉∥^{2} according to theorem 2.2 under the normalization condition that $\sum _{j=-(N-1)}^{N-1}{t}_{j}=\sum _{j=0}^{N-1}{c}_{j}=1$. With the help of amplitude amplification, only *O*(1/∥*T*|*ψ*〉∥) applications of the circuit in figure 3 are required.

A Hankel matrix is a square matrix in which each ascending skew-diagonal from left to right is constant, which can be written explicitly as

*N*−1 non-negative parameters. A permutation matrix $P={\sigma}_{x}^{\otimes L}$ transforms a Hankel matrix into a Toeplitz matrix. It can be easily verified that

*T*=

*HP*and

*H*=

*TP*, in which

*t*

_{j}=

*h*

_{j}for all

*j*. Note that when

*N*is not a power of two, we need to be careful with the embedding when mapping a circulant matrix into a Hankel matrix. The subspace

*span*{|0〉,…,|

*N*−1〉} in the implementation of circulant matrices corresponds to the subspace

*span*{|2

^{L}−

*N*〉,…,|2

^{L}−1〉} in the implementation of Hankel matrices.

Therefore by inserting the permutation *P* before the implementation of *T*, the circuit in figure 3 can be used to implement *H*, and the success rate is ∥*H*|*ψ*〉∥^{2} under the normalization condition that $\sum _{j=-(N-1)}^{N-1}{h}_{j}=\sum _{j=0}^{N-1}{c}_{j}=1$. With the help of amplitude amplification, only *O*(1/∥*H*|*ψ*〉∥) applications are required.

In comparison with existing algorithms, such as that described in [35], the above described quantum circuit provides a better way to realize circulant-like matrices, requiring fewer resources and with lower complexity. For example, only $2\mathrm{log}N$ qubits are required to implement *N*-dimensional Toeplitz matrices, which is a significant improvement over the algorithm presented in [35] via sparse Hamiltonian simulations. More importantly, this is an exact method and its complexity does not depend on an error term. It is also not limited to sparse circulant matrices *C* as in [35]. Moreover, implementation of non-unitary matrices, such as circulant matrices, is not only of importance in quantum computing, but also a significant ingredient in quantum channel simulators [36,37], because the set of Kraus operators in the quantum channel $\rho \mapsto \sum _{i}{K}_{i}\rho {K}_{i}^{\u2020}$ is normally non-unitary [1]. The simplicity of our circuit increases its feasibility in experimental realizations.

### 4. Circulant Hamiltonians

Hamiltonian simulation is expected to be one of the most important undertakings for quantum computation. It is therefore important to explore the possibility of efficient implementation of circulant Hamiltonians because of their extensive applications. Particularly, the implementation of e^{−iCt} is equivalent to the implementation of continuous-time quantum walks on a weighted circulant graph [38,39]. Moreover, simulation of Hamiltonians is also an important part in the HHL algorithm to solve linear systems of equations [9].

A number of algorithms have been shown to be able to efficiently simulate sparse Hamiltonians [2–8], including the unitary decomposition approach [7]. We show that this approach can be extended to the simulation of dense circulant Hamiltonians. It is well known that circulant matrices are diagonalizable as e^{−iCt}=*F* *e*^{−iΛt}*F*^{†}. In general, implementing an arbitrary diagonal unitary requires up to $O(N\mathrm{log}N)$ one- or two-qubit gates [40]. However, when ${\{{\mathit{\Lambda}}_{k}\}}_{k=0}^{N-1}$ can be efficiently computed, one can efficiently implement e^{−iCt} [23,41,42].

In this section, we will focus on the simulation of Hermitian circulant matrices, when e^{−iCt} is unitary. For completeness, we first summarize briefly the unitary decomposition approach in [7] and then discuss how it can be used to efficiently simulate dense circulant Hamiltonians. To simulate *U*=e^{−iCt}, the evolution time *t* is divided into *r* segments with *U*_{r}=e^{−iCt/r}, which can be approximated as $\stackrel{~}{U}=\sum _{k=0}^{K}1/k!{(-\mathrm{iCt}/r)}^{k}$ with error *ϵ*. It can be proven that if we choose $K=O(\mathrm{log}(r/\u03f5)/\mathrm{log}\mathrm{log}(r/\u03f5))=O(\mathrm{log}(t/\u03f5)/\mathrm{log}\mathrm{log}(t/\u03f5))$, then $\parallel {U}_{r}-\stackrel{~}{U}\parallel \le \u03f5/r$ and the total error is within *ϵ*.

Since $C=\sum _{j=0}^{N-1}{c}_{j}{V}_{j}$ as given by equation (2.2), we have

*W*

_{(k,j1,…,jk)}=(−

*i*)

^{k}

*V*

_{j1}⋯

*V*

_{jk}and

^{k}0

^{K−k}〉 is the unary encoding of

*k*. Here,

*s*is the normalization coefficient and we choose $r=\lceil t/\mathrm{ln}2\rceil $ so that

*W*

_{j}=

*W*

_{(k,j1,…,jk)},

*J*=

*KN*

^{K}and

*m*=

*K*+

*KL*in lemma 2.1, we have

^{K+KL}〉〈0

^{K+KL}|⊗

*I*)|

*Ψ*

^{⊥}〉=0. It has been shown in [7] that after one step of oblivious amplitude amplification procedure [29],

*U*

_{r}=e

^{−iCt/r}can be simulated within error

*ϵ*/

*r*. The oblivious amplitude amplification procedure avoids the repeated preparations of |

*ψ*〉 so that $\stackrel{~}{U}|\psi \u27e9$ can be obtained using only one copy of |

*ψ*〉, as shown in figure 4. The total complexity depends on the number of gates required to implement select(

*W*) and

*O*

_{α}.

If *C* is not Hermitian (and $\stackrel{~}{U}$ is not at least approximately unitary), the oblivious amplitude amplification procedure [29] will not be applicable, and then we have to resort to the traditional amplitude amplification [28]. This will lead to a complexity depending exponentially on *t* because we have to run the amplitude amplification recursively, but the complexity will still depend logarithmically on *N*.

### Theorem 4.1 (Simulation of circulant Hamiltonians).

*There exists an algorithm performing e*^{−iCt} *on an arbitrary quantum state |ψ〉 within error ϵ, using* $O(t(\mathrm{log}(t/\u03f5)/\mathrm{log}\mathrm{log}(t/\u03f5)))$ *calls of controlled-O*_{c}^{1} *and its inverse, as well as* $O(t(\mathrm{log}N)(\mathrm{log}(t/\u03f5)/\mathrm{log}\mathrm{log}(t/\u03f5)))$ *additional one- and two-qubit gates.*

### Proof.

We first consider the number of gates used to implement *O*_{α} in equation (4.2). It can be decomposed into two steps. The first step is to create the normalized version of the state $\sum _{k=0}^{K}\sqrt{{(t/r)}^{k}/k!}|{1}^{k}{0}^{K-k}\u27e9$ from the initial state |0^{K}〉, which takes *O*(*K*) consecutive one-qubit rotations on each qubit. We then apply *K* sets of controlled-*O*_{c} to transform |0^{L}〉 into $\sum _{j=0}^{N-1}\sqrt{{c}_{j}}|j\u27e9$ when the control qubit is |1〉. We therefore need *O*(*K*) calls of controlled-*O*_{c} and *O*(*K*) additional one-qubit gates to implement *O*_{α}.

Next we focus on the implementation of

*V*

_{j}|ℓ〉=|(ℓ−

*j*) mod

*N*〉, we can transform |

*ψ*〉 into

*V*

_{j1}⋯

*V*

_{jk}|

*ψ*〉 by applying

*K*quantum subtractors between |

*j*〉

_{j=j1,j2,…,jk,0,…}and |

*ψ*〉.

*K*phase gates on each of the first

*K*qubits multiply the amplitude by (−

*i*)

^{k}. Therefore, select(

*W*) can be decomposed into $O(K\mathrm{log}N)$ one or two-qubit gates.

In summary, *O*(*K*) calls of controlled-*O*_{c} and its inverse as well as $O(K\mathrm{log}N)$ additional one-qubit gates are sufficient to implement one segment e^{−iCt/r}; and the total complexity to implement *r* segments will be *O*(*tK*) calls of controlled-*O*_{c} and its inverse as well as $O(tK\mathrm{log}N)$ additional one-qubit gates, where $K=O(\mathrm{log}(t/\u03f5)/\mathrm{log}\mathrm{log}(t/\u03f5))$. ▪

Note that we assumed the spectral norm ∥*C*∥=1. To explicitly put it in the complexity in theorem 4.1, we can simply replace *t* by ∥*C*∥*t*.

### 5. Inverse of circulant matrices

Following from §4, we now show that the HHL algorithm can be extended to solve systems of circulant matrix linear equations. We assume *C* to be Hermitian in this section in order for the phase estimation procedure to work.

### Theorem 5.1 (Inverse of circulant matrices).

*There exists an algorithm creating the quantum state C*^{−1}|*ψ*〉/∥*C*^{−1}|*ψ*〉∥ *within error ϵ given an arbitrary quantum state* |*ψ*〉, *using* $\stackrel{~}{O}({\kappa}^{2}/\u03f5)$ *calls of controlled-O*_{c} *and its inverse, O*(*κ*) *calls of O*_{ψ}*, as well as* $\stackrel{~}{O}({\kappa}^{2}\mathrm{log}N/\u03f5)$ *additional one- and two-qubit gates.*^{2}

### Proof.

The basic procedure is the same as the HHL algorithm [9], except that *C* is a dense circulant matrix rather than sparse as required by the HHL algorithm, which is summarized below.

Apply the oracle

*O*_{ψ}to create the input quantum state |*ψ*〉:$$|{0}^{L}\u27e9\stackrel{{O}_{\psi}}{\to}|\psi \u27e9=\sum _{j=0}^{N-1}{b}_{j}|{u}_{j}\u27e9,$$where ${\{|{u}_{j}\u27e9\}}_{j=0}^{N-1}$ are the eigenvectors of*C*.Run phase estimation of the unitary operator e

^{i2πC}:$$\sum _{j=0}^{N-1}{b}_{j}|{u}_{j}\u27e9\to \sum _{j=0}^{N-1}{b}_{j}|{u}_{j}\u27e9|{\mathit{\Lambda}}_{j}\u27e9,$$where*Λ*_{j}are the eigenvalues of*C*and*Λ*_{j}≤1.Perform a controlled-rotation on an ancillary qubit:

$$\sum _{j=0}^{N-1}{b}_{j}|{u}_{j}\u27e9|{\mathit{\Lambda}}_{j}\u27e9|0\u27e9\to \sum _{j=0}^{N-1}{b}_{j}|{u}_{j}\u27e9|{\mathit{\Lambda}}_{j}\u27e9(\frac{1}{(\kappa {\mathit{\Lambda}}_{j})|1\u27e9}+\sqrt{1-\frac{1}{({\kappa}^{2}{\mathit{\Lambda}}_{j}^{2})}}|0\u27e9),$$where*κ*is the condition number defined in §2 to make sure that 1/(*κΛ*_{j})≤1 for all*j*. The realization of this controlled-rotation requires the computation of ${\mathit{\Lambda}}_{j}^{-1}$’s [43].Undo the phase estimation and then measure the ancillary qubit. Conditioned on getting 1, we have an output state $\propto \sum _{j=0}^{N-1}{b}_{j}/{\mathit{\Lambda}}_{j}|{u}_{j}\u27e9$ and the success rate $p=\sum _{j=0}^{N-1}|{b}_{j}/\kappa {\mathit{\Lambda}}_{j}{|}^{2}=\mathit{\Omega}(1/{\kappa}^{2})$.

Error occurs in step 2 in Hamiltonian simulation and phase estimation. The complexity scales sublogarithmically with the inverse of error in Hamiltonian simulation as in theorem 4.1 and scales linearly with it in phase estimation [1]. The dominant source of error is phase estimation. Following from the error analysis in [9], a precision *O*(*ϵ*/*κ*) in phase estimation results in a final error *ϵ*. Taking the success rate *p*=*Ω*(1/*κ*^{2}) into consideration, the total complexity would be $\stackrel{~}{O}({\kappa}^{2}/\u03f5)$, with the help of amplitude amplification [28]. ▪

For *s*-sparse Hamiltonians (with at most *s* non-zero entries in any row or column), the HHL algorithm scales as $\stackrel{~}{O}({s}^{2}{\kappa}^{2}\mathrm{log}N/\u03f5)$ [9]. In this work, we extended the HHL procedure to dense Hamiltonians with special structure and proved the scaling is independent of matrix sparsity. This simplification stems from the efficient implementation of select(*V*) which makes possible the decomposition of *C* into *O*(*N*) terms without introducing *O*(*N*) into the computational complexity.

### 6. Products of circulant matrices

Products of circulant matrices are also circulant matrices, because a circulant matrix can be decomposed into a linear combination of ${\{{V}_{j}\}}_{j=0}^{N-1}$ that constitute a cyclic group of order *N* (we have *V* _{j}*V* _{k}=*V* _{(j+k) mod N}). Suppose *C*^{(1,2)}=*C*^{(1)}*C*^{(2)} is the product of two circulant matrices *C*^{(1)} and *C*^{(2)} which have a parameter vector *c*^{(1,2)}, where

*c*^{(1)}and

*c*^{(2)}are each the parameters of

*C*

^{(1)}and

*C*

^{(2)}. Clearly, when the spectral norm of

*C*

^{(1)}and

*C*

^{(2)}are one, the spectral norm of

*C*

^{(1,2)}is also one. Classically, to calculate the parameters

*c*^{(1,2)}would take up

*O*(

*N*) space. However, in the quantum case, we will show that

*O*

_{c(1,2)}, encoding

*c*^{(1,2)}, can be prepared using one

*O*

_{c(1)}and one

*O*

_{c(2)}. It means that the oracle for a product of circulant matrices can be efficiently prepared when its factor circulants are efficiently implementable, as illustrated in figure 5.

### Theorem 6.1 (Products of circulant matrices).

*There exists an algorithm creating the oracle O*_{c(1,2)}*, which satisfies*

*where*|

*Φ*

_{j}〉

*is a unit quantum state dependent on j, using one O*

_{c(1)}

*, one O*

_{c(2)}

*and*$O(\mathrm{log}N)$

*additional one- and two-qubit gates.*

### Proof.

We need 2*L* ancillary qubits divided into two registers to construct the oracle for the product of two circulant matrices. We start by applying *O*_{c(1)} and *O*_{c(2)} on the last 2 registers, we obtain

*j*≡(

*j*

_{1}+

*j*

_{2}) mod

*N*. This can be achieved using two quantum adders, we obtain the state

*j*〉 is equal to $\sqrt{\sum _{\begin{array}{c}{j}_{1},{j}_{2}\\ {j}_{1}+{j}_{2}\equiv j\phantom{\rule{0.667em}{0ex}}\mathrm{mod}\hspace{0.17em}\hspace{0.17em}N\end{array}}{(\sqrt{{c}_{{j}_{1}}{c}_{{j}_{2}}})}^{2}}=\sqrt{{c}_{j}^{(1,2)}}$. ▪

This algorithm can be easily extended to implementing oracles for products of *d* circulants, in which *d* oracles of factor circulants and *dL* ancillary qubits are needed. Though the oracle described in theorem 6.1 may not be useful in all quantum algorithms, owing to the additional |*Φ*_{j}〉 in equation (6.2), it is applicable in §§2 and 4 according to lemma 6.2 (the generalized form of lemma 2.1) described below. It implies that this technique could also be useful in other algorithms related to circulant matrices.

### Lemma 6.2.

*Let* $M=\sum _{{\alpha}_{j}}{\alpha}_{j}{W}_{j}$ *be a linear combination of unitaries W*_{j} *with α*_{j}≥0 *for all j and* $\sum _{j}{\alpha}_{j}=1$. *Let O*_{α} *be any operator that satisfies* ${O}_{\alpha}|{0}^{m}\u27e9=\sum _{j}\sqrt{{\alpha}_{j}}|j\u27e9|{\mathit{\Phi}}_{j}\u27e9$ (*m is the number of qubits used to represent* |*j*〉|*Φ*_{j}〉) *and* $\mathrm{select}(W)=\sum _{j}|j\u27e9\u27e8j|\otimes I\otimes {W}_{j}$. *Then*

*where*(|0

^{m}〉〈0

^{m}|⊗

*I*)|

*Ψ*

^{⊥}〉=0.

### Proof.

aaaaa

### 7. Application: solving cyclic systems

Vibration analysis of mechanical structures with cyclic symmetry has been a subject of considerable studies in acoustics and mechanical engineering [14,17]. Here we provide an example where the above proposed quantum scheme can outperform classical algorithms in solving the equation of motion for vibrating and rotating systems with certain cyclic symmetry.

The equation of motion for a cyclically symmetric system consisting of *N* identical sectors, as shown in figure 6, can be written as

**and**

*q***are**

*f**N*-dimensional vectors, denoting the displacement of and the external force acting on each individual sector, respectively. The mass, damping and stiffness matrices are all circulants, represented by

*M*=circ(

*m*

_{1},

*m*

_{2},…,

*m*

_{N}),

*D*=circ(

*d*

_{1},

*d*

_{2},…,

*d*

_{N}) and

*K*=circ(

*s*

_{1},

*s*

_{2},…,

*s*

_{N}).

Assume all sectors have the same mass (*M*∝*I*) and there is zero damping (*D*=0). If the system is under the so-called travelling wave engine order excitation, the equation of motion can be simplified as [14]:

*f*

_{j}=

*f*e

^{i2πnj/N}for the external force vector

**,**

*f**n*is the order of excitation and

*Ω*is the angular frequency of the excitation. We search for solutions of the form

**=**

*q*

*q*_{0}*e*

^{inΩt}, which leads to

*K*−

*n*

^{2}

*Ω*

^{2}

*I*is a circulant matrix, we can use theorem 5.1 to calculate

1.

*K*−*n*^{2}*Ω*^{2}*I*is Hermitian. This is generally true for symmetric cyclic systems, where the coupling between*q*_{j}and*q*_{j+d}and the coupling between*q*_{j}and*q*_{j−d}are physically the same for any sector*j*and distance*d*.2.

*K*−*n*^{2}*Ω*^{2}*I*has non-negative (or non-positive) entries. Although this is not in general true, theorem 5.1 will work under a slight modification. We observe that the off-diagonal elements of*K*−*n*^{2}*Ω*^{2}*I*are always negative because the coupling force between two connecting sectors is always in the opposite direction to their relative motion.— If the diagonal elements of

*K*−*n*^{2}*Ω*^{2}*I*are also negative, then no modification to the proposed procedure is necessary.— If the diagonal elements of

*K*−*n*^{2}*Ω*^{2}*I*are positive, using the technique stated in §(a), we can simply replace select(*V*) with Ref_{0}⋅select(*V*), where Ref_{0}=|0^{L}〉〈0^{L}|−2*I*is a reflection operator operating on the first register.

3. The condition number

*κ*of*K*−*n*^{2}*Ω*^{2}*I*is small. This is true when the couplings among sectors are relatively weak—when |*K*_{0}−*n*^{2}*Ω*^{2}|≫*K*_{1}where*K*_{0}characterizes the coupling between a sector and the exterior and*K*_{1}characterizes the coupling among sectors.4. The corresponding oracle

*O*_{c}of the circulant matrix*K*−*n*^{2}*Ω*^{2}*I*can be efficiently implemented. It requires either there is a special structure of*K*−*n*^{2}*Ω*^{2}*I*or the information of*K*−*n*^{2}*Ω*^{2}*I*is stored in a qRAM in advance.

If all four conditions are satisfied, we have an exponential speed-up compared to classical computation. Note that the output ** q_{0}** is stored in quantum amplitudes, which cannot be read out directly. However, further computation steps can efficiently provide practically useful information about the system from the vector

**, for example the expectation value**

*q*_{0}

*q*_{0}^{†}

*M*

**for some linear operator**

*q*_{0}*M*or the similarity between two cyclic systems 〈

*q*′

_{0}|

*q*

_{0}〉 [9]. This type of speed-up is not achievable classically for it takes at least

*O*(

*N*) steps to read out the value of

**. It is also worth noting that the proposed algorithm, in contrast to previous quantum algorithms [3–7,9,35], works for dense matrices**

*q*_{0}*K*−

*n*

^{2}

*Ω*

^{2}

*I*. It means that the cyclic systems need not be subject to nearest-neighbour coupling.

### 8. Conclusion

In this paper, we present efficient quantum algorithms for implementing circulant (as well as Toeplitz and Hankel) matrices and block circulant matrices with special structures, which are not necessarily sparse or unitary. These matrices have practically significant applications in physics, mathematics and engineering-related fields. The proposed algorithms provide exponential speed-up over classical algorithms, requiring fewer resources ($2\mathrm{log}N$ qubits) and having lower complexity ($O(\mathrm{log}N/\parallel C|\psi \u27e9\parallel )$) in comparison with existing quantum algorithms. Consequently, they perform better in quantum computing and are more feasible to experimental realization with current technology. Obstacles still exist, though, in the efficient realization of the oracles to generate the components of the circulant matrices.

Besides the implementation of circulant matrices, we discover that we can perform the HHL algorithm on circulant matrices to implement the inverse of circulant matrices, by adopting the Taylor series approach to efficiently simulate circulant Hamiltonians. Owing to the special structure of circulant matrices, we prove that they are one of the types of the dense matrices that can be efficiently simulated. Being able to implement the inverse of circulant matrices opens a door to solving a variety of real-world problems, for example, solving cyclic systems in vibration analysis. Finally, we show that it is possible to construct oracles for products of circulant matrices using the oracles for their factor circulants, a technique that will be useful in related algorithms.

### Data accessibility

This paper does not include further supporting information.

### Authors' contributions

S.S.Z. devised the quantum algorithms described in this paper, carried out the detailed complexity analysis and drafted the manuscript. J.B.W. provided overall guidance and supervision, and helped in drafting and revising the manuscript. All authors gave final approval for publication.

### Competing interests

The authors declare no competing interests.

### Funding

The work is supported by the UWA-Nanjing exchange programme and The University of Western Australia.

## Acknowledgements

We would like to thank Xiaosong Ma, Anuradha Mahasinghe, Jie Pan, Thomas Loke and Shengjun Wu for helpful discussions.

## Footnotes

### References

- 1
Nielsen MA, Chuang IL . 2010**Quantum computation and quantum information**. Cambridge, UK: Cambridge University Press. Crossref, Google Scholar - 2
Lloyd S . 1996 Universal quantum simulators.**Science**, 1073. (doi:10.1126/science.273.5278.1073) Crossref, PubMed, Web of Science, Google Scholar**273** - 3
Berry DW, Ahokas G, Cleve R, Sanders BC . 2007 Efficient quantum algorithms for simulating sparse Hamiltonians.**Commun. Math. Phys.**, 359–371. (doi:10.1007/s00220-006-0150-x) Crossref, Web of Science, Google Scholar**270** - 4
Childs AM, Kothari R . 2010 Simulating sparse Hamiltonians with star decompositions. In*Theory of quantum computation, communication, and cryptography, TQC 2010*(eds W van Dam, VM Kendon, S Severini), pp. 94–103. Berlin, Heidelberg, Germany: Springer-Verlag. Google Scholar - 5
Wiebe N, Berry DW, Høyer P, Sanders BC . 2011 Simulating quantum dynamics on a quantum computer.**J. Phys. A Math. Theoret.**, 445308. (doi:10.1088/1751-8113/44/44/445308) Crossref, Web of Science, Google Scholar**44** - 6
Poulin D, Qarry A, Somma R, Verstraete F . 2011 Quantum simulation of time-dependent Hamiltonians and the convenient illusion of Hilbert space.**Phys. Rev. Lett.**, 170501. (doi:10.1103/PhysRevLett.106.170501) Crossref, PubMed, Web of Science, Google Scholar**106** - 7
Berry DW, Childs AM, Cleve R, Kothari R, Somma RD . 2015 Simulating Hamiltonian dynamics with a truncated Taylor series.**Phys. Rev. Lett.**, 090502. (doi:10.1103/PhysRevLett.114.090502) Crossref, PubMed, Web of Science, Google Scholar**114** - 8
Berry DW, Childs AM, Kothari R . 2015 Hamiltonian simulation with nearly optimal dependence on all parameters. In*Proc. IEEE 56th Annual Symp. on Foundations of Computer Science*, pp. 792–809. IEEE. See http://www.ieee.org. Google Scholar - 9
Harrow AW, Hassidim A, Lloyd S . 2009 Quantum algorithm for linear systems of equations.**Phys. Rev. Lett.**, 150502. (doi:10.1103/PhysRevLett.103.150502) Crossref, PubMed, Web of Science, Google Scholar**103** - 10
Childs AM, Kothari R . 2009 Limitations on the simulation of non-sparse Hamiltonians. See http://arxiv.org/abs/0908.4398. Google Scholar - 11
Rietsch K . 2003 Totally positive Toeplitz matrices and quantum cohomology of partial flag varieties.**J. Am. Math. Soc.**, 363–392. (doi:10.1090/S0894-0347-02-00412-5) Crossref, Web of Science, Google Scholar**16** - 12
Haupt J, Bajwa WU, Raz G, Nowak R . 2010 Toeplitz compressed sensing matrices with applications to sparse channel estimation.**Inform. Theory IEEE Trans.**, 5862–5875. (doi:10.1109/TIT.2010.2070191) Crossref, Web of Science, Google Scholar**56** - 13
Noschese S, Pasquini L, Reichel L . 2013 Tridiagonal Toeplitz matrices: properties and novel applications.**Numer. Linear Algebra Appl.**, 302–326. (doi:10.1002/nla.1811) Crossref, Web of Science, Google Scholar**20** - 14
Olson BJ, Shaw SW, Shi C, Pierre C, Parker RG . 2014 Circulant matrices and their application to vibration analysis.**Appl. Mech. Rev.**, 040803. (doi:10.1115/1.4027722) Crossref, Web of Science, Google Scholar**66** - 15
Ng MK . 2004**Iterative methods for Toeplitz systems**. New York, NY: Oxford University Press. Google Scholar - 16
Peller V . 2012**Hankel operators and their applications**. Berlin, Germany: Springer Science & Business Media. Google Scholar - 17
Kaveh A, Rahami H . 2011 Block circulant matrices and applications in free vibration analysis of cyclically repetitive structures.**Acta Mech.**, 51–62. (doi:10.1007/s00707-010-0382-x) Crossref, Web of Science, Google Scholar**217** - 18
Rjasanow S . 1994 Effective algorithms with circulant-block matrices.**Linear Algebra Appl.**, 55–69. (doi:10.1016/0024-3795(94)90184-8) Crossref, Web of Science, Google Scholar**202** - 19
Tee GJ . 2005 Eigenvectors of block circulant and alternating circulant matrices.**Res. Lett. Info. Math. Sci**., 123–142. Google Scholar**8** - 20
Combescure M . 2009 Block-circulant matrices with circulant blocks, weil sums, and mutually unbiased bases. II. The prime power case.**J. Math. Phys.**, 032104. (doi:10.1063/1.3078420) Crossref, Web of Science, Google Scholar**50** - 21
Petrou M, Petrou C . 2010**Image processing: the fundamentals**. New York, NY: John Wiley & Sons. Crossref, Google Scholar - 22
Delanty M, Steel MJ . 2012 Discretely-observable continuous time quantum walks on Moöbius strips and other exotic structures in 3D integrated photonics.**Phys. Rev. A**, 043821. (doi:10.1103/PhysRevA.86.043821) Crossref, Web of Science, Google Scholar**86** - 23
Qiang X, Loke T, Montanaro A, Aungskunsiri K, Zhou X, O’Brien JL, Wang J, Matthews JC . 2016 Efficient quantum walk on a quantum processor.**Nat. Commun.**, 11511. (doi:10.1038/ncomms11511) Crossref, PubMed, Web of Science, Google Scholar**7** - 24
Ye K, Lim LH . 2016 Every matrix is a product of Toeplitz matrices.**Found. Comput. Math.**, 577–598. (doi:10.1007/s10208-015-9254-z) Crossref, Web of Science, Google Scholar**16** - 25
Golub GH, Van Loan CF . 2012**Matrix computations**,**vol. 3**. Baltimore MD: John Hopkins University Press. Google Scholar - 26
Cuccaro SA, Draper TG, Kutin SA, Moulton DP . 2004 A new quantum ripple-carry addition circuit. See http://arxiv.org/abs/quant-ph/0410184. Google Scholar - 27
Draper TG, Kutin SA, Rains EM, Svore KM . 2004 A logarithmic-depth quantum carry-lookahead adder. See http://arxiv.org/abs/quant-ph/0406142. Google Scholar - 28
Brassard G, Hoyer P, Mosca M, Tapp A . 2002 Quantum amplitude amplification and estimation.**Contemp. Math.**, 53–74. (doi:10.1090/conm/305/05215) Crossref, Google Scholar**305** - 29
Berry DW, Childs AM, Cleve R, Kothari R, Somma RD . 2014 Exponential improvement in precision for simulating sparse Hamiltonians. In*Proc. of the 46th Annual ACM Symp. on Theory of Computing*, pp. 283–292. ACM, New York, NY. Google Scholar - 30
Grover L, Rudolph T . 2002 Creating superpositions that correspond to efficiently integrable probability distributions. See http://arxiv.org/abs/quant-ph/0208112. Google Scholar - 31
Kaye P, Mosca M . 2004 Quantum networks for generating arbitrary quantum states. See http://arxiv.org/abs/quant-ph/0407102. Google Scholar - 32
Soklakov AN, Schack R . 2006 Efficient state preparation for a register of quantum bits.**Phys. Rev. A**, 012307. (doi:10.1103/PhysRevA.73.012307) Crossref, Web of Science, Google Scholar**73** - 33
Giovannetti V, Lloyd S, Maccone L . 2008 Architectures for a quantum random access memory.**Phys. Rev. A**, 052310. (doi:10.1103/PhysRevA.78.052310) Crossref, Web of Science, Google Scholar**78** - 34
Lloyd S, Mohseni M, Rebentrost P . 2013 Quantum algorithms for supervised and unsupervised machine learning. See http://arxiv.org/abs/1307.0411. Google Scholar - 35
Mahasinghe A, Wang JB . 2016 Efficient quantum circuits for Toeplitz and Hankel matrices.**J. Phys. A Math. Theoret.**, 275301. (doi:10.1088/1751-8113/49/27/275301) Crossref, Web of Science, Google Scholar**49** - 36
Lu H *et al.*2015 Universal digital photonic single-qubit quantum channel simulator. See http://arxiv.org/abs/1505.02879. Google Scholar - 37
Wang D-S, Sanders BC . 2015 Quantum circuit design for accurate simulation of qudit channels.**New J. Phys.**, 043004. (doi:10.1088/1367-2630/17/4/043004) Crossref, Web of Science, Google Scholar**17** - 38
Manouchehri K, Wang JB . 2014**Physical implementation of quantum walks**. Berlin, Germany: Springer. Crossref, Google Scholar - 39
Childs AM . 2010 On the relationship between continuous-and discrete-time quantum walk.**Commun. Math. Phys.**, 581–603. (doi:10.1007/s00220-009-0930-1) Crossref, Web of Science, Google Scholar**294** - 40
Bullock SS, Markov IL . 2004 Asymptotically optimal circuits for arbitrary n-qubit diagonal computations.**Q. Inform. Comput.**, 27–47. Web of Science, Google Scholar**4** - 41
Childs AM . 2004 Quantum information processing in continuous time. PhD thesis, Massachusetts Institute of Technology, Cambridge, MA, USA. Google Scholar - 42
Zhou S, Loke T, Izaac J, Wang J . 2017 Quantum fourier transform in computational basis.**Q. Inform. Process.**, 82. (doi:10.1007/s11128-017-1515-0) Crossref, Web of Science, Google Scholar**16** - 43
Cao Y, Papageorgiou A, Petras I, Traub J, Kais S . 2013 Quantum algorithm and circuit design solving the Poisson equation.**New J. Phys.**, 013021. (doi:10.1088/1367-2630/15/1/013021) Crossref, Web of Science, Google Scholar**15**