Royal Society Open Science
Open Access Research article

Efficient quantum circuits for dense circulant and circulant like operators

S. S. Zhou

Kuang Yaming Honors School, Nanjing University, Nanjing 210093, People’s Republic of China

Department of Physics, Yale Quantum Institute, Yale University, New Haven, CT 06520, USA

Department of Applied Physics, Yale Quantum Institute, Yale University, New Haven, CT 06520, USA

J. B. Wang

School of Physics, The University of Western Australia, Perth, Western Australia 6009, Australia


Circulant matrices are an important family of operators, which have a wide range of applications in science and engineering-related fields. They are, in general, non-sparse and non-unitary. In this paper, we present efficient quantum circuits to implement circulant operators using fewer resources and with lower complexity than existing methods. Moreover, our quantum circuits can be readily extended to the implementation of Toeplitz, Hankel and block circulant matrices. Efficient quantum algorithms to implement the inverses and products of circulant operators are also provided, and an example application in solving the equation of motion for cyclic systems is discussed.

1. Introduction

Quantum computation exploits the intrinsic nature of quantum systems in a way that promises to solve problems otherwise intractable on conventional computers. As the most widely used model of quantum computation, a quantum circuit provides a complete description of a specified quantum algorithm, whose computational complexity is determined by the number of quantum gates required. In general, the number of two-level gates (i.e. unitary matrices acting non-trivially on two-dimensional subspaces, which are universal for computation) needed to decompose an arbitrary unitary in $N$ dimensions scales as $O(N^2)$. There are many known $N$-dimensional matrices that cannot be decomposed as a product of fewer than $N-1$ two-level gates [1], and thus cannot be implemented efficiently on a quantum computer. An essential research focus in quantum computation is to explore which kinds of linear operations (either unitary or non-unitary) can be efficiently implemented using $O(\mathrm{poly}(\log N))$ elementary quantum gates (i.e. one- or two-level unitary matrices) and measurements.

Significant breakthroughs in the area include the development of efficient quantum algorithms for Hamiltonian simulation, which is central to the study of chemical and biological processes [2–8]. Recently, Berry, Childs and Kothari presented an algorithm for sparse Hamiltonian simulation achieving near-linear scaling with the sparsity and sublogarithmic scaling with the inverse of the error [8]. Additionally, using the Hamiltonian simulation algorithm as an essential ingredient, Harrow et al. [9] showed that for a sparse and well-conditioned matrix $A$, there is an efficient algorithm (known as the HHL algorithm) that provides a quantum state proportional to the solution $\sum_j x_j|j\rangle$ of the linear system of equations $Ax=b$.

However, as proven by Childs & Kothari [10], it is impossible to perform a generic simulation of an arbitrary dense Hamiltonian $H\in\mathbb{C}^{N\times N}$ in time $O(\mathrm{poly}(\|H\|,\log N))$, where $\|H\|$ is the spectral norm, but it is possible for certain non-trivial classes of Hamiltonians. It is then natural to ask under what conditions we can extend the sparse Hamiltonian simulation algorithm and the HHL algorithm to the realm of dense matrices. In this paper, we use the ‘unitary decomposition’ approach developed by Berry et al. [7] to implement dense circulant Hamiltonians in time $O(\mathrm{poly}(\|H\|,\log N))$. Combining this with the HHL algorithm, we can also efficiently implement the inverse of dense circulant matrices and thus solve systems of circulant matrix linear equations.

Furthermore, we provide an efficient algorithm to implement circulant matrices $C$ directly, by decomposing them into a linear combination of unitary matrices. We then apply the same technique to implement block circulant matrices, Toeplitz and Hankel matrices, which have significant applications in physics, mathematics and engineering [11–21]. For example, we can simulate classical random walks on circulant, Toeplitz and Hankel graphs [22,23]. In fact, any arbitrary matrix can be decomposed into a product of Toeplitz matrices [24]. If the number of Toeplitz matrices required is of the order of $O(\mathrm{poly}(\log N))$, we can have an efficient quantum circuit.

This paper is organized as follows. In §2, we present an algorithm to implement circulant matrices, followed by discussions on block circulant matrices, Toeplitz and Hankel matrices in §3. In §§4 and 5, we provide efficient methods to simulate circulant Hamiltonians and to implement the inverse of circulant matrices. In §6, we describe a technique to efficiently implement products of circulant matrices. In the last section, we provide an example application in solving the equation of motion for vibrating systems with cyclic symmetry.

2. Implementation of circulant matrices

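A circulant matrix has each row right-rotated by one element with respect to the previous row, defined as

$$C=\begin{pmatrix} c_0 & c_1 & \cdots & c_{N-1}\\ c_{N-1} & c_0 & \cdots & c_{N-2}\\ \vdots & \vdots & \ddots & \vdots\\ c_1 & c_2 & \cdots & c_0 \end{pmatrix} \tag{2.1}$$

using an $N$-dimensional vector $c=(c_0\ c_1\ \cdots\ c_{N-1})$ [25]. In this paper, we will assume $c_j$ to be non-negative for all $j$, which is often the case in practical applications. We also assume that the spectral norm (the largest eigenvalue in absolute value) $\|C\|=\sum_{j=0}^{N-1}c_j$ of the circulant matrix $C$ is equal to 1 for simplicity.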

Note that $C$ can be decomposed into a linear combination of efficiently realizable unitary matrices as follows:

$$C=\sum_{j=0}^{N-1}c_jV_j, \tag{2.2}$$

where $V_j=\sum_{k=0}^{N-1}|(k-j)\bmod N\rangle\langle k|$. Such a linear combination of unitary matrices can be dealt with by the unitary decomposition approach introduced by Berry et al. [7]. For completeness, we restate their method as lemma 2.1 given below.
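
As a small numerical illustration of the decomposition in equation (2.2), the sketch below builds a 4-dimensional circulant matrix in two ways and compares them. The helper names `circulant` and `shift` and the weight vector are assumptions chosen for this example; they do not appear in the original text.

```python
import numpy as np

def circulant(c):
    """Circulant matrix with C[r, s] = c[(s - r) mod N]: each row is the previous one right-rotated."""
    c = np.asarray(c, dtype=float)
    N = len(c)
    rows, cols = np.indices((N, N))
    return c[(cols - rows) % N]

def shift(N, j):
    """Permutation matrix V_j = sum_k |(k - j) mod N><k|."""
    V = np.zeros((N, N))
    for k in range(N):
        V[(k - j) % N, k] = 1.0
    return V

# Illustrative weights (an assumption): non-negative and summing to 1.
c = np.array([0.4, 0.3, 0.2, 0.1])
N = len(c)

C = circulant(c)                                     # direct definition
C_lcu = sum(c[j] * shift(N, j) for j in range(N))    # linear combination of unitaries
print(np.allclose(C, C_lcu))
```

Each `shift(N, j)` is a permutation matrix and hence unitary, which is what allows the linear-combination-of-unitaries technique to be applied.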

Lemma 2.1.

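Let $M=\sum_{j=0}^{J}\alpha_jW_j$ be a linear combination of unitaries $W_j$ with $\alpha_j\ge 0$ for all $j$ and $\sum_{j=0}^{J}\alpha_j=1$. Let $O_\alpha$ be any operator that satisfies $O_\alpha|0^m\rangle=\sum_{j=0}^{J}\sqrt{\alpha_j}\,|j\rangle$, where $m$ is the number of qubits used to represent $|j\rangle$, and $\mathrm{select}(W)=\sum_{j=0}^{J}|j\rangle\langle j|\otimes W_j$. Then

$$(O_\alpha^\dagger\otimes I)\,\mathrm{select}(W)\,(O_\alpha\otimes I)\,|0^m\rangle|\psi\rangle=|0^m\rangle M|\psi\rangle+|\Psi^\perp\rangle,$$

where $(|0^m\rangle\langle 0^m|\otimes I)|\Psi^\perp\rangle=0$.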
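
The statement of lemma 2.1 can be checked numerically on a small example. The following is a minimal state-vector sketch, not the authors' circuit: the state-preparation operator is realized here by a Householder reflection (one of many valid choices for $O_\alpha$, and an assumption of this example), and the names `prepare` and `select` are hypothetical helpers.

```python
import numpy as np

def prepare(alpha):
    """A unitary O with O|0> = sum_j sqrt(alpha_j)|j> (Householder reflection; an illustrative choice)."""
    a = np.sqrt(np.asarray(alpha, dtype=float))
    e0 = np.zeros_like(a)
    e0[0] = 1.0
    v = e0 - a
    if np.allclose(v, 0.0):
        return np.eye(len(a))
    return np.eye(len(a)) - 2.0 * np.outer(v, v) / (v @ v)

def select(Ws):
    """select(W) = sum_j |j><j| (x) W_j as a block-diagonal matrix."""
    n, d = len(Ws), Ws[0].shape[0]
    S = np.zeros((n * d, n * d), dtype=complex)
    for j, W in enumerate(Ws):
        S[j * d:(j + 1) * d, j * d:(j + 1) * d] = W
    return S

N = 4
alpha = np.array([0.4, 0.3, 0.2, 0.1])                    # illustrative weights summing to 1
Ws = [np.roll(np.eye(N), -j, axis=0) for j in range(N)]   # W_j = V_j (cyclic shifts)
M = sum(a * W for a, W in zip(alpha, Ws))

rng = np.random.default_rng(0)
psi = rng.normal(size=N) + 1j * rng.normal(size=N)
psi /= np.linalg.norm(psi)

O = np.kron(prepare(alpha), np.eye(N))                    # O_alpha acting on the first register
zero = np.zeros(N)
zero[0] = 1.0
out = O.conj().T @ select(Ws) @ O @ np.kron(zero, psi)

flagged = out[:N]                          # component with the first register in |0>
p_success = np.vdot(flagged, flagged).real # probability of measuring |0> there
```

The block `out[:N]` is the part of the output with the first register in $|0^m\rangle$; it coincides with $M|\psi\rangle$, and its squared norm is the probability of the heralding measurement.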

Unless stated otherwise, we assume that $N=2^L$, where $L$ is an integer. If $N$ is not a power of two, we will need to embed the system into a larger Hilbert space whose dimension is a power of two. On the other hand, it is also convenient to simply discretize practical problems using powers of two. Lemma 2.1 can be directly applied to implement the circulant matrix $C$, as shown in figure 1, by taking $M=C$, $\alpha_j=c_j$, $W_j=V_j$, $J=2^L-1$ and $m=L$. Since $\mathrm{select}(V)|j\rangle|k\rangle=|j\rangle|(k-j)\bmod N\rangle$, it can be implemented using quantum adders [26,27], which require $O(\log N)$ one- or two-qubit gates. Note that when $N$ is not a power of two, it may take additional $O(\log N)$ ancillary qubits to implement the ‘mod $N$’ operation in $\mathrm{select}(V)$, for example, by first subtracting $N$ from $k-j$ and then using the sign qubit to control the ‘mod $N$’ operation.

Figure 1. Quantum circuit to implement a circulant matrix.

A measurement result of $|0^L\rangle$ in the first register generates the required state $C|\psi\rangle$ in the second register. The probability of this measurement outcome is $O(\|C|\psi\rangle\|^2)$. With the help of amplitude amplification [28], this can be further improved, requiring only $O(1/\|C|\psi\rangle\|)$ rounds of application of $(O_c^\dagger\otimes I)\,\mathrm{select}(V)\,(O_c\otimes I)$. The amplitude amplification procedure also requires the same number of applications of $O_\psi$, where $O_\psi|0^L\rangle=|\psi\rangle$, and its inverse in order to reflect quantum states about the initial state $|0^L\rangle|\psi\rangle$. If $O_\psi$ is unknown, amplitude amplification is not applicable and we will need to repeat the measuring process in figure 1 $O(1/\|C|\psi\rangle\|^2)$ times, during which $O(1/\|C|\psi\rangle\|^2)$ copies of $|\psi\rangle$ are required. It is worth noting that with the assumption $c_j\ge 0$, $C$ is unitary if and only if $C=V_j$ for some $j$. In other words, a non-trivial circulant matrix is non-unitary and therefore, the oblivious amplitude amplification procedure [29] cannot be applied.

Provided with the oracle $O_c$ satisfying $O_c|0^L\rangle=\sum_{j=0}^{N-1}\sqrt{c_j}\,|j\rangle$, theorem 2.2 follows directly from the above discussion. $O_c$ can be efficiently implemented for certain efficiently computable vectors $c$ [30–32]. Another way to construct states like $\sum_{j=0}^{N-1}\sqrt{c_j}\,|j\rangle$ is via qRAM, which uses $O(N)$ hardware resources but only $O(\log N)$ operations to access them [33,34].

Theorem 2.2 (Implementation of circulant matrices).

There exists an algorithm creating the quantum state $C|\psi\rangle$ for an arbitrary quantum state $|\psi\rangle=\sum_{k=0}^{N-1}\psi_k|k\rangle$, using $O(1/\|C|\psi\rangle\|)$ calls of $O_c$, $O_\psi$ and their inverses, as well as $O(\log N/\|C|\psi\rangle\|)$ additional one- or two-qubit gates.

The complexity in theorem 2.2 is inversely proportional to the square root of $p=\|C|\psi\rangle\|^2$, which depends on the quantum state to be acted upon. Specifically, $\|C|\psi\rangle\|^2=\langle\psi|C^\dagger C|\psi\rangle=\langle\psi|F\Lambda^\dagger F^\dagger F\Lambda F^\dagger|\psi\rangle=\langle\psi|F\Lambda^\dagger\Lambda F^\dagger|\psi\rangle$. Here we use the diagonal form of $C$ [25], $C=F\Lambda F^\dagger$, where $F$ is the Fourier matrix with $F_{kj}=e^{2\pi ijk/N}/\sqrt{N}$ and $\Lambda$ is a diagonal matrix of eigenvalues given by $\Lambda_k=\sum_{j=0}^{N-1}c_je^{2\pi ijk/N}$. Since the spectral norm $\|C\|$ of the circulant matrix $C$ is equal to one, we have $p=\langle\psi|F\Lambda^\dagger\Lambda F^\dagger|\psi\rangle\ge 1/\kappa^2$, where $\kappa$ is the condition number, defined as the ratio between $C$’s largest and smallest (absolute value of) eigenvalues [9]. Therefore, our algorithm is bound to perform well when $\kappa=O(\mathrm{poly}(\log N))$. In the ideal case where $\kappa=1$ and $p=1$, the vector $c$ is a unit basis vector, in which only one element is equal to one and the others are zero.
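
The diagonal form and the bound $p\ge 1/\kappa^2$ can be verified numerically. The sketch below uses an illustrative weight vector (an assumption of this example) and obtains the eigenvalues from NumPy's inverse FFT, whose sign convention matches $\Lambda_k=\sum_j c_je^{2\pi ijk/N}$ up to a factor of $N$.

```python
import numpy as np

# Illustrative weights (an assumption): non-negative, summing to 1, so ||C|| = 1.
c = np.array([0.6, 0.3, 0.1, 0.0])
N = len(c)
rows, cols = np.indices((N, N))
C = c[(cols - rows) % N]                      # C[r, s] = c[(s - r) mod N]

k = np.arange(N)
F = np.exp(2j * np.pi * np.outer(k, k) / N) / np.sqrt(N)   # Fourier matrix
Lam = N * np.fft.ifft(c)                      # Lambda_k = sum_j c_j exp(+2 pi i j k / N)

kappa = np.abs(Lam).max() / np.abs(Lam).min() # condition number

rng = np.random.default_rng(0)
psi = rng.normal(size=N) + 1j * rng.normal(size=N)
psi /= np.linalg.norm(psi)

p_direct = np.linalg.norm(C @ psi) ** 2       # success probability ||C|psi>||^2
phi = F.conj().T @ psi                        # Fourier-transformed input state
p_spectral = np.sum(np.abs(Lam) ** 2 * np.abs(phi) ** 2)
print(kappa, p_direct, 1 / kappa ** 2)
```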

Even when $\kappa$ is large, our algorithm is still efficient when the input quantum state after Fourier transform is in the subspace whose corresponding eigenvalues are large. For example, when $\Lambda_k=\cos(2\pi k/N)$ we have $\kappa=\infty$ when $N>2$, and

$$p=\langle\phi|\Lambda^\dagger\Lambda|\phi\rangle=\sum_{k=0}^{N-1}\cos^2(2\pi k/N)\,|\phi_k|^2\ \ge\sum_{k\notin[N/8,3N/8]\cup[5N/8,7N/8]}\tfrac{1}{2}|\phi_k|^2,$$

in which $|\phi\rangle:=F^\dagger|\psi\rangle=\sum_{k=0}^{N-1}\phi_k|k\rangle$. The success rate is therefore lower-bounded by a constant as long as the input quantum state is restricted to a subspace such that $\phi_k=0$ when $k\in[N/8,3N/8]\cup[5N/8,7N/8]$.
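
This example can be reproduced with the weights $c_1=c_{N-1}=1/2$, which give $\Lambda_k=\cos(2\pi k/N)$. The sketch below, with $N=16$ and a randomly chosen state supported outside the two excluded frequency bands (both assumptions of this illustration), confirms that the success probability stays above $1/2$ even though the matrix is singular.

```python
import numpy as np

N = 16
c = np.zeros(N)
c[1] = c[N - 1] = 0.5                         # gives Lambda_k = cos(2 pi k / N)
rows, cols = np.indices((N, N))
C = c[(cols - rows) % N]

k = np.arange(N)
Lam = (N * np.fft.ifft(c)).real               # eigenvalues are real because c is symmetric
F = np.exp(2j * np.pi * np.outer(k, k) / N) / np.sqrt(N)

# Frequencies with small |Lambda_k|: k in [N/8, 3N/8] or [5N/8, 7N/8].
bad = ((k >= N / 8) & (k <= 3 * N / 8)) | ((k >= 5 * N / 8) & (k <= 7 * N / 8))

rng = np.random.default_rng(0)
phi = rng.normal(size=N) + 1j * rng.normal(size=N)
phi[bad] = 0.0                                # restrict the Fourier-domain state
phi /= np.linalg.norm(phi)
psi = F @ phi                                 # so that F^dagger |psi> = |phi>

p = np.linalg.norm(C @ psi) ** 2
print(np.abs(Lam).min(), p)
```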

3. Circulant-like matrices

3.1. Block circulant matrices

Some block circulant matrices with special structures can also be implemented efficiently in a similar manner. We assume the blocks are N′-dimensional matrices and L=logN in the following discussions.

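Firstly, when each block is a unitary operator up to a constant factor (i.e. $C_j=c_jU_j$), we have a unitary block (UB) matrix:

$$C_{\mathrm{UB}}=\begin{pmatrix} c_0U_0 & c_1U_1 & \cdots & c_{N-1}U_{N-1}\\ c_{N-1}U_{N-1} & c_0U_0 & \cdots & c_{N-2}U_{N-2}\\ \vdots & \vdots & \ddots & \vdots\\ c_1U_1 & c_2U_2 & \cdots & c_0U_0 \end{pmatrix}.$$

If the set of blocks $\{U_j\}_{j=0}^{N-1}$ can be efficiently implemented, then by simply replacing $\mathrm{select}(V)=\sum_{j=0}^{N-1}|j\rangle\langle j|\otimes V_j$ with $\sum_{j=0}^{N-1}|j\rangle\langle j|\otimes(V_j\otimes U_j)$, we can efficiently implement the block circulant matrices $C_{\mathrm{UB}}$ using the same algorithm discussed in §2, as illustrated in figure 2a.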

Figure 2. The quantum circuit to implement block circulant matrices with special structures.

Specifically, when the blocks $\{U_j\}_{j=0}^{N-1}$ are one-dimensional, we can implement complex-valued circulant matrices with efficiently computable phases. For example, for $U_j=(e^{ij\theta})$, $j=0,1,\ldots,N-1$, circulant matrices with the parameter vector $c=(c_0,e^{i\theta}c_1,\ldots,e^{i(N-1)\theta}c_{N-1})$ can be implemented efficiently. Moreover, if $\theta=\pi$, the vector $c=(c_0,-c_1,\ldots,(-1)^{N-1}c_{N-1})$, corresponding to the circulant matrix with negative elements on odd-numbered sites, is efficiently implementable.

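Another important family is block circulant matrices with circulant blocks (CB), which have found a wide range of applications in algorithms, mathematics, etc. [18–21]. It is defined as follows:

$$C_{\mathrm{CB}}=\begin{pmatrix} C_0 & C_1 & \cdots & C_{N-1}\\ C_{N-1} & C_0 & \cdots & C_{N-2}\\ \vdots & \vdots & \ddots & \vdots\\ C_1 & C_2 & \cdots & C_0 \end{pmatrix},$$

where $C_j$ is a circulant matrix specified by an $N'$-dimensional vector $c_j=(c_{j0}\ c_{j1}\ \cdots\ c_{j(N'-1)})$. $C_{\mathrm{CB}}$ is an $(N\times N')$-dimensional matrix determined by $N\times N'$ parameters $\{c_{jj'}\}$, $j=0,\ldots,N-1$, $j'=0,\ldots,N'-1$. It can be decomposed as follows:

$$C_{\mathrm{CB}}=\sum_{j=0}^{N-1}\sum_{j'=0}^{N'-1}c_{jj'}\,V_j\otimes V_{j'}.$$

Given an oracle $O_c$ satisfying $O_c|0^{L+L'}\rangle=\sum_{j=0}^{N-1}\sum_{j'=0}^{N'-1}\sqrt{c_{jj'}}\,|j\rangle|j'\rangle$, where $L'=\log N'$, we can implement $C_{\mathrm{CB}}$ using the quantum circuit shown in figure 2b, which adopts a combination of two quantum subtractors.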
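
The decomposition of a block circulant matrix with circulant blocks into tensor products of cyclic shifts can be illustrated as follows. The sizes, the random weights and the helper names `shift` and `circulant` are assumptions made for this sketch.

```python
import numpy as np

def shift(N, j):
    """V_j = sum_k |(k - j) mod N><k| as a permutation matrix."""
    return np.roll(np.eye(N), -j, axis=0)

def circulant(c):
    """Circulant matrix with C[r, s] = c[(s - r) mod N]."""
    c = np.asarray(c, dtype=float)
    N = len(c)
    rows, cols = np.indices((N, N))
    return c[(cols - rows) % N]

rng = np.random.default_rng(1)
N, Np = 4, 2                                  # N blocks per row, each block N' = 2 dimensional
c = rng.random((N, Np))
c /= c.sum()                                  # non-negative weights c_{jj'} summing to 1

# Direct definition: block (r, s) is the circulant block C_{(s - r) mod N}.
blocks = [circulant(c[j]) for j in range(N)]
C_direct = np.block([[blocks[(s - r) % N] for s in range(N)] for r in range(N)])

# Linear combination of the unitaries V_j (x) V_j'.
C_lcu = sum(c[j, jp] * np.kron(shift(N, j), shift(Np, jp))
            for j in range(N) for jp in range(Np))
print(np.allclose(C_direct, C_lcu))
```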

3.2. Toeplitz and Hankel matrices

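A Toeplitz matrix is a matrix in which each descending diagonal from left to right is constant, which can be written explicitly as

$$T=\begin{pmatrix} t_0 & t_{-1} & \cdots & t_{-(N-1)}\\ t_1 & t_0 & \cdots & t_{-(N-2)}\\ \vdots & \vdots & \ddots & \vdots\\ t_{N-1} & t_{N-2} & \cdots & t_0 \end{pmatrix},$$

specified by $2N-1$ parameters. We focus on the situation where $t_j\ge 0$ for all $j$ as in §2. Clearly, when $t_{-(N-i)}=t_i$ for all $i$, $T$ is a circulant matrix. Although a Toeplitz matrix is not circulant in general, any Toeplitz matrix $T$ can be embedded in a circulant matrix [15,35], defined by

$$C_T=\begin{pmatrix} T & B_T\\ B_T & T \end{pmatrix},$$

where $B_T$ is another Toeplitz matrix defined by

$$B_T=\begin{pmatrix} 0 & t_{N-1} & \cdots & t_2 & t_1\\ t_{-(N-1)} & 0 & \cdots & t_3 & t_2\\ \vdots & \vdots & \ddots & \vdots & \vdots\\ t_{-2} & t_{-3} & \cdots & 0 & t_{N-1}\\ t_{-1} & t_{-2} & \cdots & t_{-(N-1)} & 0 \end{pmatrix}.$$

As a result, we use this embedding to implement Toeplitz matrices because

$$C_T\begin{pmatrix}\psi\\ 0\end{pmatrix}=\begin{pmatrix}T\psi\\ B_T\psi\end{pmatrix}.$$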

Therefore, by implementing $C_T$, we obtain a quantum state proportional to $|0\rangle T|\psi\rangle+|1\rangle B_T|\psi\rangle$. We then measure the single qubit (in the second register in figure 3) and, conditioned on the outcome 0, obtain the quantum state $T|\psi\rangle$. The success rate is $\|T|\psi\rangle\|^2$ according to theorem 2.2 under the normalization condition $\sum_{j=-(N-1)}^{N-1}t_j=\sum_{j=0}^{2N-1}c_j=1$. With the help of amplitude amplification, only $O(1/\|T|\psi\rangle\|)$ applications of the circuit in figure 3 are required.
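
The circulant embedding of a Toeplitz matrix can be checked on a small instance. In the sketch below, the first row of the $2N$-dimensional circulant is $(t_0,t_{-1},\ldots,t_{-(N-1)},0,t_{N-1},\ldots,t_1)$; the random parameters and the index helper `tk` are assumptions of this example.

```python
import numpy as np

N = 4
rng = np.random.default_rng(2)
t = rng.random(2 * N - 1)
t /= t.sum()                                  # normalization: all t_j sum to 1

def tk(k):
    """t_k for k = -(N-1), ..., N-1 (stored with an index offset)."""
    return t[k + N - 1]

# Toeplitz matrix with T[r, s] = t_{r - s}.
T = np.array([[tk(r - s) for s in range(N)] for r in range(N)])

# First row of the 2N-dimensional circulant embedding C_T.
c = np.array([tk(-k) for k in range(N)] + [0.0] + [tk(N - m) for m in range(1, N)])
rows, cols = np.indices((2 * N, 2 * N))
C_T = c[(cols - rows) % (2 * N)]
B_T = C_T[:N, N:]                             # off-diagonal block

psi = rng.normal(size=N)
psi /= np.linalg.norm(psi)
out = C_T @ np.concatenate([psi, np.zeros(N)])  # input |0>|psi>
p_T = np.linalg.norm(out[:N]) ** 2              # probability of the |0> branch, ||T psi||^2
```

The first half of `out` is $T\psi$ and the second half is $B_T\psi$, so post-selecting the extra qubit on 0 leaves the desired state.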

Figure 3. The quantum circuit to implement a Toeplitz matrix. In this figure, $O_c|0^{L+1}\rangle=\sum_{j=0}^{2N-1}\sqrt{c_j}\,|j\rangle$, where $c=(t_0\ t_{-1}\ \cdots\ t_{-(N-1)}\ 0\ t_{N-1}\ \cdots\ t_1)$.

A Hankel matrix is a square matrix in which each ascending skew-diagonal from left to right is constant, which can be written explicitly as

$$H=\begin{pmatrix} h_{-(N-1)} & h_{-(N-2)} & \cdots & h_0\\ h_{-(N-2)} & h_{-(N-3)} & \cdots & h_1\\ \vdots & \vdots & \ddots & \vdots\\ h_0 & h_1 & \cdots & h_{N-1} \end{pmatrix},$$

specified by $2N-1$ non-negative parameters. A permutation matrix $P=\sigma_x^{\otimes L}$ transforms a Hankel matrix into a Toeplitz matrix. It can be easily verified that $T=HP$ and $H=TP$, in which $t_j=h_j$ for all $j$. Note that when $N$ is not a power of two, we need to be careful with the embedding when mapping a circulant matrix into a Hankel matrix. The subspace $\mathrm{span}\{|0\rangle,\ldots,|N-1\rangle\}$ in the implementation of circulant matrices corresponds to the subspace $\mathrm{span}\{|2^L-N\rangle,\ldots,|2^L-1\rangle\}$ in the implementation of Hankel matrices.

Therefore, by inserting the permutation $P$ before the implementation of $T$, the circuit in figure 3 can be used to implement $H$, and the success rate is $\|H|\psi\rangle\|^2$ under the normalization condition $\sum_{j=-(N-1)}^{N-1}h_j=\sum_{j=0}^{2N-1}c_j=1$. With the help of amplitude amplification, only $O(1/\|H|\psi\rangle\|)$ applications are required.
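
The relation between Hankel and Toeplitz matrices through the bit-flip permutation can be verified directly. The sketch below assumes the indexing $H_{rs}=h_{r+s-(N-1)}$ and $T_{rs}=t_{r-s}$ with $t_j=h_j$; the random parameters and the helper `hk` are assumptions of this example.

```python
import numpy as np

L = 2
N = 2 ** L
rng = np.random.default_rng(3)
h = rng.random(2 * N - 1)
h /= h.sum()

def hk(k):
    """h_k for k = -(N-1), ..., N-1 (stored with an index offset)."""
    return h[k + N - 1]

H = np.array([[hk(r + s - (N - 1)) for s in range(N)] for r in range(N)])  # Hankel
T = np.array([[hk(r - s) for s in range(N)] for r in range(N)])            # Toeplitz, t_j = h_j

# P = sigma_x applied to every qubit, i.e. |k> -> |N - 1 - k>.
sx = np.array([[0.0, 1.0], [1.0, 0.0]])
P = np.array([[1.0]])
for _ in range(L):
    P = np.kron(P, sx)

psi = rng.normal(size=N)
psi /= np.linalg.norm(psi)
H_psi = T @ (P @ psi)                         # apply P first, then the Toeplitz matrix
```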

In comparison with existing algorithms, such as that described in [35], the above described quantum circuit provides a better way to realize circulant-like matrices, requiring fewer resources and with lower complexity. For example, only $2\log N$ qubits are required to implement $N$-dimensional Toeplitz matrices, which is a significant improvement over the algorithm presented in [35] via sparse Hamiltonian simulations. More importantly, this is an exact method and its complexity does not depend on an error term. It is also not limited to sparse circulant matrices $C$ as in [35]. Moreover, implementation of non-unitary matrices, such as circulant matrices, is not only of importance in quantum computing, but also a significant ingredient in quantum channel simulators [36,37], because the set of Kraus operators in the quantum channel $\rho\to\sum_iK_i\rho K_i^\dagger$ is normally non-unitary [1]. The simplicity of our circuit increases its feasibility in experimental realizations.

4. Circulant Hamiltonians

Hamiltonian simulation is expected to be one of the most important undertakings for quantum computation. It is therefore important to explore the possibility of efficient implementation of circulant Hamiltonians because of their extensive applications. Particularly, the implementation of e−iCt is equivalent to the implementation of continuous-time quantum walks on a weighted circulant graph [38,39]. Moreover, simulation of Hamiltonians is also an important part in the HHL algorithm to solve linear systems of equations [9].

A number of algorithms have been shown to be able to efficiently simulate sparse Hamiltonians [2–8], including the unitary decomposition approach [7]. We show that this approach can be extended to the simulation of dense circulant Hamiltonians. It is well known that circulant matrices are diagonalizable as $e^{-iCt}=Fe^{-i\Lambda t}F^\dagger$. In general, implementing an arbitrary diagonal unitary requires up to $O(N\log N)$ one- or two-qubit gates [40]. However, when $\{\Lambda_k\}_{k=0}^{N-1}$ can be efficiently computed, one can efficiently implement $e^{-iCt}$ [23,41,42].

In this section, we will focus on the simulation of Hermitian circulant matrices, for which $e^{-iCt}$ is unitary. For completeness, we first summarize briefly the unitary decomposition approach in [7] and then discuss how it can be used to efficiently simulate dense circulant Hamiltonians. To simulate $U=e^{-iCt}$, the evolution time $t$ is divided into $r$ segments with $U_r=e^{-iCt/r}$, which can be approximated by the truncated Taylor series $\tilde U=\sum_{k=0}^{K}\frac{1}{k!}(-iCt/r)^k$. It can be proven that if we choose $K=O(\log(r/\epsilon)/\log\log(r/\epsilon))=O(\log(t/\epsilon)/\log\log(t/\epsilon))$, then $\|U_r-\tilde U\|\le\epsilon/r$ and the total error is within $\epsilon$.
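
The truncation order needed for a given accuracy can be explored numerically. The following sketch uses an illustrative Hermitian circulant matrix, an evolution time $t=5$, a target error $\epsilon=10^{-6}$ and the integer segment count $r=\lceil t/\ln 2\rceil$; all of these values are assumptions of this example. It searches for the smallest $K$ with $\|U_r-\tilde U\|\le\epsilon/r$ and then checks the accumulated error over $r$ segments.

```python
import math
import numpy as np

# Illustrative Hermitian circulant matrix with spectral norm 1 (symmetric weights).
c = np.array([0.5, 0.25, 0.0, 0.25])
N = len(c)
rows, cols = np.indices((N, N))
C = c[(cols - rows) % N]

t, eps = 5.0, 1e-6
r = math.ceil(t / math.log(2))                # integer number of segments (assumption)

lam, vecs = np.linalg.eigh(C)

def exact(time):
    """exp(-i C time) computed from the eigendecomposition of C."""
    return (vecs * np.exp(-1j * lam * time)) @ vecs.conj().T

def truncated(K):
    """Taylor series of exp(-i C t / r) truncated at order K."""
    term = np.eye(N, dtype=complex)
    total = term.copy()
    for k in range(1, K + 1):
        term = term @ (-1j * C * t / r) / k
        total = total + term
    return total

U_r = exact(t / r)
K = 0
while np.linalg.norm(U_r - truncated(K), 2) > eps / r:
    K += 1

segment_error = np.linalg.norm(U_r - truncated(K), 2)
total_error = np.linalg.norm(exact(t) - np.linalg.matrix_power(truncated(K), r), 2)
print(r, K, segment_error, total_error)
```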

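Since $C=\sum_{j=0}^{N-1}c_jV_j$ as given by equation (2.2), we have

$$\tilde U=\sum_{k=0}^{K}\sum_{j_1,\ldots,j_k=0}^{N-1}\frac{(t/r)^k}{k!}\,c_{j_1}\cdots c_{j_k}\,(-i)^kV_{j_1}\cdots V_{j_k}.$$

Let $W_{(k,j_1,\ldots,j_k)}=(-i)^kV_{j_1}\cdots V_{j_k}$ and

$$O_\alpha|0^{K+KL}\rangle=\frac{1}{\sqrt{s}}\sum_{k=0}^{K}\sum_{j_1,\ldots,j_k=0}^{N-1}\sqrt{\frac{(t/r)^k}{k!}\,c_{j_1}\cdots c_{j_k}}\;|1^k0^{K-k}\rangle|j_1\rangle\cdots|j_k\rangle|0^{(K-k)L}\rangle, \tag{4.2}$$

where $|1^k0^{K-k}\rangle$ is the unary encoding of $k$. Here, $s$ is the normalization coefficient and we choose $r=t/\ln 2$ so that

$$s=\sum_{k=0}^{K}\frac{(t/r)^k}{k!}\approx e^{t/r}=2.$$

By taking $M=\tilde U$, $\alpha_j=\frac{(t/r)^k}{k!}c_{j_1}\cdots c_{j_k}$, $W_j=W_{(k,j_1,\ldots,j_k)}$, $J=KN^K$ and $m=K+KL$ in lemma 2.1, we have

$$(O_\alpha^\dagger\otimes I)\,\mathrm{select}(W)\,(O_\alpha\otimes I)\,|0^{K+KL}\rangle|\psi\rangle=\frac{1}{s}\,|0^{K+KL}\rangle\tilde U|\psi\rangle+|\Psi^\perp\rangle,$$

where $(|0^{K+KL}\rangle\langle 0^{K+KL}|\otimes I)|\Psi^\perp\rangle=0$. It has been shown in [7] that after one step of the oblivious amplitude amplification procedure [29], $U_r=e^{-iCt/r}$ can be simulated within error $\epsilon/r$. The oblivious amplitude amplification procedure avoids the repeated preparations of $|\psi\rangle$ so that $\tilde U|\psi\rangle$ can be obtained using only one copy of $|\psi\rangle$, as shown in figure 4. The total complexity depends on the number of gates required to implement $\mathrm{select}(W)$ and $O_\alpha$.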
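
To make the bookkeeping of the coefficients concrete, the sketch below enumerates every term $(k,j_1,\ldots,j_k)$ for a small example (with $N=3$ and $K=4$, both assumptions) and checks that the weighted sum of the unitaries reproduces the truncated Taylor series, and that the coefficients sum to approximately 2 when $t/r=\ln 2$.

```python
import itertools
import math
import numpy as np

def shift(N, j):
    """V_j = sum_k |(k - j) mod N><k| as a permutation matrix."""
    return np.roll(np.eye(N), -j, axis=0)

c = np.array([0.5, 0.25, 0.25])               # illustrative symmetric weights, sum = 1
N, K = len(c), 4
x = math.log(2)                               # t / r for the choice r = t / ln 2
C = sum(c[j] * shift(N, j) for j in range(N))

s = 0.0
U_lcu = np.zeros((N, N), dtype=complex)
for k in range(K + 1):
    for js in itertools.product(range(N), repeat=k):
        # Coefficient alpha = (t/r)^k / k! * c_{j1} ... c_{jk}
        alpha = x ** k / math.factorial(k) * float(np.prod([c[j] for j in js]))
        # Unitary W = (-i)^k V_{j1} ... V_{jk}
        W = (-1j) ** k * np.eye(N)
        for j in js:
            W = W @ shift(N, j)
        s += alpha
        U_lcu += alpha * W

U_taylor = sum(np.linalg.matrix_power(-1j * x * C, k) / math.factorial(k)
               for k in range(K + 1))
print(s)
```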

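Figure 4. The quantum circuit to implement one segment of circulant Hamiltonians. Here $R_{\mathrm{ini}}|0^K\rangle\propto\sum_{k=0}^{K}\sqrt{(t/r)^k/k!}\,|1^k0^{K-k}\rangle$ and the phase gate acts as $|0\rangle\langle 0|+(-i)|1\rangle\langle 1|$.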

If $C$ is not Hermitian (and $\tilde U$ is not at least approximately unitary), the oblivious amplitude amplification procedure [29] will not be applicable, and we have to resort to traditional amplitude amplification [28]. This leads to a complexity depending exponentially on $t$, because the amplitude amplification has to be run recursively, but the complexity still depends logarithmically on $N$.

Theorem 4.1 (Simulation of circulant Hamiltonians).

There exists an algorithm performing $e^{-iCt}$ on an arbitrary quantum state $|\psi\rangle$ within error $\epsilon$, using $O(t\,\log(t/\epsilon)/\log\log(t/\epsilon))$ calls of controlled-$O_c$ (see footnote 1) and its inverse, as well as $O(t\,(\log N)\,\log(t/\epsilon)/\log\log(t/\epsilon))$ additional one- and two-qubit gates.


Proof. We first consider the number of gates used to implement $O_\alpha$ in equation (4.2). It can be decomposed into two steps. The first step is to create the normalized version of the state $\sum_{k=0}^{K}\sqrt{(t/r)^k/k!}\,|1^k0^{K-k}\rangle$ from the initial state $|0^K\rangle$, which takes $O(K)$ consecutive one-qubit rotations on each qubit. We then apply $K$ sets of controlled-$O_c$ to transform $|0^L\rangle$ into $\sum_{j=0}^{N-1}\sqrt{c_j}\,|j\rangle$ when the control qubit is $|1\rangle$. We therefore need $O(K)$ calls of controlled-$O_c$ and $O(K)$ additional one-qubit gates to implement $O_\alpha$.

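Next we focus on the implementation of

$$\mathrm{select}(W)=\sum_{k=0}^{K}\sum_{j_1,\ldots,j_k=0}^{N-1}|k,j_1,\ldots,j_k\rangle\langle k,j_1,\ldots,j_k|\otimes W_{(k,j_1,\ldots,j_k)},$$

where $|k,j_1,\ldots,j_k\rangle$ abbreviates $|1^k0^{K-k}\rangle|j_1\rangle\cdots|j_k\rangle|0^{(K-k)L}\rangle$, which performs the transformation

$$|k,j_1,\ldots,j_k\rangle|\psi\rangle\ \to\ |k,j_1,\ldots,j_k\rangle\,(-i)^kV_{j_1}\cdots V_{j_k}|\psi\rangle.$$

As $V_j|\ell\rangle=|(\ell-j)\bmod N\rangle$, we can transform $|\psi\rangle$ into $V_{j_1}\cdots V_{j_k}|\psi\rangle$ by applying $K$ quantum subtractors between each label register $|j\rangle$ ($j=j_1,j_2,\ldots,j_k,0,\ldots$) and $|\psi\rangle$. $K$ phase gates, one on each of the first $K$ qubits, multiply the amplitude by $(-i)^k$. Therefore, $\mathrm{select}(W)$ can be decomposed into $O(K\log N)$ one- or two-qubit gates.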

In summary, $O(K)$ calls of controlled-$O_c$ and its inverse as well as $O(K\log N)$ additional one- or two-qubit gates are sufficient to implement one segment $e^{-iCt/r}$; and the total complexity to implement $r$ segments will be $O(tK)$ calls of controlled-$O_c$ and its inverse as well as $O(tK\log N)$ additional one- or two-qubit gates, where $K=O(\log(t/\epsilon)/\log\log(t/\epsilon))$. ▪

Note that we assumed the spectral norm $\|C\|=1$. To make the dependence on the norm explicit in the complexity in theorem 4.1, we can simply replace $t$ by $\|C\|t$.

5. Inverse of circulant matrices

Following from §4, we now show that the HHL algorithm can be extended to solve systems of circulant matrix linear equations. We assume C to be Hermitian in this section in order for the phase estimation procedure to work.

Theorem 5.1 (Inverse of circulant matrices).

There exists an algorithm creating the quantum state $C^{-1}|\psi\rangle/\|C^{-1}|\psi\rangle\|$ within error $\epsilon$ given an arbitrary quantum state $|\psi\rangle$, using $\tilde O(\kappa^2/\epsilon)$ calls of controlled-$O_c$ and its inverse, $O(\kappa)$ calls of $O_\psi$, as well as $\tilde O(\kappa^2\log N/\epsilon)$ additional one- and two-qubit gates (see footnote 2).


Proof. The basic procedure, summarized below, is the same as the HHL algorithm [9], except that $C$ is a dense circulant matrix rather than a sparse matrix as required by the HHL algorithm.

  1. Apply the oracle $O_\psi$ to create the input quantum state $|\psi\rangle$:

    $$|\psi\rangle=\sum_{j=0}^{N-1}b_j|u_j\rangle,$$

    where $\{|u_j\rangle\}_{j=0}^{N-1}$ are the eigenvectors of $C$.

  2. Run phase estimation of the unitary operator $e^{i2\pi C}$:

    $$\sum_{j=0}^{N-1}b_j|u_j\rangle\ \to\ \sum_{j=0}^{N-1}b_j|u_j\rangle|\Lambda_j\rangle,$$

    where $\Lambda_j$ are the eigenvalues of $C$ and $|\Lambda_j|\le 1$.

  3. Perform a controlled rotation on an ancillary qubit:

    $$\sum_{j=0}^{N-1}b_j|u_j\rangle|\Lambda_j\rangle\ \to\ \sum_{j=0}^{N-1}b_j|u_j\rangle|\Lambda_j\rangle\left(\sqrt{1-\frac{1}{\kappa^2\Lambda_j^2}}\,|0\rangle+\frac{1}{\kappa\Lambda_j}\,|1\rangle\right),$$

    where $\kappa$ is the condition number defined in §2 to make sure that $|1/(\kappa\Lambda_j)|\le 1$ for all $j$. The realization of this controlled rotation requires the computation of the $\Lambda_j^{-1}$ [43].

  4. Undo the phase estimation and then measure the ancillary qubit. Conditioned on getting 1, we have an output state proportional to $\sum_{j=0}^{N-1}(b_j/\Lambda_j)|u_j\rangle$ and the success rate $p=\sum_{j=0}^{N-1}|b_j/(\kappa\Lambda_j)|^2=\Omega(1/\kappa^2)$.

Error occurs in step 2 in Hamiltonian simulation and phase estimation. The complexity scales sublogarithmically with the inverse of error in Hamiltonian simulation as in theorem 4.1 and scales linearly with it in phase estimation [1]. The dominant source of error is phase estimation. Following from the error analysis in [9], a precision $O(\epsilon/\kappa)$ in phase estimation results in a final error $\epsilon$. Taking the success rate $p=\Omega(1/\kappa^2)$ into consideration, the total complexity would be $\tilde O(\kappa^2/\epsilon)$, with the help of amplitude amplification [28]. ▪
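
The four steps above can be emulated classically on a small Hermitian circulant matrix, with exact eigenvalues standing in for the outcome of phase estimation. This is only an idealized linear-algebra sketch under that assumption (no phase-estimation error is modelled), and the weight vector is chosen for illustration.

```python
import numpy as np

# Illustrative Hermitian circulant matrix with eigenvalues 1.0, 0.6, 0.2, 0.6.
c = np.array([0.6, 0.2, 0.0, 0.2])
N = len(c)
rows, cols = np.indices((N, N))
C = c[(cols - rows) % N]

lam, U = np.linalg.eigh(C)                    # eigenvalues Lambda_j and eigenvectors |u_j>
kappa = np.abs(lam).max() / np.abs(lam).min()

rng = np.random.default_rng(0)
psi = rng.normal(size=N) + 1j * rng.normal(size=N)
psi /= np.linalg.norm(psi)

b = U.conj().T @ psi                          # step 1: |psi> = sum_j b_j |u_j>
amp = b / (kappa * lam)                       # steps 2-3: amplitude on ancilla outcome 1
p = np.sum(np.abs(amp) ** 2)                  # step 4: success probability
out = U @ amp / np.sqrt(p)                    # normalized post-measurement state

target = np.linalg.solve(C, psi)
target /= np.linalg.norm(target)
print(kappa, p)
```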

For $s$-sparse Hamiltonians (with at most $s$ non-zero entries in any row or column), the HHL algorithm scales as $\tilde O(s^2\kappa^2\log N/\epsilon)$ [9]. In this work, we extended the HHL procedure to dense Hamiltonians with special structure and proved the scaling is independent of matrix sparsity. This simplification stems from the efficient implementation of $\mathrm{select}(V)$, which makes possible the decomposition of $C$ into $O(N)$ terms without introducing $O(N)$ into the computational complexity.

6. Products of circulant matrices

Products of circulant matrices are also circulant matrices, because a circulant matrix can be decomposed into a linear combination of $\{V_j\}_{j=0}^{N-1}$, which constitute a cyclic group of order $N$ (we have $V_jV_k=V_{(j+k)\bmod N}$). Suppose $C^{(1,2)}=C^{(1)}C^{(2)}$ is the product of two circulant matrices $C^{(1)}$ and $C^{(2)}$, with a parameter vector $c^{(1,2)}$ given by

$$c_j^{(1,2)}=\sum_{k=0}^{N-1}c_k^{(1)}\,c_{(j-k)\bmod N}^{(2)},$$

where $c^{(1)}$ and $c^{(2)}$ are the parameter vectors of $C^{(1)}$ and $C^{(2)}$, respectively. Clearly, when the spectral norms of $C^{(1)}$ and $C^{(2)}$ are one, the spectral norm of $C^{(1,2)}$ is also one. Classically, calculating the parameters $c^{(1,2)}$ would take up $O(N)$ space. However, in the quantum case, we will show that $O_{c^{(1,2)}}$, encoding $c^{(1,2)}$, can be prepared using one $O_{c^{(1)}}$ and one $O_{c^{(2)}}$. It means that the oracle for a product of circulant matrices can be efficiently prepared when its factor circulants are efficiently implementable, as illustrated in figure 5.

Figure 5. The quantum circuit of $O_{c^{(1,2)}}$. Here $V_j^\dagger=\sum_{k=0}^{N-1}|(k+j)\bmod N\rangle\langle k|$ and controlled-$V_j^\dagger$ is a quantum adder.

Theorem 6.1 (Products of circulant matrices).

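There exists an algorithm creating the oracle $O_{c^{(1,2)}}$, which satisfies

$$O_{c^{(1,2)}}|0^{3L}\rangle=\sum_{j=0}^{N-1}\sqrt{c_j^{(1,2)}}\,|j\rangle|\Phi_j\rangle, \tag{6.2}$$

where $|\Phi_j\rangle$ is a unit quantum state dependent on $j$, using one $O_{c^{(1)}}$, one $O_{c^{(2)}}$ and $O(\log N)$ additional one- and two-qubit gates.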


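Proof. We need $2L$ ancillary qubits divided into two registers to construct the oracle for the product of two circulant matrices. Applying $O_{c^{(1)}}$ and $O_{c^{(2)}}$ to the last two registers, we obtain

$$|0^L\rangle|0^L\rangle|0^L\rangle\ \to\ \sum_{j_1,j_2=0}^{N-1}\sqrt{c_{j_1}^{(1)}c_{j_2}^{(2)}}\;|0^L\rangle|j_1\rangle|j_2\rangle.$$

In order to encode $c_j^{(1,2)}$ in the quantum amplitudes, we once again apply quantum adders, performing the following transformation:

$$|0^L\rangle|j_1\rangle|j_2\rangle\ \to\ |j\rangle|j_1\rangle|j_2\rangle,$$

where $j\equiv(j_1+j_2)\bmod N$. This can be achieved using two quantum adders, and we obtain the state

$$\sum_{j=0}^{N-1}\sqrt{c_j^{(1,2)}}\,|j\rangle|\Phi_j\rangle,\qquad |\Phi_j\rangle=\frac{1}{\sqrt{c_j^{(1,2)}}}\sum_{j_1+j_2\equiv j\ (\mathrm{mod}\ N)}\sqrt{c_{j_1}^{(1)}c_{j_2}^{(2)}}\;|j_1\rangle|j_2\rangle,$$

because the amplitude of $|j\rangle$ is equal to $\sqrt{\sum_{j_1+j_2\equiv j\ (\mathrm{mod}\ N)}\left(\sqrt{c_{j_1}^{(1)}c_{j_2}^{(2)}}\right)^2}=\sqrt{c_j^{(1,2)}}$. ▪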
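
The amplitude argument in this proof can be checked with a small state-vector calculation. The sketch below, with $N=8$ and random parameter vectors (assumptions of this example), builds the three-register state after the two adders and confirms that the marginal distribution on the first register is the cyclic convolution of $c^{(1)}$ and $c^{(2)}$.

```python
import numpy as np

def circulant(c):
    """Circulant matrix with C[r, s] = c[(s - r) mod N]."""
    c = np.asarray(c, dtype=float)
    N = len(c)
    rows, cols = np.indices((N, N))
    return c[(cols - rows) % N]

rng = np.random.default_rng(4)
N = 8
c1 = rng.random(N)
c1 /= c1.sum()
c2 = rng.random(N)
c2 /= c2.sum()

# Parameter vector of the product: cyclic convolution of c1 and c2.
c12 = np.array([sum(c1[k] * c2[(j - k) % N] for k in range(N)) for j in range(N)])

# State after the oracles and the two adders, indexed as [j, j1, j2].
state = np.zeros((N, N, N))
for j1 in range(N):
    for j2 in range(N):
        state[(j1 + j2) % N, j1, j2] = np.sqrt(c1[j1] * c2[j2])

prob_j = (state ** 2).sum(axis=(1, 2))        # squared amplitude attached to each |j>
print(np.allclose(prob_j, c12))
```

The same vector `c12` is the first row of the matrix product `circulant(c1) @ circulant(c2)`, which is how it relates back to the product of circulant matrices.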

This algorithm can be easily extended to implementing oracles for products of d circulants, in which d oracles of factor circulants and dL ancillary qubits are needed. Though the oracle described in theorem 6.1 may not be useful in all quantum algorithms, owing to the additional |Φj〉 in equation (6.2), it is applicable in §§2 and 4 according to lemma 6.2 (the generalized form of lemma 2.1) described below. It implies that this technique could also be useful in other algorithms related to circulant matrices.

Lemma 6.2.

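Let $M=\sum_j\alpha_jW_j$ be a linear combination of unitaries $W_j$ with $\alpha_j\ge 0$ for all $j$ and $\sum_j\alpha_j=1$. Let $O_\alpha$ be any operator that satisfies $O_\alpha|0^m\rangle=\sum_j\sqrt{\alpha_j}\,|j\rangle|\Phi_j\rangle$ ($m$ is the number of qubits used to represent $|j\rangle|\Phi_j\rangle$) and $\mathrm{select}(W)=\sum_j|j\rangle\langle j|\otimes I\otimes W_j$. Then

$$(O_\alpha^\dagger\otimes I)\,\mathrm{select}(W)\,(O_\alpha\otimes I)\,|0^m\rangle|\psi\rangle=|0^m\rangle M|\psi\rangle+|\Psi^\perp\rangle,$$

where $(|0^m\rangle\langle 0^m|\otimes I)|\Psi^\perp\rangle=0$.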




7. Application: solving cyclic systems

Vibration analysis of mechanical structures with cyclic symmetry has been a subject of considerable studies in acoustics and mechanical engineering [14,17]. Here we provide an example where the above proposed quantum scheme can outperform classical algorithms in solving the equation of motion for vibrating and rotating systems with certain cyclic symmetry.

The equation of motion for a cyclically symmetric system consisting of $N$ identical sectors, as shown in figure 6, can be written as

$$M\ddot q+D\dot q+Kq=f,$$

where $q$ and $f$ are $N$-dimensional vectors, denoting the displacement of and the external force acting on each individual sector, respectively. The mass, damping and stiffness matrices are all circulants, represented by $M=\mathrm{circ}(m_1,m_2,\ldots,m_N)$, $D=\mathrm{circ}(d_1,d_2,\ldots,d_N)$ and $K=\mathrm{circ}(s_1,s_2,\ldots,s_N)$.

Figure 6. Topology diagram of an N-sector cyclic system. (a) A general cyclic system with coupling between any two sectors which can be solved using theorem 5.1. (b) A cyclic system with nearest-neighbour coupling which can be solved using the HHL algorithm [9].

Assume all sectors have the same mass ($M=I$) and there is zero damping ($D=0$). If the system is under the so-called travelling wave engine order excitation, the equation of motion can be simplified as [14]:

$$\ddot q+Kq=f\,e^{in\Omega t},$$

where the travelling wave is characterized by $f_j=f\,e^{i2\pi nj/N}$ for the external force vector $f$, $n$ is the order of excitation and $\Omega$ is the angular frequency of the excitation. We search for solutions of the form $q=q_0e^{in\Omega t}$, which leads to

$$(K-n^2\Omega^2I)\,q_0=f.$$

Since $K-n^2\Omega^2I$ is a circulant matrix, we can use theorem 5.1 to calculate

$$q_0=(K-n^2\Omega^2I)^{-1}f.$$

It is important to consider the conditions under which theorem 5.1 works.
  • 1. $K-n^2\Omega^2I$ is Hermitian. This is generally true for symmetric cyclic systems, where the coupling between $q_j$ and $q_{j+d}$ and the coupling between $q_j$ and $q_{j-d}$ are physically the same for any sector $j$ and distance $d$.

  • 2. $K-n^2\Omega^2I$ has non-negative (or non-positive) entries. Although this is not in general true, theorem 5.1 will work under a slight modification. We observe that the off-diagonal elements of $K-n^2\Omega^2I$ are always negative because the coupling force between two connecting sectors is always in the opposite direction to their relative motion.

    • - If the diagonal elements of $K-n^2\Omega^2I$ are also negative, then no modification to the proposed procedure is necessary.

    • - If the diagonal elements of $K-n^2\Omega^2I$ are positive, using the technique stated in §3.1, we can simply replace $\mathrm{select}(V)$ with $\mathrm{Ref}_0\cdot\mathrm{select}(V)$, where $\mathrm{Ref}_0=2|0^L\rangle\langle 0^L|-I$ is a reflection operator operating on the first register, which flips the sign of every term with $j\ne 0$ relative to the diagonal ($j=0$) term.

  • 3. The condition number $\kappa$ of $K-n^2\Omega^2I$ is small. This is true when the couplings among sectors are relatively weak, that is, when $|K_0-n^2\Omega^2|\gg K_1$, where $K_0$ characterizes the coupling between a sector and the exterior and $K_1$ characterizes the coupling among sectors.

  • 4. The corresponding oracle $O_c$ of the circulant matrix $K-n^2\Omega^2I$ can be efficiently implemented. This requires either that $K-n^2\Omega^2I$ has a special structure or that the information of $K-n^2\Omega^2I$ is stored in a qRAM in advance.

If all four conditions are satisfied, we have an exponential speed-up compared to classical computation. Note that the output $q_0$ is stored in quantum amplitudes, which cannot be read out directly. However, further computation steps can efficiently provide practically useful information about the system from the vector $q_0$, for example the expectation value $q_0^\dagger Mq_0$ for some linear operator $M$ or the similarity between two cyclic systems $\langle q_0|q_0'\rangle$ [9]. This type of speed-up is not achievable classically, for it takes at least $O(N)$ steps to read out the value of $q_0$. It is also worth noting that the proposed algorithm, in contrast to previous quantum algorithms [3–7,9,35], works for dense matrices $K-n^2\Omega^2I$. It means that the cyclic systems need not be subject to nearest-neighbour coupling.
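
For reference, the classical counterpart of this calculation can be sketched in a few lines. The stiffness values, the excitation order $n=2$ and the frequency $\Omega=0.5$ below are illustrative assumptions, and the matrix is not rescaled to unit spectral norm as the quantum algorithm would require. Because the travelling-wave force is itself a Fourier mode, the solution reduces to a division by a single eigenvalue.

```python
import numpy as np

N = 8
s = np.zeros(N)                               # first row of the stiffness matrix K
s[0] = 5.0                                    # illustrative on-site stiffness
s[1] = s[N - 1] = -0.5                        # nearest-neighbour coupling
s[2] = s[N - 2] = -0.25                       # next-nearest-neighbour coupling

n, Omega = 2, 0.5                             # order and angular frequency of excitation
a = s.copy()
a[0] -= (n * Omega) ** 2                      # first row of K - n^2 Omega^2 I
rows, cols = np.indices((N, N))
A = a[(cols - rows) % N]

j = np.arange(N)
f = 1.0 * np.exp(2j * np.pi * n * j / N)      # travelling-wave force, f_j = f exp(i 2 pi n j / N)

Lam = N * np.fft.ifft(a)                      # eigenvalues of the circulant matrix A
kappa = np.abs(Lam).max() / np.abs(Lam).min() # condition number

q0 = f / Lam[n]                               # f is the n-th Fourier mode, an eigenvector of A
q0_fft = np.fft.ifft(np.fft.fft(f) / Lam)     # general FFT-based circulant solve
print(kappa)
```

The FFT-based solve on the last line works for any force vector, at a classical cost that grows at least linearly in $N$ because every entry of `q0_fft` is written out.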

8. Conclusion

In this paper, we present efficient quantum algorithms for implementing circulant (as well as Toeplitz and Hankel) matrices and block circulant matrices with special structures, which are not necessarily sparse or unitary. These matrices have practically significant applications in physics, mathematics and engineering-related fields. The proposed algorithms provide exponential speed-up over classical algorithms, requiring fewer resources ($2\log N$ qubits) and having lower complexity ($O(\log N/\|C|\psi\rangle\|)$) in comparison with existing quantum algorithms. Consequently, they perform better in quantum computing and are more feasible for experimental realization with current technology. Obstacles still exist, though, in the efficient realization of the oracles to generate the components of the circulant matrices.

Besides the implementation of circulant matrices, we discover that we can perform the HHL algorithm on circulant matrices to implement the inverse of circulant matrices, by adopting the Taylor series approach to efficiently simulate circulant Hamiltonians. Owing to the special structure of circulant matrices, we prove that they are one of the types of the dense matrices that can be efficiently simulated. Being able to implement the inverse of circulant matrices opens a door to solving a variety of real-world problems, for example, solving cyclic systems in vibration analysis. Finally, we show that it is possible to construct oracles for products of circulant matrices using the oracles for their factor circulants, a technique that will be useful in related algorithms.

Data accessibility

This paper does not include further supporting information.

Authors' contributions

S.S.Z. devised the quantum algorithms described in this paper, carried out the detailed complexity analysis and drafted the manuscript. J.B.W. provided overall guidance and supervision, and helped in drafting and revising the manuscript. All authors gave final approval for publication.

Competing interests

The authors declare no competing interests.


Funding

The work is supported by the UWA-Nanjing exchange programme and The University of Western Australia.


Acknowledgements

We would like to thank Xiaosong Ma, Anuradha Mahasinghe, Jie Pan, Thomas Loke and Shengjun Wu for helpful discussions.


Footnotes

1 By controlled-$O_c$, we mean the operation $|0\rangle\langle 0|\otimes I+|1\rangle\langle 1|\otimes O_c$.

2 We use the symbol $\tilde O$ to suppress polylogarithmic factors.

Published by the Royal Society under the terms of the Creative Commons Attribution License, which permits unrestricted use, provided the original author and source are credited.