key: cord-0061107-90px55pi
authors: Bonnetain, Xavier; Schrottenloher, André
title: Quantum Security Analysis of CSIDH
date: 2020-03-25
journal: Advances in Cryptology - EUROCRYPT 2020
DOI: 10.1007/978-3-030-45724-2_17
sha: f577347ba260a1ad3565291af459e811294ccafa
doc_id: 61107
cord_uid: 90px55pi

CSIDH is a recent proposal for post-quantum non-interactive key-exchange, based on supersingular elliptic curve isogenies. It is similar in design to a previous scheme by Couveignes, Rostovtsev and Stolbunov, but aims at an improved balance between efficiency and security. In the proposal, the authors suggest concrete parameters in order to meet some desired levels of quantum security. These parameters are based on the hardness of recovering a hidden isogeny between two elliptic curves, using a quantum subexponential algorithm of Childs, Jao and Soukharev. This algorithm combines two building blocks: first, a quantum algorithm for recovering a hidden shift in a commutative group. Second, a computation in superposition of all isogenies originating from a given curve, which the algorithm calls as a black box. In this paper, we give a comprehensive security analysis of CSIDH. Our first step is to revisit three quantum algorithms for the abelian hidden shift problem from the perspective of non-asymptotic cost, with trade-offs between their quantum and classical complexities. Second, we complete the non-asymptotic study of the black box in the hidden shift algorithm. We give a quantum procedure that evaluates CSIDH-512 using less than 40 000 logical qubits. This allows us to show that the parameters proposed by the authors of CSIDH do not meet their expected quantum security.

Problems such as factoring and solving discrete logarithms, believed to be classically intractable, underlie the security of most asymmetric cryptographic primitives in use today. After Shor found a quantum polynomial-time algorithm for both [44] , the cryptographic community has been actively working on replacements, culminating with the ongoing NIST call for post-quantum primitives [37] .

One of the families of problems studied concerns elliptic curve isogenies. In this setting, we consider a graph, whose vertices are elliptic curves, and whose edges are non constant morphisms (isogenies). The problem of finding a path between two given curves was first used in the design of the CGL hash functions [13] with supersingular isogeny graphs. Afterwards, a key-exchange based on ordinary curves (CRS) was proposed independently by Rostovtsev and Stolbunov [45] and Couveignes [18] . Later, a quantum algorithm was given in [16] , that could find an isogeny between two such curves in subexponential time, a problem for which classical algorithms still require exponential time. Although it is not broken in quantum polynomial time, the scheme became considered as too inefficient with respect to its post-quantum security.

Meanwhile, a key-exchange based on supersingular elliptic curves isogenies was proposed [21] , and the candidate SIKE was selected for the second round of the NIST standardization process. The quantum algorithm for finding ordinary isogenies cannot be applied for the supersingular graphs, and the best known quantum algorithm for breaking SIKE has an exponential time complexity.

CSIDH. CSIDH is a new primitive presented at ASIACRYPT 2018 [12] . Its name stands for "commutative supersingular isogeny Diffie-Hellman", and its goal is to make isogeny-based key exchange efficient in the commutative case, analogous to a regular non-interactive Diffie-Hellman key exchange. CSIDH uses supersingular elliptic curves defined over F p . In this case, the F p -isogeny graph has a structure analogous to the ordinary isogeny graph, and the subexponential quantum attack of [16] also applies. CSIDH aims at an improved balance between efficiency and security with respect to the original CRS scheme. However, it stands in a peculiar situation. To the best of our knowledge, it is the only postquantum scheme actively studied 1 against which a quantum adversary enjoys more than a polynomial speedup. Schemes based on lattices, codes, or SIKE, rely on problems with a quantum speedup quadratic at best.

In only two years, CSIDH has been the subject of many publications, showing a renewed interest for protocols based on commutative elliptic curve isogenies. It has been used in [20] to devise the signature scheme SeaSign. CSIDH and SeaSign were further studied and their efficiency was improved in [22, 26, 35, 36] , the last two works published at PQCRYPTO 2019.

Meanwhile, there has been a few works dealing with the security of CSIDH. The asymptotic cost of attacking the scheme, with classical precomputations and a quantum polynomial-space algorithm, was studied in [7] . Asymptotically also, it was shown in [27] that CSIDH (and CRS) could be attacked in polynomial space. Next, a quantum-classical trade-off using Regev's variant [39] of Kuperberg's sieve was proposed in [8] . Only two works studied the concrete parameters proposed in [12] : independently from us, Peikert [38] attacked CSIDH-512 using Kuperberg's collimation sieve [32] . Contrary to us, he uses classical memory with quantum random access. Finally, the number of Toffoli gates required to implement a CSIDH-512 key-exchange in constant time has been studied in full detail in [4] , published at EUROCRYPT 2019. However, the authors designed an irreversible classical circuit, and the memory usage of an immediate translation to a quantum circuit seems massive (see the appendix of [4] ).

In this paper, we make a decisive move towards understanding the quantum security of CSIDH. First, we revisit three quantum abelian hidden shift algorithms from the available literature, that can be used to recover the secret key in a CSIDH key-exchange, from the point of view of non-asymptotic cost. We give a wide range of trade-offs between their quantum and classical time and memory complexities. Second, we give quantum circuits for computing the isogenies in CSIDH. Building on [4] , with the addition of quantum timespace tradeoffs for reversible computations and refined quantum search, we give a quantum procedure that computes the action of the class group in CSIDH-512 using 2 49. 8 Toffoli gates and less than 40 000 qubits. Putting together our improved query complexities and this new quantum circuit, we are able to attack CSIDH-512, -1024 and -1792 in 2 10 to 2 48 less quantum time than expected, using only tens of thousands of logical qubits.

Paper Outline. Section 2 below presents the context of the CSIDH group action and outlines the attack. We next go into the details of the two building blocks: a quantum black-box hidden shift algorithm, and a quantum procedure to evaluate the class group action. In Sect. 3, we present the three main quantum algorithms for finding abelian hidden shifts. Our contribution here is to give nonasymptotic estimates of them, and to write a simple algorithm for cyclic hidden shift (Algorithm 2), which can be easily simulated. In Sect. 4, we show how to replace the class group action oracle by the CSIDH group action oracle using lattice reduction. We study the latter in Sect. 5. We summarize our complexity analysis in Sect. 6.

In this section, we present the rationale of CSIDH and the main ideas of its quantum attack. Throughout this paper, we use extensively standard notions of quantum computing such as qubits, ancilla qubits, quantum gates, entanglement, uncomputing, quantum Fourier Transform (QFT), CNOT and Toffoli gates. We use the Dirac notation of quantum states | . We analyze quantum algorithms in the quantum circuit model, where the number of qubits represents the quantum space used, including ancilla qubits which are restored to their initial state after the computation. Time is the number of quantum gates in the circuit (we do not consider the metric of circuit depth). We use the standard "Clifford+T" universal gate set for all our benchmarks [25] and focus notably on the T-gate count, as T-gates are usually considered an order of magnitude harder to realize than Clifford gates. It is possible to realize the Toffoli gate with 7 T-gates.

Let p > 3 be a prime number. In general, supersingular elliptic curves over F p are defined over a quadratic extension F p 2 . However, the case of supersingular curves defined over F p is special. When O is an order in an imaginary quadratic field, each supersingular elliptic curve defined over F p having O as its F p -rational endomorphism ring corresponds to an element of C (O), the ideal class group of O. Moreover, a rational -isogeny from such a curve corresponds to an ideal of norm in C (O). The (commutative) class group C (O) acts on the set of supersingular elliptic curves with F p -rational endomorphism ring O.

One-way Group Action. All use cases of the CSIDH scheme can be pinned down to the definition of a one-way group action (this is also the definition of a hard homogeneous space by Couveignes [18] ). A group G acts on a set X. Operations in G, and the action g * x for g ∈ G, x ∈ X, are easy to compute. Recovering g given x and x = g * x is hard. In the case of CSIDH, X is a set of Montgomery curves of the form E A :

Taking g * x for an element in C (O) (i.e. an isogeny) and a curve corresponds to computing the image curve of x by this isogeny. CSIDH and CRS both benefit from this action of the class group, which also exists in the ordinary case. Quantum algorithms for recovering abelian hidden shifts solve exactly this problem of finding g when G is commutative. There exists a family of such algorithms, initiated by Kuperberg. The variant of [16] targets precisely the context of ordinary curves, and it can be applied to CSIDH.

Representation of C (O). The designers choose a prime p of the form: p = 4 · 1 · · · u − 1 where 1 , . . . , u are small primes. This enables to represent the elements of C (O) (hence, the isogenies) in a way that is now specific to CSIDH, and the main reason of its efficiency. Indeed, since each of the i divides −p−1 = π 2 − 1, the ideal i O splits and l i = ( i , π − 1) is an ideal in O. The image curves by these ideals can be computed efficiently [12, Section 8] .

The designers consider the set

where [l i ] is the class of l i . If we suppose that these products fall randomly in C (O), which has O( √ p) elements, it suffices to take 2m + 1 p 1/(2u) in order to span the group C (O) or almost all of it. Since a greater m yields more isogeny computations, u should be the greatest possible. With this constraint in mind, we estimate u = 132 and u = 209 for CSIDH-1024 and CSIDH-1792 respectively (for CSIDH-512, we know that u = 74 and the list of primes is given in [12] ).

Given Classically, the best method seems the exhaustive key search of [b] using a meetin-the-middle approach: it costs O(p 1/4 ). Quantumly, they use the cost given in [16] for ordinary curves: exp (

Levels of Security. In [12] , the CSIDH parameters 512, 1024 and 1792 bits are conjectured secure up to the respective levels 1, 3 and 5 of the NIST call [37] . These levels correspond respectively to a key-recovery on AES-128, on AES-192 and AES-256. A cryptographic scheme, instantiated with some parameter size, matches level 1 if there is no quantum key-recovery running faster than quantum exhaustive search of the key for AES-128, and classical key-recovery running faster than classical exhaustive search. The NIST call considered the quantum gate counts given in [25] . These were improved later in [33] , and we choose to adopt these improvements in this paper. For example, AES-128 key-recovery can be done with Grover search using 1.47 · 2 81 T-gates and 865 qubits. Hence any algorithm using less than 1.47 · 2 81 T-gates and 2 128 classical computations breaks the NIST level 1 security, as it runs below the security level of AES-128.

Algorithm 1 outlines a quantum key-recovery on CSIDH. Given E A , E B , we find a vectorē such that

We will not retrieve the exact secret key which was selected at the beginning, but the outputē will have an L 1 norm small enough that it can be used instead, and impersonate effectively the secret key. In order to evaluate the cost of Algorithm 1, we need to study the quantum query complexity of the black-box hidden shift algorithm applied, but also its classical complexity, as it will often contain some quantum-classical trade-off. Afterwards, we need to analyze the quantum gate complexity of an oracle for the action of the ideal class group on Montgomery curves. There will also be classical precomputations.

In [16] , in the context of ordinary curves, the authors show how to evaluate [x]·E for any ideal class [x] in superposition, in subexponential time. For CSIDH, in a non-asymptotic setting, it is best to use the structure provided by the scheme (contrary to [7] ). We have supposed that the class group is spanned by products of the form [l 1 ] e1 . . . [l u ] eu with small e i . If we are able to rewrite any [x] as such a product, then the evaluation of the class group action [x] · E costs no more than the evaluation of the CSIDH group action i [l i ] ei · E. Here, a technique based on lattice reduction intervenes, following [6, 7, 18] .

In general, although the class group is spanned by the products used in the CSIDH key-exchange: {[l 1 ] e1 . . . [l u ] eu , −m ≤ e i ≤ m}, we cannot retrieve the shortest representation of a given [x] . There is some approximation overhead, related to the quality of the lattice precomputations. In Sect. 4, we will show that this overhead is minor for the CSIDH original parameters.

In this section, we present in detail three quantum algorithms for solving the hidden shift problem in commutative (abelian) groups. For each of them, we give tradeoff formulas and non-asymptotic estimates. The first one (Sect. 3.2) is a new variant of [31] for cyclic groups, whose behavior is easy to simulate. The second is by Regev [39] and Childs, Jao and Soukharev [16] . The third is Kuperberg's second algorithm [32] .

The hidden shift problem is defined as follows:

Problem 1 (Hidden shift problem). Let (G, +) be a group, f, g : G → G two permutations such that there exists s ∈ G such that, for all x, f (x) = g(x + s). Find s.

Classically, this problem essentially reduces to a collision search, but in the case of commutative groups, there exists quantum subexponential algorithms. The first result on this topic was an algorithm with low query complexity, by Ettinger and Høyer [24] , which needs O(log(N )) queries and O(N ) classical computations to solve the hidden shift in Z/N Z. The first time-efficient algorithms were proposed by Kuperberg in [31] . His Algorithm 3 is shown to have a complexity in quantum queries and memory of O 2 √ 2 log 2 (3) log 2 (N ) for the group Z/N Z for smooth N , and his Algorithm 2 is in O 2 3 √ log 2 (N ) , for any N . This has been followed by a memory-efficient variant by Regev, with a query complexity in L N (1/2, √ 2) and a polynomial memory complexity, in [39] , which has been generalized by Kuperberg in [32] , with an algorithm in O 2

quantum queries and classical memory, and a polynomial quantum memory. Regev's variant has been generalized to arbitrary commutative groups in the appendix of [16] , with the same complexity. A complexity analysis of this algorithm with tighter exponents can be found in [9] .

A broad presentation of subexponential-time quantum hidden shift algorithms can be found in [39] . Their common design is to start with a pool of labeled qubits, produced using quantum oracle queries for f and g. Each qubit contains information in the form of a phase shift between the states |0 and |1 . This phase shift depends on the (known) label and on the (unknown) hidden shift s. Then, they use a combination procedure that consumes labeled qubits and creates new ones. The goal is to make the label reach some wanted value (e.g. 2 n−1 ), at which point meaningful information on s (e.g. one bit) can be extracted.

Cyclic Groups and Concrete Estimates. In [10] , the authors showed that the polynomial factor in the O, for a variant of Kuperberg's original algorithm, is a constant around 1 if N is a power of 2. In the context of CSIDH, the cardinality of the class group C (O) is not a power of 2, but in most cases, its odd part is cyclic, as shown by the Cohen-Lenstra heuristics [17] . So we choose to approximate the class group as a cyclic group. This is why we propose in what follows a generalization of [10, Algorithm 2] that works for any N , at essentially the same cost. We suppose that an arbitrary representation of the class group is available; one could be obtained with the quantum polynomial-time algorithm of [14] , as done in [16] .

In this section, we present a generic hidden shift algorithm for Z/N Z, which allows us to have the concrete estimates we need. We suppose an access to the quantum oracle that maps |x |0 |0 to |x |0 |f (x) , and |x |1 |0 to |x |1 |g(x) .

Producing the Labeled Qubits. We begin by constructing the uniform superposition on N × {0, 1}: 1

. Then, we apply the quantum oracle, and get 1 √ 2N

We then measure the final register. We obtain a value y = f (x 0 ) = g(x 0 + s) for some random x 0 . The two first registers collapse on the superposition that corresponds to this measured value: 1 √ 2 (|x 0 |0 + |x 0 + s |1 ). Finally, we apply a Quantum Fourier Transform (QFT) on the first register and measure it, we obtain a label and the state

The phase χ s N , which depends on s and N , contains information on s. We now apply a combination routine on pairs of labeled qubits (|ψ , ) as follows.

Step. If we have obtained two qubits |ψ 1 and |ψ 2 with their corresponding labels 1 and 2 , we can write the (disentangled) joint state of |ψ 1 and |ψ 2 as:

We apply a CNOT gate, which maps |00 to |00 , |01 to |01 , |10 to |11 and |11 to |10 . We obtain the state:

We measure the second qubit. If we measure 0, the first qubit collapses to:

and if we measure 1, it collapses to:

A common phase factor has no incidence, so we can see that the combination either produces |ψ 1+ 2 or |ψ 1− 2 , with probability 1 2 . Furthermore, the measurement of the first qubit gives us which of the labels we have obtained. Although we cannot choose between the two cases, we can perform favorable combinations: we choose 1 and 2 such that 1 ± 2 is a multiple of 2 with greater valuation than 1 and 2 themselves.

Goal of the Combinations. In order to retrieve s, we want to produce the qubits with label 2 i and apply a Quantum Fourier Transform. Indeed, we have

The amplitude associated with t is 1

sin(πθ) . For θ ∈ 0; 1 2 n+1 , this value is decreasing, from 1 to

2 π . Hence, when measuring, we obtain a t such that s N + t 2 n ≤ 1 2 n+1 with probability greater than 4 π 2 . Such a t always exists, and uniquely defines s if n > log 2 (N ).

From 2 n to any N . We want to apply this simple algorithm to any cyclic group, with any N . A solution is to not take into account the modulus N in the combination of labels. We only want combinations such that k ± k = 2 i . At each combination step, we expect the 2-valuation of the output label to increase (we collide on the lowest significant bits), but its maximum size can also increase:

We note val 2 (x) = max i 2 i |x the 2-valuation of x. The procedure is Algorithm 2. Each label is associated to its corresponding qubit, and the operation ± corresponds to the combination.

Hidden shift algorithm for Z/N Z Input: N , a number of queries Q, a quantum oracle access to f and g such that f (x) = g(x + s), x ∈ Z/N Z Output: s 1: Generate Q random labels in [0; N ) using the quantum oracles 2: Separate them in pools Pi of elements e such that val2(x) = i 3: i ← 0, R = ∅, n ← log 2 (N ) . 4: while some elements remain do 5:

if i ≤ n then 6:

Pop a few elements e from Pi, put (e, i) in R.

end if 8:

for (e, j) ∈ R do 9:

if val2(e − 2 j ) = i then 10:

Pop a of Pi which maximizes val2(a + e − 2 j ) or val2(e − 2 j − a) 11:

e = e ± a 12:

Apply a QFT on the qubits, measure a t 16:

Insert c in the corresponding Pj 23: end while 24:

i ← i + 1 25: end while 26: return Failure Intuitively, the behavior of this algorithm will be close to the one of [10] , as we only have a slightly higher amplitude in the values, and a few more elements to produce. The number of oracle queries Q is exactly the number of labeled qubits used during the combination step. Empirically, we only need to put 3 elements at each step in R in order to have a good success probability. This algorithm is easily simulated, because we only need to reproduce the combination step, by generating at random the new labels obtained at each combination. We estimate the total number of queries to be around 12 × 2 1.8 √ n (Table 1) . For the CSIDH parameters of [4] , we have three group sizes (in bits): n = 256, 512 and 896 respectively. We obtain 2 33 , 2 45 and 2 58 oracle queries to build the labeled qubits, with 2 31 , 2 43 and 2 56 qubits to store in memory. A slight overhead in time stems from the probability of success of 4 π 2 ; the procedure needs to be repeated at most 4 times. In CSIDH, the oracle has a high gate complexity. The number of CNOT quantum gates applied during the combination step (roughly equal to the number of labeled qubits at the beginning) is negligible. Notice also that the production of the labeled qubits can be perfectly parallelized.

Algorithm 2 is only a variant of the first subexponential algorithm by Kuperberg in [31] . We develop here on a later approach used by Regev [39] and Childs, Jao and Soukharev [16] for odd N .

Routine. This algorithm uses the same labeled qubits as the previous one. The main idea is to combine not 2, but k qubits:

and apply |x |0 → |x | x·( 1 , . . . , k )/B for a given B that controls the cost of the combination routine and depends on the tradeoffs of the complete algorithm. Measuring the second register yields a value V = x · ( 1 , . . . , k )/B , the state becoming

In order to get a new labeled qubit, one can simply project on any pair (j 1 , j 2 ) with j 1 and j 2 among this superposition of j. This is easy to do as long as the j are classically known. They can be computed by solving the equation j · ( 1 , . . . , k )/B = V , which is an instance of the subset-sum problem. This labeled qubit obtained is of the form:

which, up to a common phase factor, is:

We observe that the new label in the phase, given by (j 2 − j 1 ) · ( 1 , . . . , k ), is less than B. If we map j 1 and j 2 respectively to 0 and 1, we obtain a labeled qubit |ψ with < B. Now we can iterate this routine in order to get smaller and smaller labels, until the label 1 is produced. If N is odd, one reaches the other powers of 2 by multiplying all the initial labels by 2 −a and then applying normally the algorithm.

Compute the corresponding j 5: Project to a pair (j1, j2).

The register is now

There are 2 k sums, and N/B possible values, hence we can expect to have 2 k B/N solutions. If we take k log 2 (N/B), we can expect 2 solutions on average. In order to obtain a labeled qubit in the end, we need at least two solutions, and we need to successfully project to a pair (j 1 , j 2 ) if there are more than two solutions.

The case where a single solution exists cannot happen more than half of the time, as there are twice many inputs as outputs. We consider the case where we have strictly more than one index j in the sum. If we have an even number of such indices, we simply divide the indices j into a set of pairs, project onto a pair, and map one of the remaining indexes to 0 and the other to 1. If we have an odd number of such indices, since it is greater or equal than 3, we single out a solitary element, and do the projections as in the even case. The probability to fall on this element is less than 1 t ≤ 1 3 if there are t solutions, hence the probability of success in this case is more than 2 3 . This combination routine can be used recursively to obtain the label we want.

Linear Number of Queries. Algorithm 3 can directly produce the label 1 if we choose k = log 2 (N ) and B = 2. In that case, we will either produce 1 or 0 with a uniform probability, as the input labels are uniformly distributed.

If the group has a component which is a small power of two, the previous routine can be used with B = 1 in order to force the odd cyclic component at zero. Then the algorithms of [10] can be used, with a negligible overhead.

Overall, the routine can generate the label 1 using log 2 (N ) queries with probability one half. This also requires to solve a subset-sum instance, which can be done in only O 2 0.291 log 2 (N ) classical time and memory [2] .

We need to obtain log 2 (N ) labels, and then we can apply the Quantum Fourier Transform as before, to recover s, with a success probability 4 π 2 . So we expect to reproduce this final step 3 times. The total number of queries will be 8 log 2 (N ) 2 , with a classical time and memory cost in O 2 0.291 log 2 (N ) .

We note that this variant is the most efficient in quantum resources, as we limit the quantum queries to a polynomial amount. The classical complexity remains exponential, but we replace the complexity of a collision search (with an exponent of 0.5) by that of the subset-sum problem (an exponent of 0.291). In the case N 2 256 (CSIDH-512), by taking into account the success probability of the final Quantum Fourier Transform, we obtain 2 19 quantum queries and 2 86 classical time and memory.

Time/Query Tradeoffs. There are many possible tradeoffs, as we can adjust the number of steps and their sizes. For example, we can proceed in two steps: the first step will produce labels smaller than √ N , and the second will use them to produce the label 1. The subset-sum part of each step, done classically, will cost O 2 0.291 log 2 (N )/2 time and memory, and it has to be repeated log(N ) 2 /4 times per label. Hence, the total cost in queries is in O(log(N ) 3 ), with a classical time and memory cost in O 2 0.291 log 2 (N )/2 .

For N 2 256 , we can use Algorithm 3 to obtain roughly 130 labels that are smaller than 2 128 , and then apply Algorithm 3 on them to obtain the label 1. We can estimate the cost to be roughly 2 24 quantum queries, 2 60 classical time and 2 45 memory.

This method generalizes to any number of steps. If we want a subexponential classical time, then the number of steps has to depend on N . Many tradeoffs are possible, depending on the resources of the quantum attacker (see [9] ).

This section revisits the algorithm from [32] and builds upon tradeoffs developed in [9] . We remark that the previous labeled qubits |ψ were a particular case of qubit registers of the form

These multi-labeled qubit registers become the new building blocks. They are not indexed by a label , but by a vector ( 0 , . . . , k−1 ). We can remark that if we consider the joint state (tensor) of j single-label qubits |ψ i , we directly obtain a multi-labeled qubit register of this form:

Add an ancilla register, apply |i |j |0 → |i |j | ( i + j )/2 a−r 3: Measure the ancilla register, leaving with V and i,j: 

These registers can again be combined by computing and measuring a partial sum, as in This routine can also be generalized to merge more than two lists. The only difference will be that at Step 4, we will need to apply another list-merging algorithm to find all the matching values. In particular, if we merge 4k lists, we can use the Schroeppel-Shamir algorithm [43] , to obtain the solutions in O(M 2k ) classical time and O(M k ) classical memory.

Once we are finished, we project the vector to a pair of values with difference 1, as in Algorithm 3, with the same success probability, better than 1/3.

Complete Algorithm. The complete algorithm uses Algorithm 4 recursively. As before, the final cost depends on the size of the lists, the number of steps and the number of lists we merge at each step. Then, we can see the algorithm as a merging tree.

The most time-efficient algorithms use 2-list merging. The merging tree is binary, the number of lists at each level is halved. We can save some time if we allow the lists to double in size after a merging step. In that case, the merging of two lists of size 2 m to one list of size 2 m+1 allows to constrain m − 1 bits 2 , at a cost of O(2 m ) in classical and quantum time and classical memory. If we have e levels in the tree and begin with lists of size 2 0 , then the quantum query cost is 0 2 e . The time cost will be in O 2 0+e , as the first step is performed 2 e times, the second 2 e−1 times, and so on.

Allowing the lists to grow saves some time, but costs more memory. To save memory, we can also combine lists and force the output lists to be of roughly the same size. Hence, the optimal algorithm will double the list sizes in the first levels until the maximal memory is reached, when the list size has to stay fixed.

Overall, let us omit polynomial factors and denote the classical and quantum time as 2 t . We use at most 2 m memory and make 2 q quantum queries, begin with lists of size 2 0 and double the list sizes until we reach 2 m . Hence, the list size levels are distributed as in Fig. 1 . We have q equal to the number of levels, and t equals the number of levels plus 0 . As each level constrains as many bits as the log of its list size, the total amount of bits constrained by the algorithm corresponds to the hatched area. Hence, with max(m, q) ≤ t ≤ m + q, we can solve the hidden shift problem

We directly obtain the cost of O 2 √ 2n from [32] if we consider t = m = q.

Classical/Quantum Tradeoffs. The previous approach had the inconvenience of using equal classical and quantum times, up to polynomial factors. In practice, we can expect to be allowed more classical operations than quantum gates. We can obtain different tradeoffs by reusing the previous 2-list merging tree, and seeing it as a 2 k -list merging tree. That is, we see k levels as one, and merge the 2 k lists at once. This allows to use the Schroeppel-Shamir algorithm for merging, with a classical time in 2 2 k /2 and a classical memory in 2 2 k /4 . This operation is purely classical, as we are computing lists of labels, and it does not impact the quantum cost. Moreover, while we used to have a constraint on log(k)m bits, we now have a constraint on (k − 1)m bits. For k = 2, omitting polynomial factors, with a classical time of 2 2t and quantum time of 2 t , a memory of 2 m , a number of quantum queries of 2 q and max(m, q) ≤ t ≤ m + q, we can solve the hidden shift problem for N < 2 n with

In particular, if we consider that t = m = q, we obtain an algorithm with a quantum time and query and classical memory complexity of O(2 2 √ n 3 ) and a classical time complexity of O(2 4 √ n 3 ), and if we consider that t = 2m = 2q, we obtain a quantum query and classical memory cost in O(2 Concrete Estimates. If we consider N 2 256 , with the 2-list merging method we can succeed with 2 23 initial lists of size 2. We double the size of the list at each level until we obtain a list of size 2 24 . In that case, we obtain classical and quantum time cost in 2 39 , a classical memory in 2 29 and 2 34 quantum queries.

Using the 4-list merging, we can achieve the same in 10 steps with roughly 2 55 classical time, 2 23 classical memory, 2 35 quantum time, 2 31 quantum queries.

Other tradeoffs are also possible. We can reduce the number of queries by beginning with larger lists. We can also combine the k-list approach with the subset-sum approach to reduce the quantum time (or the classical memory, if we use a low-memory subset-sum algorithm).

For example, if we consider a 4-level tree, with a 4-list merging, an initial list size of 2 24 and lists that quadruple in size, the first combination step can constrain 24×3−2 = 70 bits, the second 26×3−2 = 76 and the last 28×4−1 = 111 bits (for the last step, we do not need to end with a large list, but only with an interesting element, hence we can constrain more). We bound the success probability by the success probability of one complete merging (greater than 1/3) times the success probability of the Quantum Fourier Transform (greater than π 2 /4), for a total probability greater than 1/8.

The cost in memory is of 2 30 , as we store at most 4 lists of size 2 28 . For the number of quantum queries: there are 4 3 = 64 initial lists in the tree, each costs 24 queries (to obtain a list of 2 24 labels by combining). We have to redo this 256 times to obtain all the labels we want, and to repeat this 8 times due to the probability of success. Hence, the query cost is 24 × 64 × 256 × 8 2 22 . The classical time cost is in 256 × 8 × 3 × 2 28×2 2 69 . The quantum time cost is in 256

We summarize the results of this section in Table 2 . 

This section reviews the lattice reduction technique that allows to go from an arbitrary representation of an ideal class [x] to a representation on a basis of arbitrary ideals: [x] = [l i ] xi , with short exponents x i . This allows to turn an oracle for the CSIDH group action, computing i [l i ] ei · E, into an oracle for the action of C (O).

Given Lattice Reduction. Next, we compute an approximate short basis B and its Gram-Schmidt orthogonalization B * . All this information about L will be stored classically. We compute B using the best known algorithm to date, the Block Korkine Zolotarev algorithm (BKZ) [42] . Its complexity depends on the dimension u and the block size, an additional parameter which determines the quality of the basis. For any dimension u, BKZ gives an approximation factor c u for some constant c depending on the block size: ||b 1 || 2 ≤ c u λ 1 (L) where λ 1 (L) is the euclidean norm of the smallest vector in L. In our case, assuming that the products [l i ] ei with −m ≤ e i ≤ m span the whole class group, one of these falls on 1 and we have:

In this section, we suppose that a product i [l i ] ti for some large t i is given Babai's Algorithm. The computation of a short basis B of L has to be done only once, but the approximate CVP needs to be solved on the fly and for a targett in superposition. As in [7] , we use a simple polynomial-time algorithm, relying on the quality of the basis B: Babai's nearest-plane algorithm [1] . We detail it in the full version of the paper [11] . Given the target vectort, B and its Gram-Schmidt orthogonalization B , this algorithm outputs in polynomial time a vectorv in the lattice L such that ||v −t|| 2 

. This bound holds simultaneously for every target vectort and corresponding outputv (ast will actually be a superposition over all targets, this is important for us).

Effect on the L 1 Norm. Our primary concern is the number of isogenies that we compute, so we will measure the quality of our approximation with the L 1 norm of the obtainedē =v −t. The bound on the L 1 norm is: ||v −t|| 1 

. Naturally, if we manage to solve the exact CVP, and obtain always the closest vector tot, any evaluation of [x] · E A will cost exactly the same as an evaluation of i [l i ] ei · E A with the bounds on the exponents e i specified by the CSIDH parameters; hence the class group action collapses to the CSIDH group action.

Our Simulations. We performed simulations by modeling C (O) as a cyclic group of random cardinality q √ p. Then we take u elements at random in this group, of the form g ai for some generator g and compute two-by-two relations between them, as: (g ai ) ai+1 · (g ai+1 ) −ai = 1. With such a basis, the computational system Sage [46] performs BKZ reduction with block size 50 in a handful of minutes, even in dimension 200. We compute the L 1 bound

2 for many lattices generated as above, reduced with BKZ-50. We obtain on average, for CSIDH -512, -1024 and -1792 (of dimensions 74, 132 and 209 respectively), 1300, 4000 and 10000. The standard deviation of the values found does not exceed 10%. Notice that the bound is a property of the lattice, so we can take the average here, even though we will apply Babai's algorithm to a superposition of inputs.

Faster Evaluations of the Class Group Action. In the context of speeding up the classical group action, the authors of [5] computed the structure of the class group for CSIDH-512, the relation lattice and a small basis of it. They showed that the class group was cyclic. Given an ideal class [x], they use Babai's algorithm with another refinement [23] . It consists in keeping a list of short vectors and adding them to the output of Babai's algorithm, trying to reduce further the L 1 norm of the result.

In particular for CSIDH-512, they are able to compute vectors of L 1 norm even shorter on average than the original bound of 5 × 74 = 370, reaching an average 240 with BKZ-40 reduction. This suggests that, with lattice reduction, there may be actually less isogenies to compute than in the original CSIDH group action. However, we need a bound guaranteed for all target vectors, since we are computing in superposition, which is why we keep the bounds of above.

In this section, we first analyze the cost of a quantum circuit that evaluates the CSIDH group action on a given Montgomery curve E A represented by A ∈ F p :

where L i corresponds to applying [l i ] to a given curve, and the e i are possibly greater than the CSIDH original exponents. We will then move to the class group action, which computes [x] · E A in superposition for any [x] . Following previous literature on the topic [4, 41] , we count the number of Toffoli gates and logical qubits used, as both are considered the most determinant factors for implementations. Our goal is to give an upper bound of resources for CSIDH-512 and an estimate for any CSIDH parameters, given a prime p of n bits and the sequence of small primes i such that p = 4 · i i − 1.

It was shown in [27] that the group action could be computed in polynomial quantum space. A non-asymptotic study of the gate cost has been done in [4] . However, the authors of [4] were concerned with optimizing a classical circuit for CSIDH, without reversibility in mind. This is why the appendix of [4] , mentions a bewildering amount of "537503414" logical qubits [4, Appendix C.6] (approx. 2 29 ). In this section, we will show that the CSIDH-512 group action can be squeezed into 40 000 logical qubits.

We adopt a bottom-up approach. We first introduce some significant tools and components, then show how to find, on an input curve E A , a point that generates a subgroup of order . We give a circuit for computing an isogeny, a sequence of isogenies, and combine this with lattice reduction to compute the class group action.

Bennett's Conversion. One of the most versatile tools for converting an irreversible computation into a reversible one is Bennett's time-space tradeoff [3] . Precise evaluations were done in [30, 34] .

Assume that we want to compute, on an input x of n bits, a sequence f t−1 • . . . • f 0 (x), where each f i can be computed out of place with a quantum circuit using T f Toffoli gates and a ancilla qubits: |x |0 → |x |f i (x) . We could naturally compute the whole sequence using tn ancilla qubits, but this rapidly becomes enormous. Bennett remarks that we can separate the sequence f t−1 • . . . • f 0 = G • F , with F and G functions using m F and m G ancillas respectively, and compute:

If T F and T G are the respective Toffoli counts of the circuits for F and G, the total is 2T F + T G and the number of ancillas used is max(m F , m G ) + n. Afterwards, we cut F and G recursively. Bennett obtains that for any > 0, an irreversible circuit using S space and running in time T can be converted to a reversible circuit running in time T 1+ and using O(S log T ) space.

Step. It often happens for us that the final result of the f isequence is actually not needed, we need only to modify the value of another one-bit register depending on f t−1 • . . . • f 0 (x) (for example, flipping the phase). This means that at the highest level of the conversion, all functions are actually uncomputed. This can also mean that we do not compute

for some new f . Hence the cost is the same as if we added one more step before the conversion, and often negligible.

Number of Steps Given a Memory Bound. We want to be as precise as possible, so we follow [30] . In general, we are free to cut the t operations in any way, and finding the best recursive way, given a certain ancilla budget, is an optimization problem. Let B(t, s) be the least number of computation steps, for a total Toffoli cost B(t, s)T f , given sn + m available ancilla qubits, to obtain reversibly f t−1 • . . . • f 0 (x) from input x. We have:

Theorem 1 (Adaptation of [30] , Theorem 2.1). B(t, s) satisfies the recursion:

In all the costs formulas that we write below, we add a trade-off parameter s in the memory used and B(t, s) in the time.

Basic Arithmetic Modulo p. The Toffoli cost of the group action oracle is almost totally consumed by arithmetic operations modulo p (a prime of n bits), and in the following, we count the time in multiples of these basic operations. We do not make a difference between multiplication and squaring, as we use a single circuit for both, and denote T M the Toffoli gate count of a multiplication in F p , using Q M ancilla qubits. We also denote T I the Toffoli count of an inversion and Q I its ancilla count. As n will remain the same parameter throughout this section, we deliberately omit it in these notations, although T M , T I , Q I , Q M are functions of n. Note that [4] considers that the inversion modulo p costs an n-bit exponentiation, far more than with the circuit of [41] . Table 1 ). There is a quantum circuit for (out of place) inversion modulo a prime p of n bits: |x |0 → |x |x −1 mod p that uses T I = 32n 2 log 2 n Toffoli gates and Q I = 5n + 2 log 2 n + 7 qubits.

This circuit is out of place: the input registers are left unchanged, and the result is written on an n-bit output register. Circuits for in-place modular addition and doubling are also given in [41] and their Toffoli counts remain in O (n log 2 n), hence negligible with respect to the multiplications.

We use the best modular multipliers given in [40] with 3n qubits and 4n 2 Toffoli gates (dismissing terms of lower order). Note that, although the paper is focused on in-place multiplication by a classically known Y (i.e. computing |x → |xY ), the same resource estimations apply to the out-of-place multiplication of two quantum registers: |x |y |0 → |x |y |xy (see [40, Section 2.5] ). Implementing a controlled multiplication (an additional register chooses to apply it or not) is not much more difficult than a multiplication.

In-place Multiplication. The in-place multiplication: |x |y → |x |x · y is not reversible if x is not invertible, and in this case, we can simply rewrite |y in the output register. We reuse the modular inversion circuit of [41] to compute |x −1 . Then we compute |x · y and erase the |y register by computing |x · y · x −1 .

There is a circuit that on input |x |y returns |x |x · y if x is invertible and |x |y otherwise. It uses T M = 2T M + 2T I Toffoli gates and Q M = Q I + n ancillas.

We give a circuit that maps |m |x |0 to |m |x |x m . Contrary to the modular exponentiation in Shor's algorithm, in our case, both x and m are quantum, which means that we cannot classically precompute powers of x (see e.g. [41] ).

We use a simple square-and-multiply approach with Bennett's time-space tradeoff. We perform t steps requiring each a squaring and a controlled multiplication by x: on input |y |0 |0 , we compute |y |x · y |0 then |y |x · y |0 , then |y |x·y |(x·y) 2 and erase the second register with another multiplication. Hence a single step uses 3T M Toffolis and Q M + n ancillas. 

Most of the work in the group action oracle is spent computing the (x-coordinate of the) m-th multiple of a point P on a Montgomery elliptic curve given by its coefficient A, for a quantum input m. Following the presentation in [4, Section 3.3], made reversible and combined with Bennett's time-space tradeoff, we prove Lemma 4 in the full version of the paper [11] . Notice that mP can be transformed back to affine coordinates with little overhead, since the inversion in F p costs T I = O n 2 log n Toffolis. t bits) , the x-coordinate of mP (in projective coordinates), using 15B(t, s)T M Toffolis and Q M + 2n + 4sn ancilla qubits, where s is a trade-off parameter.

Given A in input, we want to compute B = L (A), the coefficient of the curve -isogenous to A. This requires to find a subgroup of order of the curve E A . In CSIDH, this is done by first finding a point P on E A , then computing Q = ((p + 1)/ )P . if Q is not the point at infinity, it generates a subgroup of order .

x ∈ F * p , returns 1 if x is the x-coordinate x of such a good point P , and 0 otherwise. We will first build a quantum circuit that on input A and x ∈ F * p , flips the phase: |A |x → (−1) test(x) |A |x . We will use this circuit as a test in a modified Grover search.

Testing if P is on the Curve. We compute x 3 +Ax 2 +x using some multiplications and squarings (a negligible amount), then the Legendre symbol of x 3 + Ax 2 + x. For exactly half of F * p , we obtain 1, which means that x is the x-coordinate of a point on the curve. For the other half, we obtain −1, and x is actually the x-coordinate of a point on its twist.

Multiplication by the Cofactor. Assume that the x-coordinate obtained above is that of a point P on the curve. We compute Q = ((p + 1)/ )P using our reversible Montgomery ladder. Then, another failure occurs if Q = ∞. This happens with probability 1/ . Hence, the probability of success of the samplingand-multiplication operation is 1 2 1 − 1 . In the circuit that we are building right now, we don't need the value of Q, only the information whether Q = ∞ or not. Bennett's conversions of both the Legendre symbol computation and the Montgomery ladder can take into account the fact that we merely need to flip the phase of the input vector. With this phase-flip oracle, we can obtain a point of order with a quantum search. Instead of using Elligator as proposed in [4] , we follow the "conventional" approach outlined in [4, Section 4.1], not only because it is simpler, but also because its probability of success is exactly known, which makes the search operator cheaper. More details are given in the full version of the paper [11] .

Quantum Search with High Success Probability. We start by generating the uniform superposition x∈F * p |x using a Quantum Fourier Transform (this is very efficient with respect to arithmetical operations). We use a variant of amplitude amplification for the case where the probability of success is high [15] . This variant is exact, but requires to use a phase shift whose angle depends on the success probability.

We know that the proportion of good x is exactly g = 1 2 1 − 1 . Normally, a Grover search iteration contains a phase flip and a diffusion transform which, altogether, realize an "inversion about average" of the amplitudes of the vectors in the basis. In [15] , this iteration is modified into a controlled-phase operator which multiplies the phase of "good vectors" by e iγ instead of −1 and a "βphase diffusion transform". Then by [15, Theorem 1] , if 1 4 ≤ g ≤ 1 and we set β = γ = arccos(1 − 1/(2g)), the amplitude of the "bad" subspace is reduced to zero. Such a phase shift can be efficiently approximated with the Solovay-Kitaev algorithm [19] . For a phase shift gate synthesized from Clifford+T gates, we estimate from [29] that it can be approximated up to an error of 2 −50 using around 2 14 T-gates, which is negligible compared to the cost of the exponentiation in the test function.

Detecting the Errors. If the error probability is low enough, we can assume that the end state is perfect. However, we can avoid these errors if, after computing the superposition of good points, we reapply the test function, add the result in an ancilla qubit and measure this qubit. In general, such a measurement could disrupt the computation. This is not the case here: measuring whether x is a good point for A, while A is in superposition, does not affect the register A, as the set of good points is always of the same size. With probability ≥ 1 − 2 −50 we measure 1 and the state collapses to the exact superposition of good points for the given A. Otherwise we stop the procedure here. When we need to uncompute this procedure, we revert the same single-iteration quantum search and perform the same measurement, with the same success probability.

There exists a quantum procedure that, on input (affine) A, finds the x-coordinate x of a "good" point on E A : |A |0 → |A ( x |x ). It uses 30B(n, s)T M + 6B(n, 4s)T M Toffolis and Q M + 2n + 4sn ancillas, and its probability of failure is less than 2 −50 .

Proof. This procedure runs as follows (we say "procedure" instead of "circuit", since it contains a measurement):

• Compute the superposition of points S = x∈F * p |x ;

• Apply the modified Grover operator: it contains the computation of S (negligible) and the computation of |x → e iγ test(x) |x • We actually do not obtain a single x, but a superposition close to the superposition of suitable x • Recompute the test in a single-bit ancilla register: |x |0 → |x |test(x) • Measure the ancilla register, forcing a collapse on the exact superposition of suitable x.

We set s = 4s in Lemma 5. All in all, we use two Legendre symbol computations and two n-bit reversible Montgomery ladders.

From the x-coordinate of a point Q on E A of order , we can compute the coefficient B of the -isogenous curve E B . The details are in the full version of the paper [11] .

There is a circuit that on input |A |Q |0 , computes |A |Q |B using Q I + (4s + 9)n ancilla qubits and We now put together the last subsections in order to perform an -isogeny mapping: |A |0 → |A |L (A) with overwhelming probability of success and detectable failure. We suppose that the cofactor (p + 1)/ has been classically precomputed. The isogeny computation is performed as follows:

1. On input |A , produce the superposition of good points P , that are on E A and have order p + 1 (detectable failures happen here) 2. On input |A |P , compute a reversible Montgomery ladder to obtain Q = ((p + 1)/ )P 3. On input |A |Q , obtain the coefficient B = L (A) of the image curve 4. Uncompute the Montgomery ladder for Q 5. Uncompute the superposition of good points (detectable failures happen here)

The ancilla cost of an out of place isogeny computation is the maximum between n + Q M + 2n + 4sn (computing the good points and the ladder for Q = ((p + 1)/ )P ) and n + Q I + (4s + 9)n (computing the image curve). We set s = s in order to use Q I + (4s + 11)n ancillas at most. Next, we denote T (s) the Toffoli count of this operation. It sums 60B(n, s) + 12B(n, 4s)T M (computing the good points), the cost of Lemma 7 and 30B(n, s)T M (computing the ladder).

Computing the inverse of an isogeny is not difficult, as noticed in [4] , as we have L −1 (A) = −L (−A). Hence, by doubling the cost, we are able to compute isogenies in place. On input |A , we compute |A |L (A) , then we compute L −1 to erase |A . We will see that most of the computation is spent computing the 12 reversible Montgomery ladders P → ((p + 1)/ )P .

There exists a quantum procedure that performs an -isogeny mapping in place: |A → |L (A) with an overwhelming probability of success (≤ 2 −50 ) and detectable failure using 2T (s) Toffolis and Q I +(4s+11)n ancillas.

Using the computation in place of L i , we now compute the image of an input A by a sequence of isogenies, described byē = e 1 , . . . e u :

. If we need to apply the backwards and not the forwards isogeny (e i is negative), we apply L −1 i (A) = −L i (−A), so we just need to change the signs of the registers, in place, with negligible overheads (in computations and qubits). In general, contrary to the standard CSIDH key-exchange, we do not have a guarantee on max i e i . Instead, we only know that ē 1 = i |e i | ≤ M for some bound M . We follow the idea of [4] of having a single quantum circuit for any i -isogeny computation, controlled by which isogeny we want to apply. Given an input vector e 1 , . . . e u , we apply isogenies one by one by decrementing always the top nonzero exponent (or incrementing it, if it is negative).

Since the procedure for the isogeny sequence considers all cases in superposition, it will always apply exactly M controlled isogenies, depending only on the promised bound M . Contrary to modular exponentiation, we don't need a timespace tradeoff for this sequence of computations, as isogenies can be computed in place (Lemma 8). Finally, if single isogenies fail with probability f , the total failure probability is lower than Mf.

A Constant Success Probability is Enough. The success probability 2 −50 given Lemma 8 is actually more than enough. Indeed, failures are detected and failed oracle queries can be discarded. One should note that the quantum hidden shift algorithms that apply to the cryptanalysis of CSIDH precisely allow this, since they start by applying the oracle many independent times before combining the results. Before the combination step, we can discard all the failed queries and the complexity is only multiplied by 1/ (1 − (Mf) ). Hence, compared to [4] , we do not only obtain a better success probability in a simpler way using quantum search, but we also reduce considerably the required success rate. In our case, we expect M ≪ 2 50 , a negligible failure probability, hence a negligible overhead.

Finally, we can transform the CSIDH group action into the class group action, using the lattice reduction technique of Sect. 4. We show in the full version of the paper [11] that, using [27] and Babai's algorithm together, we can achieve a negligible computational and memory overhead.

Let M be the L 1 bound obtained by reducing the lattice of relations. Assume that M ≪ 2 50 and is the maximal small prime used. Then there exists a quantum circuit for the class group action using 2MT (s) Toffolis and Q I + (4s + 11)n ancillas, where s is an integer trade-off parameter, with negligible probability of failure.

In this section, we assess the quantum security of the original parameters proposed in [12] . We count the number of T-gates necessary to attack CSIDH and compare to the targeted security levels.

In CSIDH-512, the base prime p has n = 511 bits, and there are u = 74 small primes whose maximum is = 587. We will first count the number of Toffoli gates required in terms of T M and T I , before plugging the cost of a reversible multiplication modulo p.

In Sect. 4, we have estimated that Babai's algorithm would return a vector of L 1 norm smaller than 1300. Hence, the oracle of Lemma 9 needs to apply M = 1300 in-place isogenies, more than the 74 · 5 = 370 required by the "legitimate" group action. We choose s = 15 in Lemma 9. Using Lemma 1, we compute B(512, 15) = 3553 and B(512, 60) = 1925. We further have log 2 = 10 and B(10, 60) = 17, ( + 1)/2 = 294 and B(294, 15) = 1809. For a single in-place isogeny, the number of multiplications is: 639540 = 2 19 Time Complexity for CSIDH-1024 and CSIDH-1792. Since the time is dominated by the Montgomery ladders, and Q I 5n, we simplify the Toffoli cost of an isogeny into 180B(n, s)T M and the ancilla cost into (4s + 16)n. We compute B(n, s) for various values of s and propose the trade-offs of Table 3 . 

The parameters in [12] are aimed at three security levels defined by the NIST call [37] : NIST 1 should be as computationally hard as recovering the secret key of AES-128 (with quantum or classical resources), NIST 3 should be as hard as key-recovery of AES-192 and NIST 5 key-recovery of AES-256. The NIST call referred to quantum estimates of [25] , but they have been improved in [33] . We plug our class group action oracle into the three quantum hidden shift algorithms of Sects. 3.2, 3.3 and 3.4, and compute the resulting complexities (note that, in terms of quantum time, we compare only the T-gate counts). The results are summarized in Table 4 . The first generic hidden-shift algorithm that we presented (Sect. 3.2) uses a large amount of quantum memory (resp. 2 31 , 2 43 and 2 56 qubits), as it needs to store all of its labeled qubits. Besides, as the quantum queries are very costly in the case of CSIDH, it is advantageous to reduce their count, even by increasing the classical complexity.

With the variant of Sect. 3.3, we see that the quantum query complexity decreases dramatically. If N is the cardinality of the class group (roughly √ p), we solve 8(log 2 N ) classical subset-sum instances on log 2 N bits (one for each label produced before the final QFT, and a success probability of 1 8 in total), each of which costs 2 0.291 log 2 N . 3 We make a total 8(log 2 N ) 2 quantum oracle queries. The quantum memory used depends only on the quantum oracle implementation. Table 4 . Attack trade-offs, in log 2 scale, rounded to the first decimal. "<" in the quantum memory complexity means that the memory comes mainly from the oracle. We put in bold the most significant trade-offs that we obtained for each variant. Going further, we can trade between classical and quantum cost with the algorithm of Sect. 3.4. We use 4-list merging, equal quantum query and classical memory costs (excluding polynomial factors). Hence we consider lists of size 2 √ 2 log 2 (N )/3 everywhere and log 2 (N )/6 steps, obtaining the costs of Table 4 with respectively 2 18 , 2 25 and 2 31 classical memory.

All the parameter sizes proposed in [12] fall below their targeted security levels. In Table 4 , we see that the best strategy to apply varies with the size of the parameter p. With the small instance CSIDH-512, it is better to reduce at most the number of quantum queries, even if it means increasing the classical time complexity. With CSIDH-1792, the variant of Sect. 3.3 with a polynomial number of quantum queries cannot be applied anymore, due to a too high classical complexity. However, the trade-off that we propose with Kuperberg's second algorithm (Sect. 3.4) allows to attack CSIDH-1024 and CSIDH-1792 with a significant quantum advantage. In order to meet the NIST security levels, the bit-size of the parameter p needs to be increased.

For CSIDH-512, it seems unlikely to us that the query count of 2 19 may be significantly decreased; however, there is room for improvement in the quantum oracle. Currently, our oracle performs 1300 in-place isogeny computations, each of which requires 12 Montgomery ladders with 512 steps. With more precise estimations, and improving our current use of Babai's algorithm, one might reduce the number of isogenies down to ∼240 [5] . But this would require to implement the algorithm of [23] as a quantum circuit and requires further investigation. We use currently 40 000 logical qubits; this could be reduced with more aggressive optimizations (for example, using dirty ancillas that don't need to start in the state |0 ). We also notice that in general, quantum multiplication circuits are optimized in order to use few ancilla qubits, with Shor's algorithm in mind. In the case of CSIDH, the prime p is smaller than an RSA modulus, but the number of ancillas can be higher, and different trade-offs might be used.

We performed the first non-asymptotic quantum security assessment of CSIDH, a recent and promising key-exchange primitive based on supersingular elliptic curve isogenies. We presented the main variants of quantum commutative hidden shift algorithms, which are used as a building block in attacking CSIDH. There are many tradeoffs in quantum hidden shift algorithms. This makes the security analysis of CSIDH all the more challenging, and we tried to be as exhaustive as possible regarding the current literature.

We gave tradeoffs, estimates and experimental simulations of their complexities. Next, we gave a quantum procedure for the class group action oracle in CSIDH, completing and extending the previous literature. Consequently, we were able to propose the first non-asymptotic cost estimates of attacking CSIDH.

Comparing these to the targeted security levels, as defined in the ongoing NIST call, we showed that the parameters proposed [12] did not meet these levels. We used different trade-offs between classical and quantum computations depending on the parameters targeted. In particular, the CSIDH-512 proposal is at least 1 000 times easier to break quantumly than AES-128, using a variant polynomial in quantum queries and exponential in classical computations.

Safe Instances. The minimal size for which the attacks presented here are out of reach is highly dependent both on the way we estimate the costs (as they are subexponential) and the interpretation of the NIST metrics. In particular, does NIST 1 allows for a classical part with Time = Memory = 2 128 , or only Time × Memory = 2 128 ? Moreover, the oracle cost vastly depends on the amount of qubits used inside.

We can propose two sets of parameters for security level NIST 1: one aggressive, and one conservative. If we consider that NIST 1 allows for a classical time-memory product of 2 128 , 2 20 quantum queries and we neglect the polynomial factors, then the minimal size would be p ∼ 2260 bits, which corresponds to a multiplication by 4 of the parameter size. Our best attack would use Kuperberg's second algorithm and 2-list merging, at a cost of 2 69 classical time, 2 59 classical memory, 2 20 quantum queries and 2 18 qubits.

For a more conservative estimation, we can consider that classical time can reach 2 128 and classical memory 2 64 , that the quantum oracle for CISDH can be reduced down to 2 40 T-gates, that a quantum key recovery on AES-128 costs 2 80 T-gates (which allows for 2 40 queries and 2 80 quantum time), and neglect polynomial factors. Then this would require p ∼ 5280 bits, that is, multiplying by 10 the parameter size. Our best attack uses 4-list merging in Kuperberg's second algorithm, for a cost in classical time of 2 128 , 2 64 classical memory, 2 40 quantum queries, and as many qubits as required by the hypothetical improved CSIDH oracle.

On Lovász' lattice reduction and the nearest lattice point problem

Improved generic algorithms for hard knapsacks

Time/space trade-offs for reversible computation

Quantum circuits for the CSIDH: optimizing quantum evaluation of isogenies

CSI-FISh: efficient isogeny based signatures through class group computations. IACR Cryptology ePrint Archive

Fast heuristic algorithms for computing relations in the class group of a quadratic order, with applications to isogeny evaluation

A note on the security of CSIDH

A tradeoff between classical and quantum circuit size for an attack against CSIDH

Improved low-qubit hidden shift algorithms

Hidden shift quantum cryptanalysis and implications

Quantum security analysis of CSIDH

CSIDH: an efficient post-quantum commutative group action

Cryptographic hash functions from expander graphs

Decomposing finite Abelian groups

Quantum database search by a single query

Constructing elliptic curve isogenies in quantum subexponential time

Heuristics on class groups of number fields

Hard homogeneous spaces

SeaSign: compact isogeny signatures from class group actions

Towards quantum-resistant cryptosystems from supersingular elliptic curve isogenies

Faster SeaSign signatures through improved rejection sampling

Finding closest lattice vectors using approximate Voronoi cells

On quantum algorithms for noncommutative hidden subgroups

Applying Grover's algorithm to AES: quantum resource estimates

Towards optimized and constant-time CSIDH on embedded devices

A subexponential-time, polynomial quantum space algorithm for inverting the CM group action

Quantum measurements and the Abelian stabilizer problem

Fast and efficient exact synthesis of singlequbit unitaries generated by Clifford and T gates

An analysis of Bennett's pebble game

A subexponential-time quantum algorithm for the dihedral hidden subgroup problem

Another subexponential-time quantum algorithm for the dihedral hidden subgroup problem

Reducing the cost of implementing AES as a quantum circuit

A note on Bennett's time-space tradeoff for reversible computation

On Lions and Elligators: an efficient constanttime implementation of CSIDH. Cryptology ePrint Archive

A faster way to the CSIDH

NIST: Submission requirements and evaluation criteria for the post-quantum cryptography standardization process

He gives C-Sieves on the CSIDH. IACR Cryptology ePrint Archive

A subexponential time algorithm for the dihedral hidden subgroup problem with polynomial space

High performance quantum modular multipliers

Quantum resource estimates for computing elliptic curve discrete logarithms

Lattice basis reduction: improved practical algorithms and solving subset sum problems

A T = O(2 n/2 ), S = O(2 n/4 ) algorithm for certain NP-complete problems

Algorithms for quantum computation: discrete logarithms and factoring

Constructing public-key cryptographic schemes based on class group action on a set of isogenous elliptic curves

The Sage Developers: SageMath, the Sage Mathematics Software System

Acknowledgements. The authors want to thank María Naya-Plasencia for her helpful comments, Alain Couvreur and Jean-Pierre Tillich for helpful discussions on isogeny-based cryptography, Lorenz Panny and Joost Renes for their valuable comments on a draft of this paper. Thanks to Jean-François Biasse for pointing out the reference [6] , Luca De Feo, Ben Smith and Steven Galbraith for helpful comments on Kuperberg's algorithm and discussions on the NIST benchmark. Thanks to the anonymous Eurocrypt referees for helpful remarks.This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement n o 714294 -acronym QUASYModo).