title: A Tale of Three Signatures: Practical Attack of ECDSA with wNAF
authors: De Micheli, Gabrielle; Piau, Rémi; Pierrot, Cécile
date: 2020-06-06
journal: Progress in Cryptology - AFRICACRYPT 2020
DOI: 10.1007/978-3-030-51938-4_18

Attacking ECDSA with a wNAF implementation of the scalar multiplication first requires some side-channel analysis to collect information, then lattice-based methods to recover the secret key. In this paper, we reinvestigate the construction of the lattice used in one of these methods, the Extended Hidden Number Problem (EHNP). We find the secret key with only 3 signatures, thus reaching a known theoretical bound, whereas the best previous methods required at least 4 signatures in practice. Given a specific leakage model, our attack is more efficient than previous attacks, and in most cases has a better probability of success. To obtain such results, we perform a detailed analysis of the parameters used in the attack and introduce a preprocessing method which reduces the total time to recover the secret key by a factor of up to 7 for some parameters. We also perform an error-resilience analysis, which had never been done before in the setting of EHNP. Our construction finds the secret key from slightly erroneous traces, with up to 2% of false digits, and up to 4% with a specific type of error.

The Elliptic Curve Digital Signature Algorithm (ECDSA) [13], first proposed in 1992 by Scott Vanstone [26], is a widely deployed standard public-key signature protocol. ECDSA is used in the latest version of TLS (TLS 1.3), the email standard OpenPGP and smart cards. It is also implemented in the library OpenSSL, and can be found in cryptocurrencies such as Bitcoin, Ethereum and Ripple. It benefits from high security, based on the hardness of the elliptic curve discrete logarithm problem, and from a fast signing algorithm due to its small key size. Hence, it has been recognized as a standard signature algorithm by institutes such as ISO since 1998, ANSI since 1999, and IEEE and NIST since 2000.

The ECDSA signing algorithm requires scalar multiplications of a point P on an elliptic curve by an ephemeral key k. Since this operation is often the most time-consuming part of the protocol, it is necessary to use an efficient algorithm. The Non-Adjacent Form (NAF) and its windowed variant (wNAF) were introduced as alternatives to the binary representation of the nonce k to reduce the execution time of the scalar multiplication. Indeed, the NAF representation does not allow two non-zero digits to be consecutive, thus reducing the Hamming weight of the representation of the scalar. This improves the execution time, as the latter depends on the number of non-zero digits. The wNAF representation is present in implementations such as Bitcoin's, as well as in the libraries Cryptlib, BouncyCastle and Apple's CommonCrypto. Moreover, until very recently (May 2019), wNAF was present in all three branches of OpenSSL. However, implementing the scalar multiplication using the wNAF representation with no added layer of security makes the protocol vulnerable to side-channel attacks. Side-channel attacks were first introduced about two decades ago by Kocher et al. [14], and have since been used to break many implementations, in particular of cryptographic primitives such as AES, RSA, and ECDSA. They allow an attacker to recover secret information through observable leakage.
In our case, this leakage corresponds to differences in the execution time of a part of the signing algorithm, observable by monitoring the cache. For ECDSA, cache side-channel attacks such as Flush&Reload [28, 29] have been used to recover information about the sequence of operations used to execute the scalar multiplication, or, for example in [8], about the modular inversion. For the scalar multiplication, these operations are either a multiplication or an addition depending on the bits of k. This information is usually referred to as a double-and-add chain, or the trace of k. A trace is created when a signature is produced by ECDSA, and thus we speak of signatures and traces interchangeably.

At this point, we ask how many traces need to be collected to successfully recover the secret key. Indeed, from an attacker's perspective, the fewer traces necessary, the more efficient the attack. This quantity depends on how much information can be extracted from a single trace and on how information from multiple traces is combined to recover the key. We work on the latter to minimize the number of traces needed.

The nature of the information obtained from the side-channel attack determines what kind of method should be carried out to recover the secret key. Attacks on ECDSA are inspired by attacks on a similar cryptosystem, DSA. In 2001, Howgrave-Graham and Smart [12] showed how knowing partial information about the nonce k in DSA can lead to a full secret key recovery. Later, Nguyen and Shparlinski [19] gave a polynomial-time algorithm that recovers the secret key in ECDSA as soon as some consecutive bits of the ephemeral key are known. They showed that, using the information leaked by the side-channel attack, one can recover the secret key by constructing an instance of the Hidden Number Problem (HNP) [4]. The basic structure of the attack algorithm is to construct a lattice which encodes the knowledge of consecutive bits of the ephemeral keys and, by solving CVP or SVP, to recover the secret key. Attacks of this type have been carried out in [3, 8, 26, 28]. However, these results assume perfect traces, whereas obtaining traces without any misreadings is very rare. In 2018, Dall et al. [6] included an error-resilience analysis in their attack: they showed that key recovery with HNP is still possible even in the presence of erroneous traces.

In 2016, Fan, Wang and Cheng [7] used another lattice-based method to attack ECDSA: the Extended Hidden Number Problem (EHNP) [11]. EHNP mostly differs from HNP in the nature of the information given as input. Indeed, the information required to construct an instance of EHNP is not sequences of consecutive bits, but the positions of the non-zero coefficients in some representation of certain integers. This model, which we also consider in this article, is relevant when describing information coming from Flush&Reload or Prime&Probe attacks for example, the latter giving a more generic scenario with no shared data between the attacker and the victim. In [7], the authors are able to extract 105.8 bits of information per signature on average, and thus in theory require only 3 signatures to recover a 256-bit secret key. In practice, they were able to recover the secret key using 4 error-free traces.

In order to optimize an attack on ECDSA, various aspects should be considered. By minimizing the number of signatures required in the lattice construction, one minimizes the number of traces that need to be collected during the side-channel attack.
Moreover, reducing the time of the lattice part of the attack and improving the probability of success of key recovery reduce the overall time of the attack. In this paper, we improve on all three of these aspects. Furthermore, we propose the first error-resilience analysis for EHNP and show that key recovery is still possible with erroneous traces.

In this work, we reinvestigate the attack against ECDSA with wNAF representation for the scalar multiplication using EHNP. We focus on the lattice part of the attack, i.e., the exploitation of the information gathered by the side-channel attack. We first assume we obtain a set of error-free traces from a side-channel analysis. We preselect some of these traces to optimize the attack. The main idea of the lattice part is then to use the ECDSA equation and the knowledge gained from the selected traces to construct a set of modular equations which include the secret key as an unknown. These modular equations are then incorporated into a lattice basis similar to the one given in [7], and a short vector in this lattice contains the necessary information to reconstruct the secret key. We call one run of this algorithm an "experiment". An experiment succeeds if the algorithm recovers the secret key.

A New Preprocessing Method. The idea of selecting good traces beforehand has already been explored in [27]. The authors suggest three rules to select traces that improve the lattice part of the attack. Given a certain (large) number of available traces, the lattice is usually built with a much smaller subset of these traces. Trying to identify beforehand the traces that would result in a better attack is a clever option. The aim of our new preprocessing, which completely differs from [27], is to regulate the size of the coefficients in the lattice, and this results in a better lattice reduction time. For instance, with 3 signatures, we were able to reduce the total time of the attack by a factor of 7.

Analyzing the Attack. Several parameters intervene while building and reducing the lattice. We analyze the performance of the attack with respect to these parameters and present the best parameters that optimize either the total time or the probability of success. First, we focus on the attack time. Note that when talking about the overall time of the attack, we consider the average time of a single experiment multiplied by the number of trials necessary to recover the secret key. We compare our times with the numbers reported in [7, Table 3] for method C (see the note on methodology below). Indeed, methods A and B in [7] use extra information that comes from implementation choices, which we choose to ignore as we want our analysis to remain as general as possible. The comparison is justified as we consider the same leakage model and compare timings obtained on similar machines. For 4 signatures, our attack is slightly slower than the timings in [7]. However, when considering more than 4 signatures, our attack is faster. We experiment with up to 8 signatures to further improve our overall time. In this case, our attack runs at best in 2 min and 25 s. Timings for 8 signatures are not reported in [7], and the case of 3 signatures had never been reached before our work. In Table 1, we compare our times with the fastest times reported in [7]. We take their fastest times, but for our own results we report experiments which are faster than theirs (not necessarily our fastest) with, if possible, a better probability of success.
The overall time of the attack also depends on the success probability of key recovery. From Table 2, one can see that our success probability is higher than that of [7], except for 7 signatures: they reach 68% of success with their best parameters, whereas we only reach 45% in this case.

Note on methodology: in order to have a fair comparison with our methodology, the times reported in [7] with which we compare ourselves have to be multiplied by the number of trials necessary for their attack to succeed, which increases their total time considerably. For the sake of completeness, we mention that in [21], the authors use HNP to recover the secret key using 13 signatures. Their success probability in this case is around 54% and their overall time is close to 20 s, hence much faster. However, as their leakage model is different, we do not discuss their work further.

Finding the Key with Only Three Signatures. Overall, combining a new preprocessing method, a modified lattice construction and a careful choice of parameters allows us to mount an attack which works in practice with only 3 signatures. However, the probability of success in this case is very low. We were able to recover the secret key only once with BKZ-35 over 5000 experiments. This result is difficult to quantify as a probability, but we note that finding the key a single time over 5000 experiments is still much better than randomly guessing a 256-bit integer. If we assume the probability is around 0.02%, then, as each trial costs 200 s on average, we can expect to find the secret key after 12 days using a single core. Note that this time can be greatly reduced by parallelizing the process, i.e., each trial can be run on a separate core. On the other hand, if we use our preprocessing method, with 3 signatures we obtain a probability of success of 0.2% and a total key recovery time of 39 h, hence the factor-7 improvement mentioned above. Despite the low probability of success, this result remains interesting nonetheless. Indeed, the authors of [7] reported that in practice the key could not be recovered using fewer than 4 signatures, and we improve on their result.

We also investigate the resilience of our attack to errors. Such an analysis had not yet been done in the setting of EHNP. It is important to underline that collecting traces without any errors using any side-channel attack is very hard. Previous works used perfect traces to mount the lattice attack, and thus required collecting more traces. As pointed out in [7], roughly twice as many signatures are needed if errors are considered. In practice, this led [7] to gather on average 8 signatures to be able to find the key with 4 perfect traces. We experimentally show that we are still able to recover the secret key even in the presence of faulty traces. In particular, we find the key using only 4 faulty traces, but with a very low probability of success. As the percentage of incorrect digits in the traces grows, the probability of success decreases, and thus more signatures are required to successfully recover the secret key. For instance, if 2% of the digits are wrong among all the digits of a given set of traces, it is still possible to recover the key with 6 signatures. This result is valid if errors are uniformly distributed over the digits. However, we have a better probability of recovering the key if the errors consist of 0-digit misreadings, i.e., 0 digits read as non-zero.
In other words, the attack could work with a higher percentage of errors, around 4%, if we could ensure, from the side-channel attack and some preprocessing methods, that none of the non-zero digits have been flipped to 0.

Organization: Sect. 2 gives background on ECDSA and the wNAF representation. In Sect. 3, we explain how to transform EHNP into a lattice problem. We make the lattice basis explicit and analyze the length of the short vectors found in the reduced basis. In Sect. 4, we introduce our preprocessing method, which allows us to reduce the overall time of our attack. In Sect. 5, we give experimental results. Finally, in Sect. 6, we give an error-resilience analysis.

The ECDSA algorithm is a variant of the Digital Signature Algorithm (DSA) [17] which uses elliptic curves instead of finite fields. The parameters used in ECDSA are an elliptic curve E over a finite field, a generator G of prime order q and a hash function H. The private key is an integer α such that 1 < α < q − 1 and the public key is p_k = [α]G, the scalar multiplication of G by α. To sign a message m using the private key α, randomly select an ephemeral key k ←_R Z_q and compute [k]G. Let r be the x-coordinate of [k]G. If r = 0, select a new nonce k. Then, compute s = k^{−1}(H(m) + αr) mod q and, again, if s = 0, select a new nonce k. Finally, the signature is given by the pair (r, s). In order to verify a signature, first check that r, s ∈ Z_q, otherwise the signature is not valid; then compute u_1 = H(m)s^{−1} mod q and u_2 = rs^{−1} mod q, and accept the signature if the x-coordinate of [u_1]G + [u_2]p_k is congruent to r modulo q. We consider a 128-bit level of security. Hence α, q and k are 256-bit integers.

The ECDSA algorithm requires the computation of [k]G, a scalar multiplication. In [10], various methods for fast exponentiation are presented. One family of such methods, called window methods, builds on the NAF representation. Indeed, the NAF representation does not allow two non-zero digits to be consecutive, thus reducing the Hamming weight of the representation of the scalar. The basic idea of a window method is to consider chunks of w bits in the representation of the scalar k, compute powers in the window bit by bit, square w times and then multiply by the power in the next window. The window methods can be combined with the NAF representation of k. For any k ∈ Z, a representation k = Σ_{j≥0} k_j 2^j is called a NAF if k_j ∈ {0, ±1} and k_j k_{j+1} = 0 for all j ≥ 0. Moreover, every k has a unique NAF representation. The NAF representation minimizes the number of non-zero digits k_j. It is presented in Algorithm 1. The NAF representation can be combined with a sliding window method to further improve the execution time. For instance, in OpenSSL (up to version 1.1.1b, the latest to use wNAF), the window size usually chosen was w = 3, which is the value we set for all our experiments. The scalar k is converted into wNAF form using Algorithm 2. The sequence of digits m_i belongs to the set {0, ±1, ±3, ..., ±(2^w − 1)}. Let us write k as the sum of its non-zero digits, renamed k_j. More precisely, we get k = Σ_{j=1}^{ℓ} k_j 2^{λ_j}, where ℓ is the number of non-zero digits and λ_j is the position of the digit k_j in the wNAF representation.
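To make the wNAF representation concrete, here is a small Python sketch of the textbook wNAF conversion for a window size w, together with the extraction of the data used later in the attack: the number ℓ of non-zero digits and their positions λ_j. The function names and the digit-list format are ours, for illustration only; Algorithm 2 (and OpenSSL's implementation) follows the same principle but may differ in low-level details.

```python
# A minimal sketch of the textbook wNAF conversion, assuming the convention of the
# text: non-zero digits are odd and bounded by 2^w - 1 in absolute value.
def wnaf(k, w=3):
    """Return the wNAF digits of k, least significant position first."""
    digits = []
    while k > 0:
        if k & 1:                       # odd: emit an odd digit and cancel it
            d = k % (1 << (w + 1))      # k mod 2^(w+1)
            if d >= (1 << w):           # map to the signed range (-2^w, 2^w)
                d -= (1 << (w + 1))
            digits.append(d)
            k -= d
        else:
            digits.append(0)
        k >>= 1
    return digits

def leaked_positions(digits):
    """Positions lambda_j of the non-zero digits: what the side channel reveals."""
    return [j for j, d in enumerate(digits) if d != 0]

if __name__ == "__main__":
    k = 0b1011011101
    digits = wnaf(k, w=3)
    assert k == sum(d << j for j, d in enumerate(digits))   # the digits re-encode k
    lambdas = leaked_positions(digits)
    print(digits, lambdas, len(lambdas))    # len(lambdas) is the value called ell
```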
A Z-lattice is a discrete additive subgroup of Z^n. It is usually specified by a basis matrix B ∈ Z^{n×n}. The lattice L(B) generated by B consists of all integer combinations of the row vectors of B. The determinant of a lattice is the absolute value of the determinant of a basis matrix: det L(B) = |det B|. The discreteness property ensures that there is a vector v_1 reaching the minimum non-zero value of the Euclidean norm. Let us write ||v_1||_2 = λ_1, and let λ_i be the i-th successive minimum of the lattice. The LLL algorithm [15] takes as input a lattice basis and returns, in time polynomial in the lattice dimension n, a reduced lattice basis whose vectors b_i satisfy the worst-case approximation bound ||b_i||_2 ≤ 2^{(n−1)/2} λ_i. In practice, for random lattices, LLL obtains approximation factors such that ||b_1||_2 ≤ 1.02^n λ_1, as noted by Nguyen and Stehlé [18]. Moreover, for random lattices, the Gaussian heuristic implies that λ_1 ≈ sqrt(n/(2πe)) · det(L)^{1/n}. The BKZ algorithm [22, 24] is exponential in a given block-size β and polynomial in the lattice dimension n. It outputs a reduced lattice basis whose vectors b_i satisfy an approximation bound of the form ||b_i||_2 ≤ i · γ_β^{(n−1)/(β−1)} λ_i [23], where γ_β is the Hermite constant. In practice, Chen and Nguyen [5] observed that BKZ returns vectors such that ||b_1||_2 ≤ (1 + ε_β)^n λ_1, where ε_β depends on the block-size β. For random lattices, they get 1 + ε_β = 1.01 for a block-size β = 85.

Using a side-channel attack, one can recover information about the wNAF representation of the nonce k. In particular, it allows us to know the positions of the non-zero coefficients in the representation of k; the values of these coefficients, however, are unknown. This information can be used in the setup of the Extended Hidden Number Problem (EHNP) to recover the secret key. For many messages m, we use ECDSA to produce signatures (r, s), and each run of the signing algorithm produces a nonce k. We assume we have the corresponding trace of the nonce, that is, the equivalent of the double-and-add chain of [k]G using wNAF. The goal of the attack is to recover the secret α while optimizing either the number of signatures required or the total time of the attack.

The Hidden Number Problem (HNP) allows one to recover a secret element α ∈ Z_q if some information about the most significant bits of random multiples of α (mod q) is known, for some prime q. Boneh and Venkatesan show how to recover α in polynomial time with probability greater than 1/2. In [11], the authors extend the HNP and present a polynomial-time algorithm for solving instances of this extended problem. The Extended Hidden Number Problem is defined as follows. Given u congruences of the form

a_i α + Σ_{j=1}^{ℓ_i} b_{i,j} k_{i,j} ≡ c_i (mod q), 1 ≤ i ≤ u,

where the secret α and the 0 ≤ k_{i,j} ≤ 2^{η_{i,j}} are unknown, and the values η_{i,j}, a_i, b_{i,j}, c_i, ℓ_i are known for 1 ≤ i ≤ u (see [11], Definition 3), one has to recover α in polynomial time. The EHNP can then be transformed into a lattice problem, and one recovers the secret α by solving a short vector problem in a given lattice.

From the ECDSA algorithm, we know that given a message m, the algorithm outputs a signature (r, s) such that

s = k^{−1}(H(m) + αr) mod q. (2)

The value H(m) is simply a hash of the message m. We consider a set of u signature pairs (r_i, s_i) with corresponding messages m_i that satisfy Eq. (2). For each signature pair, we have a nonce k. Using the wNAF representation of k, we write k = Σ_{j=1}^{ℓ} k_j 2^{λ_j}, with k_j ∈ {±1, ±3, ..., ±(2^w − 1)}, where the choice of w depends on the implementation. Note that the coefficients k_j are unknown; however, the positions λ_j are supposed to be known via some side-channel leakage.
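To see why recovering the nonce is enough, note that Eq. (2) can be solved for α once k is known: α = r^{−1}(s·k − H(m)) mod q. The following self-contained Python sketch checks this algebra with toy values; it is only an illustration of the modular arithmetic, so r is drawn at random instead of being an actual x-coordinate of [k]G, and q is the group order of secp256k1.

```python
# Toy check that a fully leaked nonce breaks ECDSA, using only modular arithmetic.
import secrets

q = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141  # secp256k1 order

def sign(alpha, k, h, r):
    """Signing relation of Eq. (2): s = k^{-1} (h + alpha * r) mod q."""
    return (pow(k, -1, q) * (h + alpha * r)) % q

def recover_key(r, s, h, k):
    """If the nonce k leaks entirely, alpha = r^{-1} (s*k - h) mod q."""
    return (pow(r, -1, q) * (s * k - h)) % q

alpha = secrets.randbelow(q - 1) + 1    # secret key
k = secrets.randbelow(q - 1) + 1        # ephemeral nonce
h = secrets.randbelow(q)                # hash of the message
r = secrets.randbelow(q - 1) + 1        # toy value, not an actual x-coordinate

s = sign(alpha, k, h, r)
assert recover_key(r, s, h, k) == alpha
```

The attack of this paper recovers only the unknown digits of k from several signatures at once, but the same relation is what finally yields α.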
It is then possible to represent the ephemeral key k as the sum of a known part and an unknown part. As the value of k_j is odd, one can write k_j = 1 + 2d_j − 2^w, where 0 ≤ d_j ≤ 2^w − 1. Using the same notations as in [7], in the rest of the paper we denote by μ_j the window size of d_j, i.e., 0 ≤ d_j ≤ 2^{μ_j} − 1. Note that here μ_j = w, but this window size will be modified later. This allows us to rewrite the value of k as

k = k̄ + Σ_{j=1}^{ℓ} d_j 2^{λ_j+1}, with k̄ = Σ_{j=1}^{ℓ} 2^{λ_j} − Σ_{j=1}^{ℓ} 2^{λ_j+w}. (3)

The expression k̄ represents the known part of k. By substituting the expression (3) of k into Eq. (2), we get a system of modular equations

Σ_{j=1}^{ℓ_i} 2^{λ_{i,j}+1} s_i d_{i,j} − r_i α + (s_i k̄_i − H(m_i)) ≡ 0 (mod q), 1 ≤ i ≤ u, (4)

where the unknowns are α and the d_{i,j}. The known values are ℓ_i, the number of non-zero digits in k for the i-th sample, λ_{i,j}, the position of the j-th non-zero digit in k for the i-th sample, and k̄_i defined above. Equation (4) is then used as input to EHNP, following the method explained in [11]. The problem of finding the secret key is then reduced to solving the short vector problem in a lattice presented in the following section. Before giving the lattice basis construction, we rewrite Eq. (4) to reduce the number of unknown variables in the system. This will allow us to construct a lattice of smaller dimension. Again, we use the same notations as in [7].

Eliminating One Variable. One straightforward way to reduce the lattice dimension is to eliminate a variable from the system. In this case, one can eliminate α from Eq. (4). Let E_i denote the i-th equation of the system. Then, by computing r_1 E_i − r_i E_1, we get the following new modular equations:

Σ_{j=1}^{ℓ_1} τ_{j,i} d_{1,j} + Σ_{j=1}^{ℓ_i} σ_{i,j} d_{i,j} + γ_i ≡ 0 (mod q), 2 ≤ i ≤ u, (5)

where, using the same notations as in [7], τ_{j,i} = 2^{λ_{1,j}+1} s_1 r_i mod q, σ_{i,j} = −2^{λ_{i,j}+1} s_i r_1 mod q, and γ_i gathers the known terms r_i(s_1 k̄_1 − H(m_1)) − r_1(s_i k̄_i − H(m_i)) mod q. Even though α is eliminated from the equations, if we recover some d_{i,j} values from a short vector in the lattice, we can recover α using any equation of the modular system (4). We now use Eq. (5) to construct the lattice basis.

Let L be the lattice constructed for the attack: L = L(B), where the lattice basis B is built, as in [7], from the coefficients τ_{j,i}, σ_{i,j} and γ_i of Eq. (5), scaled by Δ and by powers 2^{m−μ_{i,j}} balancing the unknowns d_{i,j}. Here m = max_{i,j} μ_{i,j} for 1 ≤ j ≤ ℓ_i and 2 ≤ i ≤ u, T = Σ_{i=1}^{u} ℓ_i, and Δ ∈ N is a scaling factor to be defined later. If we are able to find a particular short vector w, which encodes the digits d_{i,j}, in the lattice, then we can reconstruct the secret key α. In order to find w, we estimate its norm and make sure w appears in the reduced basis. After reducing the basis, we look for vectors of the correct shape, i.e., with sufficiently many zeros at the beginning and the correct last coefficient, and attempt to recover α from each of these.

How the Size of Δ Affects the Norms of the Short Vectors. In order to find the vector w in the lattice, we reduce B using LLL or BKZ. For w to appear in the reduced basis, one should at least choose Δ large enough that w is expected to be among the short vectors found by the reduction algorithm; this yields a lower bound on Δ, which we refer to as Eq. (6). The vector w we expect to find has norm ||w||_2 ≤ 2^{m−1} sqrt(T + 1). From Eq. (6), one can deduce the value Δ_th of Δ needed to find w in the reduced lattice. In our experiments, the average value of ℓ_i for 1 ≤ i ≤ u is 26, and thus T ≈ 26u on average. Moreover, the average value of the μ_{i,j} is 7, and so on average Σ_{i,j} μ_{i,j} ≈ 7 × 26u. Hence, if we compute Δ_th for u = 3, ..., 8 with these values, we obtain Δ_th ≤ 1, which does not help us to set this parameter. In practice, we verify that Δ = 1 already allows us to recover the secret key. In Sect. 5, we vary the size of Δ to see whether a slightly larger value affects the probability of success.

Too Many Small Vectors. While running BKZ on B, we note that for some specific sets of parameters the reduced basis contains some undesired short vectors, i.e., vectors that are shorter than w. This can be explained by looking at two consecutive rows in the lattice basis, say the j-th row and the (j+1)-th row.
For example, one can look at the rows which correspond to the σ_{i,j} values, but the same argument is valid for the rows concerning the τ_{j,i}. From the definition of the σ values, we have σ_{i,j+1} = −2^{λ_{i,j+1}+1} s_i r_1 = 2^{λ_{i,j+1}−λ_{i,j}} · (−2^{λ_{i,j}+1} s_i r_1), so σ_{i,j+1} = 2^{λ_{i,j+1}−λ_{i,j}} · σ_{i,j}. Thus the linear combination given by the (j+1)-th row minus 2^{λ_{i,j+1}−λ_{i,j}} times the j-th row gives a vector

(0, ..., 0, −2^{λ_{i,j+1}−λ_{i,j}+m−μ_{i,j}}, 2^{m−μ_{i,j+1}}, 0, ..., 0).

Yet, this vector is expected to have a smaller norm than w. Some experimental observations are detailed in Sect. 5.

Comparison with the Lattice of [7]. Let B′ be the lattice basis constructed in [7]. Our basis B is a rescaled version of B′ such that B = 2^m Δ B′. This rescaling allows us to ensure that all the coefficients in our lattice basis are integers. Note that [7] have a value δ in their construction which corresponds to 1/Δ. In this work, we give a precise analysis of the value of Δ, both theoretically and experimentally in Sect. 5, which is missing in [7].

In [7], the authors present another way to further reduce the lattice dimension, which they call the merging technique. It aims at reducing the lattice dimension by reducing the number of non-zero digits of k. The lattice dimension depends on the value T = Σ_{i=1}^{u} ℓ_i, and thus reducing T reduces the dimension. To understand the attack, it suffices to know that after merging we obtain a new value ℓ, the new number of non-zero digits, and new positions λ_j of these digits for 1 ≤ j ≤ ℓ. After merging, one can rewrite k = k̄ + Σ_{j=1}^{ℓ} d_j 2^{λ_j+1}, where the new d_j have a new window size, which we denote μ_j, i.e., 0 ≤ d_j ≤ 2^{μ_j} − 1. We present our merging algorithm, based on Algorithm 3 given in [7]. Our algorithm directly modifies the sequence {λ_j}_{1≤j≤ℓ}, whereas [7] work on the double-and-add chains. This helped us avoid some implementation issues, such as an index overrun present in Algorithm 3 of [7], line 7. To facilitate the reading of (our) Algorithm 3, we work with dynamic tables. Let us first recall the standard methods we use in the algorithm: push_back(e) inserts an element e at the end of the table, at(i) outputs the element at index i, and last() returns the last element of the table. A useful example of the merging technique is given in [7].

From 3 to 8 signatures, the approximate dimensions of the lattices using the elimination and merging techniques are the following: 80, 110, 135, 160, 190 and 215. Each new lattice dimension is roughly 54% of the dimension of the lattice before applying these techniques, for the same number of signatures. For instance, with 8 signatures we would otherwise have a lattice of dimension 400 on average, far too large to be easily reduced. For the traces we consider, after merging, the mean of the ℓ_i is 26, the minimum being 17 and the maximum 37, with a standard deviation of 3. One could further reduce the lattice dimension by preselecting traces with small ℓ_i. However, the standard deviation being small, the reduction times should not be affected too much.

The two main pieces of information we can extract and use in our attack are the number of non-zero digits in the wNAF representation of the nonce k, denoted ℓ, and the weight of each non-zero digit, denoted μ_j for 1 ≤ j ≤ ℓ. Let T be the set of traces obtained from the side-channel leakage, each representing the wNAF representation of the nonce k used while producing an ECDSA signature. We consider the subset S_a = {t ∈ T | max_{1≤j≤ℓ} μ_j ≤ a}.
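As an illustration, the following Python sketch implements this trace selection, assuming each trace is stored as the list of pairs (λ_j, μ_j) obtained after merging; this data structure, the function names and the hypothetical loader are ours, purely for illustration.

```python
# A sketch of the trace preselection by bounded window size.
from typing import List, Tuple

Trace = List[Tuple[int, int]]            # (position, window size) of each non-zero digit

def max_window(trace: Trace) -> int:
    """Largest window size mu_j appearing in the trace."""
    return max(mu for _, mu in trace)

def subset_S(traces: List[Trace], a: int) -> List[Trace]:
    """S_a: the traces whose window sizes are all at most a."""
    return [t for t in traces if max_window(t) <= a]

# Hypothetical usage: keep traces in S_15 and pick u = 4 of them for the lattice.
# traces = load_traces(...)              # placeholder for the side-channel data
# selected = subset_S(traces, 15)[:4]
```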
We choose to preselect traces in a subset S_a for small values of a. The idea behind this preprocessing is to regulate the size of the coefficients in the lattice. Indeed, when selecting u traces for the attack, by upper-bounding m = max_{i,j} μ_{i,j} for 2 ≤ i ≤ u, we force the coefficients to remain smaller than when taking traces at random. We work with a set T of 2000 traces such that for all traces 11 ≤ max_j μ_j ≤ 67. The proportions of signatures corresponding to the different preprocessing subsets we consider in our experiments are: 2% for S_11, 18% for S_15 and 44% for S_19. The effect of preprocessing on the total time is explained in Sect. 5.

Traces from the Real World. We work with the elliptic curve secp256k1, but none of the techniques introduced here are limited to this specific elliptic curve. We consider traces from a Flush&Reload attack, executed through hyperthreading, as it can virtually recover the largest amount of information. To the best of our knowledge, the only information we can recover is the positions of the non-zero digits: we are not able to determine the sign or the value of the digits in the wNAF representation. In [7], the authors exploit the fact that the length of the binary string of k is fixed in implementations such as OpenSSL, and thus more information can be recovered by comparing this length to the length of the double-and-add chain. In particular, they were able to recover the MSB of k, and in some cases the sign of the second MSB. We do not consider this extra information, as we want our analysis to remain general.

We report calculations run on error-free traces, in which we evaluate the total time necessary to recover the secret key and the probability of success of the attack. Our experiments have two possible outputs: either we reconstruct the secret key α, and we consider the experiment a success, or we do not recover the secret key, and the experiment fails. In order to compute the success probability and the average time of one reduction, we run 5000 experiments for some specific sets of parameters, using either Sage's default BKZ implementation [25] or a more recent implementation of the latest sieving strategies, the General Sieve Kernel (G6K) [1]. The experiments were run on the Grid'5000 cluster, on a single core of an Intel Xeon Gold 6130. The total time is the average time of a single reduction multiplied by the number of trials necessary to recover the key. The number of trials necessary to recover the secret key corresponds to the number of experiments run until we have a success for a given set of parameters. For a fixed number of signatures, we either optimize the total time or the success probability. We report the numbers obtained with BKZ in Tables 3 and 4.

Comments on G6K: We do not report the full experiments run with G6K, since using this implementation does not lead to the fastest total time for our attack: around 2 min using 8 signatures for BKZ, and at best 5 min for G6K. However, G6K allows us to reduce lattices with much larger block-sizes than BKZ. For comparable probabilities of success, G6K is faster. Considering the highest probability achieved: on the one hand, BKZ-35 leads to a probability of success of 45%, and a single reduction takes 133 min; on the other hand, to reach around the same probability of success with G6K, we increase the block-size to 80, and a single reduction then takes only around 45 min on average. This is an improvement by a factor of 3 in the reduction time.
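For completeness, the sketch below shows the typical structure of the reduction step of one experiment using fpylll, the Python interface to fplll [25], assuming the rows of the basis B have already been computed from the selected traces as described in Sect. 3. The shape test mirrors the informal description given there; n_zero and last are placeholders, and the random basis in the demonstration is unrelated to the attack.

```python
# Structure of the reduction step, assuming fpylll is installed. Building the basis
# rows from the traces (Sect. 3) is not shown here.
from fpylll import IntegerMatrix, LLL, BKZ

def reduce_basis(rows, block_size=25):
    """LLL- then BKZ-reduce an integer basis given as a list of rows."""
    B = IntegerMatrix.from_matrix(rows)
    LLL.reduction(B)                                  # cheap pre-reduction
    BKZ.reduction(B, BKZ.Param(block_size))
    return [[B[i, j] for j in range(B.ncols)] for i in range(B.nrows)]

def looks_like_target(v, n_zero, last):
    """Shape test: leading zero coordinates and the expected last coefficient."""
    return all(x == 0 for x in v[:n_zero]) and abs(v[-1]) == last

if __name__ == "__main__":
    import random
    # Tiny random basis, only to demonstrate the reduction calls themselves.
    n = 20
    rows = [[random.randrange(-2**20, 2**20) for _ in range(n)] for _ in range(n)]
    print(reduce_basis(rows, block_size=10)[0])
```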
Only 3 Signatures. Using Δ ≈ 2^3 and no preprocessing, we recovered the secret key using 3 signatures only once with BKZ-35 and three times with BKZ-40. When using the preprocessing S_11, BKZ-35 and Δ ≈ 2^3, the probability of success went up to 0.2%. Since all these probabilities remain much smaller than 1%, an extensive analysis would have taken too long. Thus, in the rest of the section, the number of signatures only varies between 4 and 8. However, we want to emphasize that it is precisely this detailed analysis on a slightly higher number of signatures that allowed us to understand the impact of the parameters on the performance of the attack, and resulted in finding the right ones to mount the attack with 3 signatures.

Varying the Bitsize of Δ. In Fig. 1, we analyze the total time of key recovery as a function of the bitsize of Δ. We fix the block-size of BKZ to 25 and take traces without any preprocessing. We are able to recover the secret key by setting Δ = 1, which is the lowest theoretical value one can choose. However, we observed a slight increase in the probability of success when taking a larger Δ. Without any surprise, we note that the total time to recover the secret key increases with the bitsize of Δ, as the coefficients in the lattice basis become larger.

Analyzing the Effect of Preprocessing. We analyze the influence of our preprocessing method on the attack time. We fix the BKZ block-size to 25. The effect of preprocessing is influenced by the bitsize of Δ, and we give here an analysis for Δ ≈ 2^{25}, since the effect is more noticeable. The effect of preprocessing is difficult to predict, since its behavior varies depending on the parameters, having both positive and negative effects. On the one hand, we reduce the size of all the coefficients in the lattice, thus reducing the reduction time. On the other hand, we generate more potential small vectors with norms smaller than the norm of w. For this reason, the probability of success of the attack decreases, since the vector w is more likely to be a linear combination of vectors already in the reduced basis. For example, with 7 signatures we find on average w to be the third or fourth vector in the reduced basis without preprocessing, whereas with S_11 it is more likely to appear in position 40. The positive effect of preprocessing is most noticeable for u = 4 and u = 5, as shown in Fig. 2. For instance, using S_15 and u = 4 lowers the overall time by a factor of up to 5.7. For u = 5, we gain a factor close to 3 by using either S_15 or S_19. For u > 5, using preprocessed traces is less impactful. For large Δ such as Δ ≈ 2^{25}, we still note some lower overall times when using S_15 and S_19, up to a factor of 2. When the bitsize gets smaller, reducing the size of the coefficients in the lattice is less impactful.

Balancing the Block-size of BKZ. Finally, we vary the block-size in the BKZ algorithm. We fix Δ ≈ 2^3 and use no preprocessing. We plot the results in Fig. 3 for 6 and 7 signatures. For other values of u, the plots are very similar and we omit them from Fig. 3. Without any surprise, we see that as we increase the block-size, the probability of success increases; however, the reduction time increases significantly as well.
This explains the results shown in Table 3 and Table 4: to reach the best probability of success, one needs to increase the block-size in BKZ (we did not try any block-size greater than 40), but to get the fastest key recovery attack, the block-size is chosen between 20 and 25, except for 3 signatures, where the probability of success is too low with these parameters.

It is not unexpected to have errors in the traces collected during side-channel attacks. Obtaining error-free traces requires some amount of work on the signal processing side. Prior to [6], the presence of errors in traces was either ignored, or preprocessing was done on the traces until an error-free sample was found, see [2, 9]. In [6], it is shown that the lattice attack still successfully recovers the secret key even when traces contain errors. An error in the setup of [6] corresponds to an incorrect bound on the size of the values being collected. In our setup, a trace without errors corresponds to a trace where every single coefficient in the wNAF representation of k has been correctly identified as either non-zero or zero. The probability of having an error in our setup is thus much higher. Side-channel attacks without any errors are very rare. Both [21] and [6] give some analysis of the Flush&Reload and Prime&Probe attacks in real-life scenarios. In [7], the results presented assume that the Flush&Reload attack is executed perfectly, without any errors. In particular, to obtain 4 perfect traces, be able to run their experiment and find the key, one would need on average 8 traces from Flush&Reload, the probability of a perfect reading of a trace being 56%, as pointed out in [21]. In our work, we show that it is possible to recover the secret key using only 4, possibly erroneous, traces. However, the probability of success is very low. Recall that an error in our case corresponds to a flipped digit in the trace of k. Table 5 shows the attack success probability in the presence of errors. We used BKZ-25 and Δ ≈ 2^3 with traces taken from S_all, and we average over 5000 experiments. We mark the entries where the attack succeeded fewer than five times over 5000 experiments, as this makes it difficult to evaluate the probability of success. The attack is resilient up to 2% of errors. Indeed, for u = 6, we recovered the secret key with 30 errors, i.e., 30 flipped digits over 6 × 257 digits.

Different Types of Errors. There exist two possible types of errors. In the first case, a coefficient which is zero is evaluated as a non-zero coefficient. In theory, this only adds a new variable to the system, i.e., the number of non-zero coefficients is overestimated. This does not affect the probability of success much: we simply obtain an over-constrained system. We can see in Fig. 4 that the probability of success of the attack indeed decreases slowly as we add errors of this form. With errors only of this form, we were able to recover the secret key with nearly 4% of errors (for instance with u = 6, using BKZ-35). The other type of error consists of a non-zero coefficient which is misread as a zero coefficient. In this case, we lose information necessary for the key recovery, and thus this type of error affects the probability of success far more significantly, as can also be seen in Fig. 4. In this setup, we were not able to recover the secret key when more than 3 errors of this type appear in the set of traces considered.
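To make the two error types concrete, here is a small Python sketch that injects them into an idealized trace, represented simply as a 0/1 vector of "non-zero digit detected" flags; this representation, the function name and the rates are ours, for illustration only.

```python
# A sketch of the two error types on an idealized trace.
import random

def inject_errors(flags, p_add, p_drop):
    """p_add: P[0 read as non-zero] (benign type); p_drop: P[non-zero read as 0]."""
    noisy = []
    for f in flags:
        if f == 0 and random.random() < p_add:
            noisy.append(1)
        elif f == 1 and random.random() < p_drop:
            noisy.append(0)
        else:
            noisy.append(f)
    return noisy

# Example: a 257-digit trace with 26 non-zero digits, 2% of 0 -> non-zero errors
# and no non-zero -> 0 errors (the favourable scenario discussed above).
flags = [0] * 257
for pos in random.sample(range(257), 26):
    flags[pos] = 1
noisy = inject_errors(flags, p_add=0.02, p_drop=0.0)
print(sum(a != b for a, b in zip(flags, noisy)), "flipped digits")
```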
Strategy. If the signal processing method hesitates between a non-zero digit and 0, we recommend favoring the non-zero digit, in order to increase the chance of producing an error of type 0 → non-zero, for which the attack is much more tolerant.

In the last decades, most implementations of ECDSA have been the target of microarchitectural attacks, and thus existing implementations have either been replaced by more robust algorithms, or layers of security have been added. For example, one way of minimizing the leakage from the scalar multiplication is to use the Montgomery ladder scalar-by-point multiplication [16], which is much more resilient to side-channel attacks due to the regularity of its operations. However, this does not entirely remove the risk of leakage [28]; additional countermeasures are necessary. Looking at common countermeasures, many implementations use blinding or masking techniques [20], for example the BouncyCastle implementation of ECDSA. The former consists in blinding the data before doing any operations, while masking techniques randomize all the data-dependent operations by applying random transformations, thus making any leakage unexploitable. However, it is important to keep these lattice attacks in mind, as they can be applied at any level of an implementation that leaks the right information.

References:
The general sieve kernel and new records in lattice reduction. Cryptology ePrint Archive
DSA signing key recovery with noisy side channels and variable error rates
"Just a Little Bit": a small amount of side channel can go a long way
Hardness of computing the most significant bits of secret keys in Diffie-Hellman and related schemes
BKZ 2.0: better lattice security estimates
CacheQuote: efficiently recovering long-term secrets of SGX EPID via cache attacks
Attacking OpenSSL implementation of ECDSA with a few signatures
Constant-time callees with variable-time callers
ECDSA key extraction from mobile devices via nonintrusive physical side channels
A survey of fast exponentiation methods
Extended hidden number problem and its cryptanalytic applications
Lattice attacks on digital signature schemes
The elliptic curve digital signature algorithm (ECDSA)
Differential power analysis
Factoring polynomials with rational coefficients
Speeding the Pollard and elliptic curve methods of factorization
LLL on the average
The insecurity of the elliptic curve digital signature algorithm with partially known nonces
Just a little bit more
A hierarchy of polynomial time lattice basis reduction algorithms
Block reduced lattice bases and successive minima
Lattice basis reduction: improved practical algorithms and solving subset sum problems
The FPLLL development team: fplll, a lattice reduction library
Responses to NIST's proposals
Attacking OpenSSL ECDSA with a small amount of side-channel information
Recovering OpenSSL ECDSA nonces using the FLUSH+RELOAD cache side-channel attack
FLUSH+RELOAD: a high resolution, low noise, L3 cache side-channel attack

Acknowledgments. We would like to thank Nadia Heninger for discussions about possible lattice constructions, Mehdi Tibouchi for answering our side-channel questions, Alenka Zajic and Milos Prvulovic for providing us with traces from OpenSSL that allowed us to confirm our results on a deployed implementation, Daniel Genkin for pointing us towards the Extended Hidden Number Problem, and Pierrick Gaudry for his precious support and reading.
Experiments presented in this paper were carried out using the Grid'5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several universities as well as other organizations.