key: cord-0047218-tpbfyo7a authors: van der Hoeven, Joris; Monagan, Michael title: Implementing the Tangent Graeffe Root Finding Method date: 2020-06-06 journal: Mathematical Software - ICMS 2020 DOI: 10.1007/978-3-030-52200-1_48 sha: 635d0c304669f60f992d23a1037c41e903790562 doc_id: 47218 cord_uid: tpbfyo7a The tangent Graeffe method has been developed for the efficient computation of single roots of polynomials over finite fields with multiplicative groups of smooth order. It is a key ingredient of sparse interpolation using geometric progressions, in the case when blackbox evaluations are comparatively cheap. In this paper, we improve the complexity of the method by a constant factor and we report on a new implementation of the method and a first parallel implementation. Consider a polynomial function f : K n → K over a field K given through a black box capable of evaluating f at points in K n . The problem of sparse interpolation is to recover the representation of f ∈ K[x 1 , . . . , x n ] in its usual form, as a linear combination f = 1 i t c i x ei (1) of monomials x ei = x e1, 1 1 · · · x e1,n n . One popular approach to sparse interpolation is to evaluate f at points in a geometric progression. This approach goes back to work of Prony in the eighteen's century [15] and became well known after Ben-Or and Tiwari's seminal paper [2] . It has widely been used in computer algebra, both in theory and in practice; see [16] for a nice survey. More precisely, if a bound T for the number of terms t is known, then we first evaluate f at 2T − 1 pairwise distinct points α 0 , α 1 , . . . , α 2T −2 , where α = (α 1 , . . . , α n ) ∈ K n and α k := (α k 1 , . . . , α k n ) for all k ∈ N. The generating function of the evaluations at α k satisfies the identity where Λ = (1 − α e1 z) · · · (1 − α et z) and N ∈ K[z] is of degree < t. The rational function N/Λ can be recovered from f (α 0 ), f(α 1 ), . . . , f(α 2T −2 ) using fast Padé Note: This paper received funding from NSERC (Canada) and "Agence de l'innovation de défense" (France). Note: This document has been written using GNU T E X macs [13] . approximation [4] . For well chosen points α, it is often possible to recover the exponents e i from the values α ei ∈ K. If the exponents e i are known, then the coefficients c i can also be recovered using fast structured linear algebra [5] . This leaves us with the question how to compute the roots α −ei of Λ in an efficient way. For practical applications in computer algebra, we usually have K = Q, in which case it is most efficient to use a multi-modular strategy, and reduce to coefficients in a finite field K = F p , where p is a prime number that we are free to choose. It is well known that polynomial arithmetic over F p can be implemented most efficiently using FFTs when the order p − 1 of the multiplicative group is smooth. In practice, this prompts us to choose p of the form s2 l + 1 for some small s and such that p fits into a machine word. The traditional way to compute roots of polynomials over finite fields is using Cantor and Zassenhaus' method [6] . In [10, 11] , alternative algorithms were proposed for our case of interest when p−1 is smooth. The fastest algorithm was based on the tangent Graeffe transform and it gains a factor log t with respect to Cantor-Zassenhaus' method. The aim of the present paper is to report on a parallel implementation of this new algorithm and on a few improvements that allow for a further constant speed-up. In Sect. 2, we recall the Graeffe transform and the heuristic root finding method based on the tangent Graeffe transform from [10] . In Sect. 3, we present the main new theoretical improvements, which all rely on optimizations in the FFT-model for fast polynomial arithmetic. Our contributions are twofold. In the FFT-model, one backward transform out of four can be saved for Graeffe transforms of order two (see Sect. 3.2). When composing a large number of Graeffe transforms of order two, FFT caching can be used to gain another factor of 3/2 (see Sect. 3.3). In the longer preprint version of the paper [12] , we also show how to generalize our methods to Graeffe transforms of general orders and how to use it in combination with the truncated Fourier transform. Section 4 is devoted to our new sequential and parallel implementations of the algorithm in C and Cilk C. Our sequential implementation confirms the gain of a new factor of two when using the new optimizations. So far, we have achieved a parallel speed-up by a factor of 4.6 on an 8-core machine. Our implementation is freely available at http://www.cecm.sfu.ca/CAG/code/TangentGraeffe. The traditional Graeffe transform of a monic polynomial P ∈ K[z] of degree d is the unique monic polynomial G(P ) ∈ K[z] of degree d such that G(P )(z 2 ) = P (z)P (−z). ( If P splits over K into linear factors P = (z − β 1 ) · · · (z − β d ), then one has More generally, given r 2, we define the Graeffe transform of order r to be the unique monic polynomial G r (P ) ∈ K[z] of degree d such that G r (P )(z) = (−1) rd Res u (P (u), u r − z). If r, s 2, then we have Let be a formal indeterminate with 2 = 0. The definitions from the previous subsection readily extend to coefficients in K[ ] instead of K. Given r 2, we call G r (P ) the tangent Graeffe transform of P of order r. We have Now assume that we have an efficient way to determine the roots α r 1 , . . . , α r d of G r (P ). For some polynomial T ∈ K[z], we may decompose G r (P ) = G r (P ) + T For any root α r k of G r (P ), we then have Whenever α r k happens to be a single root of G r (P ), it follows that . If α r k = 0, this finally allows us to recover α k as α k = r α r k rα r−1 k . Assume now that K = F p is a finite field, where p is a prime number of the form p = σ2 m + 1 for some small σ. Assume also that ω ∈ F p be a primitive element of order p − 1 for the multiplicative group of F p . be as in the previous subsection. The tangent Graeffe method can be used to efficiently compute those α k of P for which α r k is a single root of G r (P ). In order to guarantee that there are a sufficient number of such roots, we first replace P (z) by P (z + τ ) for a random shift τ ∈ F p , and use the following heuristic: For a random shift τ ∈ F p and any r (p − 1)/(4d), the assumption ensures with probability at least 1/2 that G r (P (z + τ )) has at least d/3 single roots. Now take r to be the largest power of two such that r (p − 1)/(4d) and let s = (p − 1)/r. By construction, note that s = O(d). The roots α r 1 , . . . , α r d of G r (P ) are all s-th roots of unity in the set {1, ω r , . . . , ω (s−1)r }. We may thus determine them by evaluating G r (P ) at ω i for i = 0, . . . , s − 1. Since s = O(d), this can be done efficiently using a discrete Fourier transform. Combined with the tangent Graeffe method from the previous subsection, this leads to the following probabilistic algorithm for root finding: of degree d and only order one factors, p = σ2 m + 1 Output: the set {α 1 , . . . , α d } of roots of P 1. If d = 0 then return ∅ 2. Let r = 2 N ∈ 2 N be largest such that r (p − 1)/(4d) and let s := (p − 1)/r 3. Pick τ ∈ F p at random and compute P * : , which requires three polynomial multiplications in F p [z] of degree d. In total, step 5 thus performs O(log(p/s)) such multiplications. We discuss how to perform step 5 efficiently in the FFT model in Sect. 3. For practical implementations, one may vary the threshold r (p − 1)/(4d) for r and the resulting threshold s 4d for s. For larger values of s, the computations of the DFTs in step 6 get more expensive, but the proportion of single roots goes up, so more roots are determined at each iteration. From an asymptotic complexity perspective, it would be best to take s d √ log p. In practice, we actually preferred to take the lower threshold s 2d, because the constant factor of our implementation of step 6 (based on Bluestein's algorithm [3] ) is significant with respect to our highly optimized implementation of the tangent Graeffe method. A second reason we prefer is that the total space used by the algorithm is linear in s. In the future, it would be interesting to further speed up step 6 by investing more time in the implementation of high performance DFTs of general orders s. Assume n ∈ N is invertible in K and let ω ∈ K be a primitive n-th root of unity. Consider a polynomial A = a 0 + a 1 z + · · · + a n−1 z n−1 ∈ K[z]. Then the discrete Fourier transform (DFT) of order n of the sequence (a i ) 0 i