Shen, Xuefeng; Tran, Khoa; Leok, Melvin: High-order symplectic Lie group methods on $SO(n)$ using the polar decomposition. 2022-01-26.

Abstract. A variational integrator of arbitrarily high order on the special orthogonal group $SO(n)$ is constructed using the polar decomposition and the constrained Galerkin method. It has the advantage of avoiding the second-order derivative of the exponential map that arises in traditional Lie group variational methods. In addition, a reduced Lie-Poisson integrator is constructed, and the resulting algorithms can naturally be implemented by fixed-point iteration. The proposed methods are validated by numerical simulations on $SO(3)$, which demonstrate that they are comparable to variational Runge-Kutta-Munthe-Kaas methods in terms of computational efficiency. However, the methods we have proposed preserve the Lie group structure much more accurately and exhibit better near energy preservation.

1. Introduction

1.1. Overview. Given a configuration manifold $Q$, variational integrators provide a useful method of deriving symplectic integrators for Lagrangian mechanics on the tangent bundle $TQ$ in terms of the Lagrangian $L$, or for Hamiltonian mechanics on the cotangent bundle $T^*Q$ in terms of the Hamiltonian $H$. It involves discretizing Hamilton's principle or Hamilton's phase space principle rather than the Euler-Lagrange or Hamilton's equations. Discrete Lagrangian variational mechanics is described in terms of a discrete Lagrangian $L_d(q_0, q_1)$, which is an approximation of the exact discrete Lagrangian,
$$L_d^{\text{exact}}(q_0, q_1) = \operatorname*{ext}_{\substack{q \in C^2([0,h],Q) \\ q(0)=q_0,\ q(h)=q_1}} \int_0^h L(q(t), \dot{q}(t))\, dt. \tag{1}$$
The discrete Hamilton's principle states that the discrete action sum is stationary for variations that vanish at the endpoints. This yields the discrete Euler-Lagrange equations,
$$D_2 L_d(q_{k-1}, q_k) + D_1 L_d(q_k, q_{k+1}) = 0, \tag{2}$$
where $D_i$ denotes the partial derivative with respect to the $i$-th argument.
This defines an update map on $Q \times Q$, where $(q_{k-1}, q_k) \mapsto (q_k, q_{k+1})$. The update map can equivalently be described in terms of the discrete Legendre transforms,
$$p_k = -D_1 L_d(q_k, q_{k+1}), \qquad p_{k+1} = D_2 L_d(q_k, q_{k+1}),$$
which define an update map $(q_k, p_k) \mapsto (q_{k+1}, p_{k+1})$ on $T^*Q$ that automatically preserves the canonical symplectic structure on $T^*Q$. The order of the variational integrator depends on how accurately $L_d(q_0, q_1)$ approximates $L_d^{\text{exact}}(q_0, q_1)$. To derive a high-order discrete Lagrangian, a typical approach is the Galerkin method [14]. This involves considering the definition of the exact discrete Lagrangian, replacing $C^2([0,h], Q)$ with a finite-dimensional function space, and approximating the integral with a numerical quadrature formula. When the configuration manifold $Q$ is a linear space and the polynomials of degree less than or equal to $s$ are chosen, the classical symplectic partitioned Runge-Kutta methods are recovered. Subsequently, Leok and Shingel [12] introduced the shooting-based discrete Lagrangian, which allows one to construct a symplectic integrator from an arbitrary one-step method. When the configuration manifold $Q$ is a Lie group $G$, the construction of the discrete Lagrangian is more complicated than in the linear space case. Leok [11] proposed parametrizing curves on the Lie group using the exponential map; namely, a curve $g(t)$ connecting $g_0$ and $g_1$ is represented by $g(t) = g_0 \cdot \exp(\xi(t))$, where $\xi(t) \in \mathfrak{g}$ is a curve on the Lie algebra of $G$ with fixed endpoints $\xi(0) = 0$ and $\xi(h) = \log(g_0^{-1} g_1)$. This allows one to replace variations in $g(t)$ by variations in $\xi(t)$ on the Lie algebra $\mathfrak{g}$, which is a linear space. This yields the following expression for the exact discrete Lagrangian,
$$L_d^{\text{exact}}(g_0, g_1) = \operatorname*{ext}_{\substack{\xi \in C^2([0,h],\mathfrak{g}) \\ \xi(0)=0,\ \xi(h)=\log(g_0^{-1} g_1)}} \int_0^h L\big(g_0 \cdot \exp(\xi(t)),\ g_0 \cdot \mathrm{d}\exp_{\xi(t)}(\dot{\xi}(t))\big)\, dt, \tag{3}$$
where $\mathrm{d}\exp_{\xi}(\dot{\xi}) = \exp(\xi) \cdot \frac{1 - e^{-\mathrm{ad}_{\xi}}}{\mathrm{ad}_{\xi}}(\dot{\xi})$ is the tangent lift of the exponential map.
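The tangent lift $\mathrm{d}\exp$ can be checked numerically by comparing its series expansion $\frac{1-e^{-\mathrm{ad}_\xi}}{\mathrm{ad}_\xi} = \sum_{k \geq 0} \frac{(-1)^k}{(k+1)!}\,\mathrm{ad}_\xi^k$ against a finite difference of the exponential map. The following sketch (not from the paper; truncation depths and the test matrices are illustrative choices) does this on $\mathfrak{so}(3)$:

```python
import numpy as np

def ad(X, Y):
    # Lie bracket (matrix commutator) on the Lie algebra
    return X @ Y - Y @ X

def expm_series(X, terms=30):
    # matrix exponential via a truncated Taylor series (adequate for small ||X||)
    out, term = np.eye(X.shape[0]), np.eye(X.shape[0])
    for k in range(1, terms):
        term = term @ X / k
        out = out + term
    return out

def dexp(xi, eta, terms=20):
    # dexp_xi(eta) = exp(xi) * sum_{k>=0} (-1)^k / (k+1)! * ad_xi^k (eta),
    # i.e. the tangent lift exp(xi) * ((1 - e^{-ad_xi}) / ad_xi)(eta)
    acc, v, fact = np.zeros_like(eta), eta.copy(), 1.0
    for k in range(terms):
        fact *= (k + 1)                       # fact = (k+1)!
        acc = acc + ((-1) ** k / fact) * v    # add (-1)^k / (k+1)! * ad_xi^k (eta)
        v = ad(xi, v)
    return expm_series(xi) @ acc

rng = np.random.default_rng(0)
A, B = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
xi, eta = 0.3 * (A - A.T), 0.3 * (B - B.T)    # skew-symmetric, i.e. in so(3)

# finite-difference check: d/dt exp(xi + t*eta) |_{t=0} should equal dexp_xi(eta)
eps = 1e-6
fd = (expm_series(xi + eps * eta) - expm_series(xi - eps * eta)) / (2 * eps)
err = np.linalg.norm(fd - dexp(xi, eta))      # small, at finite-difference accuracy
```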
If $\xi(t)$ is restricted to a finite-dimensional function space and the integral is replaced with a quadrature rule, we obtain the Galerkin Lie group variational integrators. The error analysis and implementation details for such methods can be found in [3, 8]. The above construction can be naturally extended to any retraction [1] on $G$, which is a diffeomorphism from a neighborhood of $0 \in \mathfrak{g}$ to a neighborhood of $e \in G$ that satisfies a rigidity condition. The main disadvantage of Galerkin Lie group variational integrators is the term $\mathrm{d}\exp$ in (3). This implies that the resulting discrete Euler-Lagrange equations involve $\mathrm{d}^2\exp$, which cannot be calculated exactly in general and requires the truncation of a series expansion.

1.2. Contributions. In this paper, we focus on the Lie group $SO(n)$ as our configuration space. By using the fact that every invertible square matrix can be uniquely decomposed into the product of an orthogonal matrix and a symmetric positive-definite matrix via the polar decomposition, we circumvent the disadvantage discussed previously: instead of parametrizing curves on $SO(n)$ by the exponential map or a retraction, $SO(n)$ is embedded naturally in the space $GL^+(n) = \{A \in \mathbb{R}^{n \times n} \mid \det(A) > 0\}$, an open subset of $\mathbb{R}^{n \times n}$. Given fixed endpoints $g_0$ and $g_1$, we construct interpolating polynomials in $GL^+(n)$ while ensuring that the internal points remain on $SO(n)$ by using the polar decomposition. Furthermore, we do not extend the Lagrangian $L$ to $GL^+(n)$, but instead project the trajectory onto $SO(n)$ in the same way. The variational integrator in Lagrangian form is derived following the usual variational approach for the constrained Galerkin method, and the Hamiltonian form is derived using the discrete Legendre transforms. For a system with rotational symmetry, we obtain a simpler integrator using Lie-Poisson reduction on the Hamiltonian side.
Namely, if $L$ is $SO(n)$-invariant, the constructed discrete Lagrangian is also $SO(n)$-invariant, and we can construct a reduced symplectic Lie-Poisson integrator. Lastly, we consider the dipole on a stick problem from [3], conduct numerical experiments using our method, and compare these to the variational Runge-Kutta-Munthe-Kaas (VRKMK) methods from the same reference.

2. Background

2.1. Notation. Recall that the Lie algebra of $SO(n)$ is the set $\operatorname{Skew}(n) = \{\Omega \in \mathbb{R}^{n \times n} \mid \Omega^T = -\Omega\}$, with the matrix commutator as the Lie bracket. Here, the inner products on $\mathbb{R}^{n \times n}$ and $\operatorname{Skew}(n)$ are introduced, and we identify these spaces with their dual spaces using the Riesz representations. For any $A, B \in \mathbb{R}^{n \times n}$, the inner product is given by
$$\langle A, B \rangle = \operatorname{tr}(A^T B).$$
For any $\Omega, \tilde{\Omega} \in \operatorname{Skew}(n)$, the inner product is defined by
$$\langle \Omega, \tilde{\Omega} \rangle = \tfrac{1}{2}\operatorname{tr}(\Omega^T \tilde{\Omega}).$$
In addition, consider the operator $\operatorname{Asym}: \mathbb{R}^{n \times n} \to \operatorname{Skew}(n)$, defined by $\operatorname{Asym}(A) = A - A^T$. Several properties of these maps can be easily verified; in particular, we note that property (c), $\langle \operatorname{Asym}(A), \Omega \rangle = \langle A, \Omega \rangle$ for all $\Omega \in \operatorname{Skew}(n)$, gives a relationship between the two inner products. Lastly, given the choice of inner products, the Riesz representation allows us to identify $(\mathbb{R}^{n \times n})^*$ with $\mathbb{R}^{n \times n}$ and $\operatorname{Skew}(n)^*$ with $\operatorname{Skew}(n)$.

2.2. The Polar Decomposition. We introduce the polar decomposition and the construction of the retraction on $SO(n)$ described in [4]. Given $A \in GL(n)$, it decomposes uniquely as
$$A = UP, \qquad U \in O(n), \quad P \in \operatorname{Sym}^+(n),$$
where $\operatorname{Sym}^+(n)$ is the set of $n \times n$ symmetric positive-definite matrices. This is the polar decomposition of $A$, and we denote it as a coproduct mapping by $\operatorname{pol}: GL(n) \to O(n) \times \operatorname{Sym}^+(n)$. The map of interest is the projection $P = \pi_1 \circ \operatorname{pol}: GL(n) \to O(n)$ defined by $P(A) = U$, where $\pi_1$ is the projection onto the first coordinate. In particular, when $A \in GL^+(n)$, we have $U \in SO(n)$. A fast and efficient algorithm for calculating the projection of the polar decomposition is the Newton iteration,
$$U_{k+1} = \tfrac{1}{2}\big(U_k + U_k^{-T}\big), \qquad U_0 = A. \tag{4}$$
Now, this projection can be used to construct a retraction on $SO(n)$ from its Lie algebra $\operatorname{Skew}(n)$ relative to the identity element $I$,
$$\Omega \mapsto P(I + \Omega),$$
where $\Omega \in \operatorname{Skew}(n)$.
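The Newton iteration (4) converges quadratically to the orthogonal polar factor for any invertible starting matrix. A minimal sketch (the test matrix and tolerances are illustrative, not from the paper):

```python
import numpy as np

def polar_project(A, tol=1e-12, maxit=100):
    # Newton iteration for the orthogonal polar factor:
    #   U_{k+1} = (U_k + U_k^{-T}) / 2,  U_0 = A
    U = A.copy()
    for _ in range(maxit):
        U_next = 0.5 * (U + np.linalg.inv(U).T)
        if np.linalg.norm(U_next - U) < tol:
            return U_next
        U = U_next
    return U

rng = np.random.default_rng(1)
A = np.eye(3) + 0.5 * rng.standard_normal((3, 3))
if np.linalg.det(A) < 0:       # ensure A lies in GL+(3), so that U is in SO(3)
    A[:, 0] = -A[:, 0]

U = polar_project(A)
P = U.T @ A                    # the symmetric positive-definite factor, A = U P
```

The same routine serves both the projection $P(\cdot)$ and the retraction $\Omega \mapsto P(I+\Omega)$.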
This provides a diffeomorphism between a neighborhood of $0 \in \operatorname{Skew}(n)$ and a neighborhood of $I \in SO(n)$. To calculate its inverse, suppose that $I + \Omega = UP$ and take the transpose on both sides to obtain $I - \Omega = PU^T$, which implies that $U^T(I + \Omega) = (I - \Omega)U$. Thus, we have that
$$U^T \Omega + \Omega U + (U^T - U) = 0. \tag{5}$$
This is a Lyapunov equation, and it is well known that matrix equations of the form $AX + XB + C = 0$ have a unique solution if and only if $\lambda + \mu \neq 0$ for any $\lambda \in \sigma(A)$, $\mu \in \sigma(B)$. For $U$ in a neighborhood of the identity, its eigenvalues lie in the open right-half plane, which ensures that a unique solution to (5) exists. In principle, this Lyapunov equation can be solved using classical algorithms [2, 7]. For convenience, we denote the solution of $AX + XB + C = 0$ by $X = \operatorname{Lyap}(A, B, C)$. Next, we introduce the tangent map and its adjoint for $P$, which are essential for the derivation of the variational integrator.

2.2.1. The Tangent Map. Consider the polar decomposition $A(t) = U(t)P(t)$ and differentiate both sides to yield $\dot{A} = \dot{U}P + U\dot{P}$. By left-trivialization on $SO(n)$, we can write $\dot{U} = U\Omega$, where $\Omega \in \operatorname{Skew}(n)$. Rearranging gives $\dot{P} = U^T\dot{A} - \Omega P$, and since $\dot{P}$ is symmetric, we get that $U^T\dot{A} - \Omega P = \dot{A}^T U + P\Omega$. Consequently, we may write this in the form of a Lyapunov equation,
$$\Omega P + P\Omega = U^T\dot{A} - \dot{A}^T U. \tag{6}$$
We see that the tangent map of the polar decomposition $dP_A: \mathbb{R}^{n \times n} \to \operatorname{Skew}(n)$ is given by $dP_A(\dot{A}) = \Omega$, where we solve the Lyapunov equation (6) to obtain $\Omega$.

2.2.2. The Adjoint of the Tangent Map. The adjoint of $dP_A$ is defined by the identity $\langle dP_A(B), \Omega \rangle = \langle B, dP_A^*(\Omega) \rangle$. Recall that $dP_A(B)$ involves solving the Lyapunov equation (6). We aim to compute $dP_A^*$, so we define the following two maps,
$$\varphi: \operatorname{Skew}(n) \to \operatorname{Skew}(n), \quad \Omega \mapsto \Omega P + P\Omega, \qquad \psi: \mathbb{R}^{n \times n} \to \operatorname{Skew}(n), \quad B \mapsto U^T B - B^T U,$$
where $A = UP$ is fixed. Therefore, $dP_A$ can be viewed as the composition $\varphi^{-1} \circ \psi$, and $dP_A^*(\Omega) = (\varphi^{-1} \circ \psi)^*(\Omega) = \psi^* \circ (\varphi^*)^{-1}(\Omega)$. We shall derive the expressions for $\varphi^*$ and $\psi^*$ by considering the Riesz representations for our domains and codomains. For the adjoint of $\varphi$, let $\Omega, \tilde{\Omega} \in \operatorname{Skew}(n)$; a direct computation then shows that $\langle \varphi(\Omega), \tilde{\Omega} \rangle = \langle \Omega, \varphi(\tilde{\Omega}) \rangle$. Thus, $\varphi^* = \varphi$, and $\varphi$ is Hermitian.
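The tangent map $dP_A$ reduces to a single Sylvester-type solve of (6). A sketch using SciPy's Bartels-Stewart solver (`scipy.linalg.solve_sylvester`; the SVD-based polar factor and the finite-difference check are implementation choices of this sketch, not the paper's code):

```python
import numpy as np
from scipy.linalg import solve_sylvester

def polar_U(A):
    # orthogonal polar factor via the SVD: A = W diag(s) Vt  =>  U = W Vt
    W, s, Vt = np.linalg.svd(A)
    return W @ Vt

def dP(A, Adot):
    # left-trivialized tangent map of the polar projection:
    # solve the Lyapunov equation  Omega P + P Omega = U^T Adot - Adot^T U
    U = polar_U(A)
    P = U.T @ A
    rhs = U.T @ Adot - Adot.T @ U
    return solve_sylvester(P, P, rhs)   # Omega, automatically skew-symmetric

rng = np.random.default_rng(2)
A = np.eye(3) + 0.4 * rng.standard_normal((3, 3))
Adot = rng.standard_normal((3, 3))
Omega = dP(A, Adot)

# finite-difference check: Udot = U Omega, so Omega ~ U^T dU/dt along A(t) = A + t Adot
eps = 1e-6
U = polar_U(A)
Omega_fd = U.T @ (polar_U(A + eps * Adot) - polar_U(A - eps * Adot)) / (2 * eps)
```

Since $P$ is symmetric positive-definite, the spectra condition $\lambda + \mu \neq 0$ holds automatically and the solve is well-posed.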
For $\psi^*$, let $\Omega \in \operatorname{Skew}(n)$ and $B \in \mathbb{R}^{n \times n}$; then a direct computation using the two inner products shows that $\langle \psi(B), \Omega \rangle = \langle B, U\Omega \rangle$. Therefore, $\psi^*(\Omega) = U\Omega$, and we obtain
$$dP_A^*(\Omega) = U \cdot \operatorname{Lyap}(P, \Omega^T),$$
where $\operatorname{Lyap}(P, \Omega^T) = \operatorname{Lyap}(P, P, \Omega^T)$. Finally, we state a lemma that will be used later.

3. Construction of the Variational Integrator

Let the configuration manifold be the rotation group $G = SO(n)$, and let $L: G \times \mathfrak{g} \to \mathbb{R}$ be a left-trivialized Lagrangian. We shall construct a discrete Lagrangian following the approach of constrained Galerkin methods on $GL^+(n)$ (see Appendix A). Denote the internal points by $\{U_i\}_{i=1}^{s} \subset G$ and the left-trivialized internal tangent vectors by $\{\Omega_i\}_{i=1}^{s} \subset \mathfrak{g}$. Fixing the endpoints $g_0$ and $g_1$, we have
$$L_d(g_0, g_1) = h \sum_{i=1}^{s} b_i L(U_i, \Omega_i),$$
subject to the constraint
$$g_1 = P\Big(g_0 + h \sum_{i=1}^{s} b_i U_i \Omega_i\Big), \tag{7}$$
where the internal points $U_i$ are represented by
$$U_i = P\Big(g_0 + h \sum_{j=1}^{s} a_{ij} U_j \Omega_j\Big). \tag{8}$$
The expressions inside the parentheses in (7) and (8) correspond to a Runge-Kutta method in the embedding space. But, since these points may not lie on the Lie group $G$, they are projected to $G$ using the polar decomposition. Observe that (7) is equivalent to the condition that $P\big(g_1^T\big(g_0 + h\sum_{i=1}^{s} b_i U_i \Omega_i\big)\big) = I$. Suppose that $h$ is small, and $g_0$ and $g_1$ are close enough to each other; then $g_1^T\big(g_0 + h\sum_{i=1}^{s} b_i U_i \Omega_i\big)$ lies in a neighborhood of $I$, and so (7) is equivalent to
$$\operatorname{Asym}\Big(g_1^T\Big(g_0 + h \sum_{i=1}^{s} b_i U_i \Omega_i\Big)\Big) = 0.$$
Now, we can construct a discrete Lagrangian with the constraint using a Lagrange multiplier $\Lambda \in \operatorname{Skew}(n)$,
$$\tilde{F}(g_0, g_1, \Omega_1, \ldots, \Omega_s, \Lambda) = h \sum_{i=1}^{s} b_i L(U_i, \Omega_i) + \Big\langle \Lambda,\ \operatorname{Asym}\Big(g_1^T\Big(g_0 + h\sum_{i=1}^{s} b_i U_i \Omega_i\Big)\Big)\Big\rangle.$$
The corresponding variational integrator is given by the stationarity conditions of $\tilde{F}$, equations (9a)-(9e). We shall compute these equations more explicitly. It is easy to see that (9b) is equivalent to the constraint (7). Let us turn to (9a), where the main difficulty is the implicit dependence of the internal points on $\{\Omega_i\}_{i=1}^{s}$, which involves solving the nonlinear system (8). Suppose that $k \in \{1, 2, \ldots, s\}$ is fixed, and we vary $\Omega_k$ along a curve $\Omega_k(t)$ with $\Omega_k(0) = \Omega_k$ and $\dot{\Omega}_k(0) = \delta\Omega_k$. Differentiating both sides of (8) and letting $\dot{U}_i = U_i X_{ik}$, we have that
$$X_{ik} = dP_{A_i}\Big(h \sum_{j=1}^{s} a_{ij} U_j X_{jk} \Omega_j + h\, a_{ik} U_k\, \delta\Omega_k\Big), \tag{10}$$
where $A_i = g_0 + h\sum_{j=1}^{s} a_{ij} U_j \Omega_j$. Then, (10) can be rewritten as a linear relation between $(X_{1k}, \ldots, X_{sk})$ and $\delta\Omega_k$. In order to represent $\{X_{ik}\}_{i=1}^{s}$ in terms of $\delta\Omega_k$, we define three maps: the projection $\tilde{\pi}_i: \operatorname{Skew}(n)^s \to \operatorname{Skew}(n)$ onto the $i$-th component; a map $\tilde{\varphi}: \operatorname{Skew}(n)^s \to \operatorname{Skew}(n)^s$ collecting the terms in (10) that are linear in $(X_{1k}, \ldots, X_{sk})$; and a map $\tilde{\psi}_k: \operatorname{Skew}(n) \to \operatorname{Skew}(n)^s$ collecting the terms linear in $\delta\Omega_k$, so that $X_{ik} = \tilde{\pi}_i \circ \tilde{\varphi}^{-1} \circ \tilde{\psi}_k(\delta\Omega_k)$. Now, we compute $\frac{\partial \tilde{F}}{\partial \Omega_k}$ by evaluating $\frac{d}{dt}\big|_{t=0}\tilde{F}(g_0, g_1, \ldots, \Omega_k(t), \ldots, \Lambda)$ and expressing $\frac{\partial L}{\partial U}: G \times \mathfrak{g} \to \mathfrak{g}^*$ as a left-trivialized cotangent vector.
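The adjoint formula $dP_A^*(\Omega) = \psi^*((\varphi^*)^{-1}\Omega)$ can be verified numerically. The sketch below assumes the Frobenius pairing $\langle A, B\rangle = \operatorname{tr}(A^T B)$ on $\mathbb{R}^{n\times n}$ and $\langle \Omega, \tilde\Omega\rangle = \tfrac{1}{2}\operatorname{tr}(\Omega^T\tilde\Omega)$ on $\operatorname{Skew}(n)$; the $\tfrac{1}{2}$ normalization is an assumption of this sketch (it is the one under which $\psi^*(\Omega) = U\Omega$ closes), not a statement of the authors' code:

```python
import numpy as np
from scipy.linalg import solve_sylvester

def ip_full(A, B):      # inner product on R^{n x n}
    return np.trace(A.T @ B)

def ip_skew(X, Y):      # assumed inner product on Skew(n), with the 1/2 factor
    return 0.5 * np.trace(X.T @ Y)

def polar_U(A):
    W, s, Vt = np.linalg.svd(A)
    return W @ Vt

def dP(A, B):
    # tangent map: solve  Omega P + P Omega = U^T B - B^T U
    U = polar_U(A)
    P = U.T @ A
    return solve_sylvester(P, P, U.T @ B - B.T @ U)

def dP_star(A, Om):
    # adjoint: dP_A^*(Om) = psi^*((phi^*)^{-1} Om) = U @ X,  with  X P + P X = Om
    U = polar_U(A)
    P = U.T @ A
    return U @ solve_sylvester(P, P, Om)

rng = np.random.default_rng(5)
A = np.eye(3) + 0.4 * rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
S = rng.standard_normal((3, 3))
Om = S - S.T                                 # an arbitrary element of Skew(3)

lhs = ip_skew(dP(A, B), Om)                  # <dP_A(B), Om>_Skew
rhs = ip_full(B, dP_star(A, Om))             # <B, dP_A^*(Om)>
```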
Since this is a straightforward calculation of the variation, we present the result here for equation (9a) (see section C.1 for details), for any k = 1, 2, . . . , s. Recall thatπ i •φ −1 •ψ k : Skew(n) → Skew(n) and its dual is given by Therefore, let us derive the explicit expressions for the adjoints of our three proposed maps, so we may write (π i •φ −1 •ψ k ) * explicitly. The adjoint ofπ i is easy to compute, and for any S ∈ Skew(n), where S is in the i-th position. For the adjoint ofφ, we consider the identity φ * (S 1 , . . . , S s ), (S 1 , . . . ,S s ) = (S 1 , . . . , S s ),φ(S 1 , . . . ,S s ) , for any (S 1 , . . . , S s ), (S 1 , . . . ,S s ) ∈ Skew(n) s . Using the properties of the inner products again, we obtain the explicit expression (see section C.2 for details), Similarly, we consider the same identity and techniques to obtain the explicit expression forψ * k (see section C.3 for details),ψ * Combining (14), (15) , and (16), (π i •φ −1 •ψ k ) * (S) for S ∈ Skew(n) can be computed as follows, We can first calculate {S l } s l=1 from (17a) by using fixed-point iteration, and then substitute the result into (17b) to obtain (π i •φ −1 •ψ k ) * (S). This combined with (13) gives an explicit formula for (9a). We now derive an explicit formula for (9d). Notice that U i depends on g 0 by the nonlinear system in (8), so we can use the method of variations again. If we vary g 0 → g 0 (t) such that g 0 (0) = g 0 andġ 0 (0) = g 0 δg 0 , we obtain which can be rewritten as Similar to the approach used for X ik , we introduce a new map, . 
The explicit expression forφ * can be written as (see section C.4 for details), As such, (π i •φ −1 •φ) * (S) can be computed as follows, Now, we compute ∂F ∂g0 , which gives us (9d) explicitly (see section C.5 for details), For equation (9e), it is easy to show that Combining (13), (7), (8) , (20), and (21), we obtain a Lagrangian variational integrator on SO(n), , The integrator gives an update map (g 0 , p 0 ) → (g 1 , p 1 ) on the cotangent bundle. In particular, one may solve It is often desirable to transform a numerical method from the Lagrangian side to the Hamiltonian side, which is possible if L is hyperregular. The same mechanical system can be represented by either a Lagrangian or a Hamiltonian, and they are related by the Legendre transform. In Euclidean space, this gives Given a Lie group G, a left-trivialized Lagrangian L : G × g → R, and its corresponding Hamiltonian H : G × g * → R, it is easy to verify that similar relationships hold for these trivializations, Using (23) and denoting the corresponding internal cotangent vectors by {µ k } s k=1 , (22) can be transformed to the Hamiltonian form as In the above algorithm, Ω i is given explicitly by (24f) and only serves to reduce the redundancy in the computations because they are used numerous times in the other expressions. Similarly, (π i •φ −1 ) * shows up in both (24a) and (24d), so one can save computational effort by reusing the shared solution to (17a) and We consider a G-invariant Hamiltonian system given by H on the cotangent bundle T * G. In this case, Hamilton's equations can be reduced to a Lie-Poisson system on g * . If the Hamiltonian is hyperregular, then both the Lagrangian and the corresponding constrained Galerkin discrete Lagrangian L d (g 0 , g 1 ) will be G-invariant. As such, (2) naturally reduces to yield a Lie-Poisson integrator (see Appendix B). 
We only consider the reduction on the Hamiltonian side due to the nature of the constrained Galerkin methods, which give an update map on the cotangent bundle. The discrete Lagrangian we have constructed becomes and l : g → R is the reduced Lagrangian. It is easy to verify that our system is G-invariant, meaning Therefore, the variational integrator (24) can theoretically be reduced to a Lie-Poisson integrator. By letting g T 0 g 1 = f 0 and U T i g 1 = Θ i , (24) simplifies to where h is the reduced Hamiltonian. Multiplying by g T 1 on both sides of Suppose that h is small and g 0 and g 1 are close, then i Ω i is in the neighborhood of I. By Lemma 1, this is equivalent to which can be regarded as the fixed-point equation for f 0 . The first four equations can be solved using fixed-point iteration for the variables ({µ k } s k=1 , f 0 , {Θ i } s i=1 , Λ) as in our previous discussions. Then, p 1 can be calculated explicitly. In the above algorithm, we also need to figure out the reduced version of (π i •φ −1 •ψ k ) * and (π i •φ −1 •φ) * . Note that (17a) and (17b) involve U T j dP * Ai ; in addition, g 0 , g 1 , U i , and A i = g 0 + h s j=1 a ij U j Ω j = U i P i are reduced, so we need a reduced version of U T j dP * Ai as well. Multiplying A i on the left by g T 1 , we obtain Then, (g T 1 U i )P i is the polar decomposition of f T 0 + h s j=1 a ij Θ T j Ω j , and for S ∈ Skew(n), This is the reduced version of U T j dP * Ai (S), and so (π i •φ −1 •ψ k ) * (S) can be computed as follows, we need to compute g T 0 dP * Ai (S) for some S ∈ Skew(n), which is given by Hence, we have We test our methods and compare them to the variational Runge-Kutta-Munthe-Kaas (VRKMK) methods from Bogfjellmo and Marthinsen [3] on the dipole on a stick problem that they considered. In particular, our configuration space is SO(3), and its Lie algebra is identified with R 3 . 
We shall only recall the mathematical expressions here; for a thorough description of the system, one should refer to the reference above. The right-trivialized and left-trivialized Hamiltonians, $H_R, H_L: SO(3) \times \mathbb{R}^3 \to \mathbb{R}$, can be written in terms of the inertia matrix $J$ and a potential $U(g) = m e_3^T g e_3 + q\beta(\cdots)$, where the omitted terms involve $g y_0^{\pm}$ and $z$. Note that $J = m\,\operatorname{diag}(1+\alpha^2, 1, \alpha^2)$, with $m = 1$ and $\alpha = 0.1$. The constant vectors are $y_0^{\pm} = (0, \pm\alpha, -1)^T$ and $z = (0, 0, -3/2)^T$. Lastly, $q = \beta = 1$, and $\|\cdot\|_2$ is the usual Euclidean norm. Both forms of the Hamiltonian are written here because our method was developed for left-trivialized systems using Hamilton's principle, while their method was developed for right-trivialized systems using the Hamilton-Pontryagin principle. As a result, both discretizations yield symplectic variational integrators for the Hamiltonian with their corresponding choice of trivialization. In particular, we have
$$\tilde{p} = g^T p \tag{30}$$
as a relationship between the dual elements of the corresponding trivializations: $p$ is the dual representation of $\xi \in \mathfrak{g}$ for the right-trivialization $\dot{g} = \xi g$, and $\tilde{p}$ is the dual representation of $\tilde{\xi} \in \mathfrak{g}$ for the left-trivialization $\dot{g} = g\tilde{\xi}$. Consequently, we note that the left-trivialized cotangent vector $\frac{\partial H}{\partial g}$ in VPD (24) is computed as $\frac{\partial H}{\partial g}(g, \tilde{p}) = \operatorname{Asym}\big(g^{-1} \nabla_g H_L(g, \tilde{p})\big)$, where $\nabla_g H_L(g, \tilde{p})$ is the matrix derivative of $H_L(g, \tilde{p})$ with respect to $g$. On the other hand, the right-trivialized cotangent vector is computed as $\operatorname{Asym}\big(\nabla_g H_R(g, p)\, g^{-1}\big)$ when implementing the VRKMK method. For our tests, we use the same initial data as [3] for the VRKMK methods, and (30) then gives $\tilde{p}(0)$ to complete the initial data for the VPD methods. Since both methods involve fixed-point iteration, we terminate the iterations when the norm $\|\cdot\|_2$ of the difference between the current and previous iterates is less than $10^{-15}$ for each variable. In particular, the norm for vectors is the Euclidean norm, and for matrices it is the induced matrix norm.
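The relation between the two trivializations can be sanity-checked numerically on $SO(3)$ under the usual $\mathbb{R}^3 \cong \mathfrak{so}(3)$ identification. The sketch below (not the paper's code; the relation $\tilde{p} = g^T p$ is taken as the assumption to be tested) verifies the hat-map conjugation identity and the invariance of the momentum-velocity pairing:

```python
import numpy as np

def hat(w):
    # hat map identifying R^3 with so(3)
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def rodrigues(w):
    # exp(hat(w)) via the Rodrigues formula
    th = np.linalg.norm(w)
    if th < 1e-12:
        return np.eye(3)
    K = hat(w / th)
    return np.eye(3) + np.sin(th) * K + (1.0 - np.cos(th)) * (K @ K)

rng = np.random.default_rng(3)
g = rodrigues(rng.standard_normal(3))   # a rotation matrix in SO(3)
xi = rng.standard_normal(3)             # right-trivialized velocity: gdot = hat(xi) g
p = rng.standard_normal(3)              # its dual momentum

xi_l = g.T @ xi                         # left-trivialized velocity: gdot = g hat(xi_l)
p_l = g.T @ p                           # assumed relation (30): p~ = g^T p
```

Because $\widehat{g^T w} = g^T \hat{w}\, g$, the two trivializations describe the same $\dot{g}$, and the pairing $\langle p, \xi\rangle = \langle \tilde{p}, \tilde{\xi}\rangle$ is preserved by construction.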
Lastly, we ran these implementations in Wolfram Mathematica 12 on a personal computer with the following specifications: operating system: Windows 10; CPU: 3.7 GHz AMD Ryzen 5 5600X; RAM: G.Skill Trident Z Neo Series 32 GB DDR4 3600 C16; storage: 1 TB Rocket NVMe PCIe 4.0, with write/read speeds up to 5000/4400 MB/s. We run both methods based on the one-, two-, and three-stage Gauss-Legendre methods and a third-order Runge-Kutta method for comparison; these methods are shown as Butcher tableaux in Table 1. We compute the errors in $(g(0.5), p(0.5))$ from VRKMK and the errors in $(g(0.5), \tilde{p}(0.5))$ from VPD with respect to a reference solution, and these errors are shown in Figure 1 (error versus step size for the VRKMK and VPD methods of each order). The reference solution was calculated using the sixth-order VPD method with step size $h = 0.001$. The black dashed lines are reference lines for the appropriate orders. In the sixth-order comparison plot, the errors from the fixed-point iteration dominate for smaller step sizes, so the theoretical order is not observed in those regimes. Otherwise, both methods achieve their theoretical orders and are quite comparable, noting that VPD exhibits a noticeably smaller error constant in the second-order method.

Long-Term Behaviors. The long-term energy behaviors for both methods are presented in Figures 2-5. For second, third, and fourth order, we fixed the step size $h = 0.01$ and ran $10^5$ integration steps to observe the energy errors. For second order, the energy errors have orders of magnitude $10^{-4}$ for VRKMK and $10^{-5}$ for VPD; for third order, $10^{-7}$ for VRKMK and $10^{-6}$ for VPD; for fourth order, $10^{-9}$ for both methods. For sixth order, we consider the step size $h = 1/26$ to avoid the regime where the errors from the fixed-point iteration dominate. This step size corresponds to the third point from the right in the order comparison plots.
We also run $10^5$ integration steps and observe the energy errors in Figure 5. The energy error for the VPD method is stable with an order of magnitude of $10^{-10}$. On the other hand, the energy error in VRKMK exhibits a slow increase over the integration time span. We investigated this phenomenon and attribute it to the framework of each method. In VPD, $g_1$ in (24b) and the internal points $U_i$ in (24c) are updated in each integration step via the polar decomposition. As a result, the Lie group structure is always preserved up to machine precision for both the configuration space elements and the internal points. However, in VRKMK, both $g_1$ and the internal points $U_i$ are updated by the left action of $SO(n)$. This left action is a matrix multiplication which is performed to machine precision, but with a sufficient number of multiplications, the round-off error will still accumulate. Consequently, the Lie group structure is not as well preserved in comparison, and this is illustrated in the comparison of orthogonality errors $\|gg^T - I_3\|_2$ in Figure 6. We also observed that projecting the sixth-order VRKMK method onto the rotation group using the polar decomposition at each timestep does not recover the near energy preservation typical of symplectic methods. Presumably, this is because the projection subtly compromises the symplecticity of the method. This is consistent with the observation made in [10] that both the Lie group structure and symplecticity need to be preserved for the methods to exhibit near energy preservation. We also have data for the runtime comparison of VRKMK and VPD for the two-stage and three-stage Gauss-Legendre methods in Figure 7. For each step size $h$, we recorded the runtime of each method in seconds and repeated the execution 256 times to compute the average runtime. The runtime for the VPD methods is on average longer than for the VRKMK methods.
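The round-off accumulation described above is easy to reproduce in isolation. The following sketch (a standalone illustration with arbitrary parameters, not the paper's experiment) iterates a fixed rotation many times, once by plain matrix multiplication and once with polar re-projection after each step, and compares the orthogonality errors $\|gg^T - I\|$:

```python
import numpy as np

def polar_U(A):
    # orthogonal polar factor via the SVD
    W, s, Vt = np.linalg.svd(A)
    return W @ Vt

rng = np.random.default_rng(4)
S = rng.standard_normal((3, 3))
R = polar_U(np.eye(3) + 0.3 * (S - S.T))   # a fixed rotation applied at every step

g_mult = np.eye(3)    # updated by group multiplication only
g_proj = np.eye(3)    # re-projected onto SO(3) after every multiplication
for _ in range(20_000):
    g_mult = R @ g_mult
    g_proj = polar_U(R @ g_proj)

def orth_err(g):
    return np.linalg.norm(g @ g.T - np.eye(3))

err_mult, err_proj = orth_err(g_mult), orth_err(g_proj)
# err_proj stays at machine precision; err_mult typically drifts upward slowly
```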
This may be due in part to the Lyapunov solves and the multiple layers of fixed-point iteration required by the polar decomposition in (4) and by $(\tilde{\pi}_i \circ \tilde{\varphi}^{-1})^*$ in equations (17a) and (19a).

Conclusion. By applying the polar decomposition in a constrained Galerkin method, we constructed high-order Lie group symplectic integrators on $SO(n)$, which we refer to as the variational polar decomposition (VPD) method. Furthermore, the integrator can be reduced to a Lie-Poisson integrator when the underlying mechanical system is group-invariant. We performed extensive numerical experiments comparing our proposed method to the variational Runge-Kutta-Munthe-Kaas method from [3]. These comparisons provide insights into each method and highlight an advantage of our proposed method, namely the preservation of the Lie group structure up to machine precision for both the configurations and the internal points. This appears to be important for high-order methods to exhibit the near energy preservation that one expects of symplectic integrators when applied to dynamics on Lie groups. For future work, it would be interesting to explore the generalization of the proposed method to symmetric spaces by applying the generalized polar decomposition [15]. This may be of particular interest in the construction of accelerated optimization algorithms on symmetric spaces, following the use of time-adaptive variational integrators for symplectic optimization [6] based on the discretization of Bregman Lagrangian and Hamiltonian flows [5, 16]. Examples of symmetric spaces include the space of Lorentzian metrics, the space of symmetric positive-definite matrices, and the Grassmannian.

Appendix A. Constrained Galerkin Methods

Our Galerkin variational integrator involves a discrete Lagrangian that differs from the classical construction in [14]. Traditionally, in the linear space setting, (1) is approximated with a quadrature rule, and $q(t)$ is approximated by polynomials of degree less than or equal to $s$ with fixed endpoints $q_0$ and $q_1$.
Consider the quadrature approximation of the action integral, viewing it as a function of the endpoint and interpolation values,
$$F(q_0, q_0^1, \ldots, q_0^{s-1}, q_1) = h \sum_{i=1}^{s} b_i L\big(q(c_i h), \dot{q}(c_i h)\big),$$
where $q(t) = \sum_{\nu=0}^{s} q_0^{\nu} \phi_{\nu}(t/h)$. Then, a variational integrator (2) can be obtained by requiring stationarity with respect to the interpolation values and applying the discrete Legendre transforms at the endpoints (31). However, (31) is often impractical due to the complexity of evaluating $q(c_i h)$ and $\dot{q}(c_i h)$, which involve the Lagrange interpolating polynomials. In addition, computing the root of the system of nonlinear equations in (31) can be challenging, because the formulation of a fixed-point problem could be complicated and convergence issues could arise. In contrast, Runge-Kutta methods are already in fixed-point form and are convergent as long as consistency is satisfied. Now, note that the finite-dimensional function space $\mathbb{M}^s$ does not depend on the choice of nodes $\{d_{\nu}\}_{\nu=1}^{s-1}$, and by a careful elimination of the $\phi_{\nu}(t)$, (31) can be simplified to yield (32). When transformed to the Hamiltonian side, (32) simply recovers the symplectic partitioned Runge-Kutta method. The same variational integrator can be derived in a much simpler way: instead of performing variations on the internal points $\{q_0^{\nu}\}_{\nu=1}^{s-1}$, we perform variations on the internal derivatives $\{\dot{Q}_i\}_{i=1}^{s}$, subject to the constraint $q_1 = q_0 + h\sum_{i=1}^{s} b_i \dot{Q}_i$. Then, the internal points are reconstructed using the fundamental theorem of calculus and the degree of precision of the quadrature rule to obtain $Q_i = q_0 + h\sum_{j=1}^{s} a_{ij} \dot{Q}_j$. Now, consider the quadrature approximation of the action integral, viewed as a function of the endpoint values and the internal velocities,
$$\tilde{F}(q_0, q_1, \dot{Q}_1, \ldots, \dot{Q}_s, \lambda) = h \sum_{i=1}^{s} b_i L(Q_i, \dot{Q}_i) + \Big\langle \lambda,\ q_1 - q_0 - h \sum_{i=1}^{s} b_i \dot{Q}_i \Big\rangle,$$
where $\lambda$ is a Lagrange multiplier that enforces the constraint. Then, a variational integrator (2) can be obtained from the stationarity conditions (33). Explicitly expanding (33) and eliminating the Lagrange multiplier yields (32) in a much more straightforward manner.
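In the linear-space setting, the one-stage Gauss-Legendre instance of this construction is the implicit midpoint rule. A minimal sketch (the pendulum $H(q,p) = p^2/2 - \cos q$ is a hypothetical test problem chosen for this illustration, not from the paper), solved by fixed-point iteration as the constrained formulation suggests:

```python
import numpy as np

def midpoint_step(q0, p0, h):
    # one step of the implicit midpoint rule (the s = 1 Gauss-Legendre
    # symplectic partitioned RK method) for H(q, p) = p^2/2 - cos(q),
    # solved by fixed-point iteration
    q1, p1 = q0, p0
    for _ in range(100):
        qm, pm = 0.5 * (q0 + q1), 0.5 * (p0 + p1)
        q1n, p1n = q0 + h * pm, p0 - h * np.sin(qm)
        if abs(q1n - q1) + abs(p1n - p1) < 1e-15:
            q1, p1 = q1n, p1n
            break
        q1, p1 = q1n, p1n
    return q1, p1

H = lambda q, p: 0.5 * p * p - np.cos(q)
q, p, h = 1.0, 0.0, 0.1
H0 = H(q, p)
maxdrift = 0.0
for _ in range(10_000):
    q, p = midpoint_step(q, p, h)
    maxdrift = max(maxdrift, abs(H(q, p) - H0))
# the energy error stays bounded over long times, with no secular growth
```

The fixed-point map is a contraction for small $h$, mirroring the convergence remark above about Runge-Kutta methods.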
This same technique, known as the constrained Galerkin method, is adopted on the rotation group $SO(n)$ in order to directly obtain a variational integrator in fixed-point form.

C.2. Explicit Expression for $\tilde{\varphi}^*$: A Derivation. We continue using equation (12) to evaluate $\frac{d}{dt}\big|_{t=0}\tilde{F}(g_0, g_1, \ldots, \Omega_k(t), \ldots, \Lambda)$, and we finally obtain equation (15).

C.3. Explicit Expression for $\tilde{\psi}_k^*$: A Derivation. Consider $(S_1, \ldots, S_s) \in \operatorname{Skew}(n)^s$ and $\tilde{S} \in \operatorname{Skew}(n)$; then
$$\big\langle \tilde{\psi}_k^*(S_1, \ldots, S_s), \tilde{S} \big\rangle = \big\langle (S_1, \ldots, S_s), \tilde{\psi}_k(\tilde{S}) \big\rangle = \sum_{i=1}^{s} \big\langle S_i,\ dP_{A_i}(a_{ik} U_k \tilde{S}) \big\rangle = \Big\langle \operatorname{Asym}\Big(U_k^T \sum_{i=1}^{s} a_{ik}\, dP_{A_i}^*(S_i)\Big), \tilde{S} \Big\rangle.$$
This gives us equation (16).

C.4. Explicit Expression for $\hat{\varphi}^*$: A Derivation. Let us derive $\hat{\varphi}^*$ by considering $(S_1, \ldots, S_s) \in \operatorname{Skew}(n)^s$ and $\tilde{S} \in \operatorname{Skew}(n)$, so that $\langle \hat{\varphi}^*(S_1, \ldots, S_s), \tilde{S} \rangle = \langle (S_1, \ldots, S_s), \hat{\varphi}(\tilde{S}) \rangle$. This gives us equation (18).

C.5. Derivation of $\frac{\partial \tilde{F}}{\partial g_0}$. We compute $\frac{\partial \tilde{F}}{\partial g_0}$ directly from the variation of (8) with respect to $g_0$; this yields equation (20).

References

[1] Optimization Algorithms on Matrix Manifolds.
[2] Solution of the matrix equation AX + XB = C [F4].
[3] High-order symplectic partitioned Lie group methods.
[4] A class of intrinsic schemes for orthogonal integration.
[5] A variational formulation of accelerated optimization on Riemannian manifolds.
[6] Adaptive Hamiltonian variational integrators and applications to symplectic accelerated optimization.
[7] A Hessenberg-Schur method for the problem AX + XB = C.
[8] Lie group spectral variational integrators.
[9] Discrete Euler-Poincaré and Lie-Poisson equations.
[10] Lie group variational integrators for the full body problem in orbital mechanics.
[11] Generalized Galerkin variational integrators.
[12] General techniques for constructing variational integrators. Frontiers of Mathematics in China.
[13] Introduction to Mechanics and Symmetry.
[14] Discrete mechanics and variational integrators.
[15] Generalized polar decompositions on Lie groups with involutive automorphisms.
[16] A variational perspective on accelerated methods in optimization.

Acknowledgments. This research has been supported in part by NSF under grants DMS-1010687, CMMI-1029445, DMS-1065972, CMMI-1334759, DMS-1411792, DMS-1345013, DMS-1813635, CCF-2112665, by AFOSR under grant FA9550-18-1-0288, and by the DoD under grant HQ00342010023 (Newton Award for Transformative Ideas during the COVID-19 Pandemic). The authors would also like to thank Geir Bogfjellmo and Håkon Marthinsen for sharing the code for their VRKMK method from [3], which we used in the numerical comparisons with our method.

Appendix B. Euler-Poincaré and Lie-Poisson Reduction

When the Lagrangian $L$ or Hamiltonian $H$ is left-invariant, the mechanical system can be symmetry-reduced to evolve on the Lie algebra $\mathfrak{g}$ or its dual $\mathfrak{g}^*$, respectively, assuming some regularity. On the Lagrangian side, this corresponds to Euler-Poincaré reduction [13], which is described by the Euler-Poincaré equations. These are expressed in terms of the reduced Lagrangian $l(\xi) = L(g, \dot{g})$, where $\xi = g^{-1}\dot{g}$. As a result, this can be described in terms of a reduced variational principle $\delta \int_a^b l(\xi(t))\, dt = 0$ with respect to constrained variations of the form $\delta\xi = \dot{\eta} + [\xi, \eta]$, where $\eta(t)$ is an arbitrary path in the Lie algebra $\mathfrak{g}$ that vanishes at the endpoints, namely $\eta(a) = \eta(b) = 0$. The constraint on the variations $\delta\xi$ is induced by the condition $\xi = g^{-1}\dot{g}$ together with arbitrary variations $\delta g$ that vanish at the endpoints. On the Hamiltonian side, this corresponds to Lie-Poisson reduction [13].
Recall the Lie-Poisson structure on $\mathfrak{g}^*$; together with the reduced Hamiltonian $h(\mu)$, it gives the Lie-Poisson equations on $\mathfrak{g}^*$. If the discrete Lagrangian $L_d(g_0, g_1)$ is also $G$-invariant, meaning $L_d(g \cdot g_0, g \cdot g_1) = L_d(g_0, g_1)$ for all $g \in G$, then (2) can be reduced to a Lie-Poisson integrator [9], where $l_d(f_0) = L_d(e, f_0)$. This algorithm preserves the coadjoint orbits and, hence, the Poisson structure on $\mathfrak{g}^*$.

Appendix C. Detailed Derivations for the Lagrangian Variational Integrators

C.1. Derivation of $\frac{\partial \tilde{F}}{\partial \Omega_k}$. Using the properties of the inner products, the last term of the variation can be expressed as follows,
$$h \sum_{i=1}^{s} b_i \big\langle \operatorname{Asym}(U_i^T g_1 \Lambda \Omega_i^T), X_{ik} \big\rangle + h\, b_k \big\langle \operatorname{Asym}(U_k^T g_1 \Lambda), \delta\Omega_k \big\rangle.$$