Learning Model Predictive Control for Quadrotors. Guanrui Li, Alex Tunchez, Giuseppe Loianno. February 15, 2022.

Aerial robots can enhance their safe and agile navigation in complex and cluttered environments by efficiently exploiting the information collected during a given task. In this paper, we address the learning model predictive control problem for quadrotors. We design a learning receding-horizon nonlinear control strategy directly formulated on the system's nonlinear manifold configuration space SO(3) × R^3. The proposed approach exploits past successful task iterations to improve the system performance over time while respecting system dynamics and actuator constraints. We further relax its computational complexity, making it compatible with real-time quadrotor control requirements. We show the effectiveness of the proposed approach in learning a minimum-time control task while respecting dynamics, actuator, and environment constraints. Several experiments in simulation and in a real-world set-up validate the proposed approach.

Micro Aerial Vehicles (MAVs) such as quadrotors have become very popular platforms to help humans solve a wide range of time-sensitive problems in constrained outdoor and indoor environments, including logistics, search and rescue for post-disaster response, and, more recently, reconnaissance and monitoring during the COVID-19 pandemic. These time-sensitive tasks require robots to make fast decisions and agile maneuvers in uncertain, cluttered, and dynamic environments by intelligently exploiting environment information to improve their performance over time. In this work, we investigate a Learning Model Predictive Control (LMPC) for quadrotors that exploits past successful task iterations to improve its task performance over time while respecting system dynamics and actuator constraints.

Several works have investigated the use of MPC for quadrotor control with perception and actuator constraints [1]-[4]. Other works [5], [6] improve MPC performance by refining the system dynamics in a data-driven fashion. Conversely, approaches such as [7] approximate the system dynamics directly using neural networks. However, these methods are quite computationally expensive since they rely on a sampling approach that is generally performed in parallel using Graphics Processing Units (GPUs). Iterative learning control techniques [8] have been successfully combined with MPC in a Batch Model Predictive Control (BMPC) approach to control chemical processes [9] and refine their performance over multiple task iterations. In [10], [11], a learning MPC approach is proposed and applied to ground vehicles. In this approach, the vehicle collects the states and their corresponding costs across multiple successful iterations of the same task. The vehicle learns from the collected data to explore new ways to decrease the cost of the same task, as long as it maintains the ability to reach a state that has already been demonstrated to be safe during previous iterations. The approach does not require a reference trajectory as in the previously mentioned works; thus, it is especially versatile and useful during tasks where the desired trajectory is not known or is difficult to compute due to system complexity or parameter uncertainty.
This is the typical case of drone racing competitions [12]-[15], which have recently inspired researchers to design autonomy algorithms with the goal of granting vehicles the ability to execute agile maneuvers with superior performance compared to human-controlled vehicles. Therefore, inspired by [10], [11], we propose an LMPC approach for quadrotors. Common multirotor platforms, including quadrotors, evolve on the nonlinear manifold configuration space SO(3) × R^3, making the LMPC problem substantially different and more complex for these types of systems compared to ground vehicles. We address the challenges of building a safety set that includes members of the rotation group SO(3). We also consider an appropriate numerical integration approach for the group elements to ensure that the forward integration results adhere to the SO(3) structure once employed in the discrete MPC formulation. In addition, we carefully add several design considerations to make the approach compatible with the real-time control requirements of small aerial robots and show its feasibility in a learning minimum-time control problem.

The contribution of this paper is threefold. First, we propose an LMPC for quadrotors. We tackle the challenges related to transitioning this class of MPC to multirotor aerial systems. The data collected across multiple iterations incorporates elements of the nonlinear manifold SO(3). Furthermore, this aspect also makes the numerical MPC integration substantially more complex, since at each integration step it is necessary to guarantee that the obtained elements still lie in the rotation group. Second, we relax the computational complexity of the proposed approach, making it suitable for real-time quadrotor control applications. Finally, we show a specific instance of the proposed method in a learning minimum-time control task. Several simulation and experimental results show the ability of our approach to successfully exploit collected data to improve the system's execution-time performance.

The paper is organized as follows. In Section II, we review the MPC and LMPC approaches. In Section III, we specifically address the challenges and show how to design an LMPC for quadrotors, whereas in Section IV, we consider a particular instance of this approach employed to solve a minimum-time control task. Section V presents our experimental results and Section VI concludes the paper.

In this section, we provide a brief review of the MPC formulation and how it transforms into LMPC. MPC is a predictive control method which finds a sequence of system inputs, U = {u_0, u_1, ..., u_{N-1}} with u_k ∈ R^{n_u}, within a fixed time horizon of N steps. It optimizes a given objective function, with running and terminal costs L(·,·) and Q(·) respectively, while accounting for constraints and system dynamics

min_{u_0, ..., u_{N-1}}  Σ_{k=0}^{N-1} L(x_k, u_k) + Q(x_N)
s.t.  x_{k+1} = f(x_k, u_k),  g(x_k, u_k) ≤ 0,  x_0 given,                (1)

where x_{k+1} = f(x_k, u_k) represents the system dynamics and x_k ∈ R^{n_x} is the system state. The optimization occurs with initial condition x_0 while respecting the system dynamics f(x, u) and additional state and input constraints g(x, u).
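Before moving to the LMPC, a small self-contained sketch may help fix ideas about the receding-horizon structure of eq. (1). The double-integrator dynamics, quadratic costs, horizon, and scipy-based solver below are illustrative choices of ours; the paper instead solves its problems with an SQP method in ACADOS (Section V).

```python
# Minimal receding-horizon MPC sketch of eq. (1) (illustrative only; the paper
# uses an SQP solver in ACADOS). Dynamics, costs, and horizon are hypothetical.
import numpy as np
from scipy.optimize import minimize

N, dt = 10, 0.1          # horizon length and time step
nx, nu = 2, 1            # state: [position, velocity], input: acceleration

def f(x, u):
    """Discrete dynamics x_{k+1} = f(x_k, u_k) for a 1D double integrator."""
    return np.array([x[0] + dt * x[1], x[1] + dt * u[0]])

def running_cost(x, u):
    return float(x @ x) + 0.1 * float(u @ u)   # L(x, u)

def terminal_cost(x):
    return 10.0 * float(x @ x)                 # Q(x_N)

def mpc_objective(U_flat, x0):
    """Roll the dynamics forward and accumulate the finite-horizon cost."""
    U = U_flat.reshape(N, nu)
    x, cost = x0, 0.0
    for k in range(N):
        cost += running_cost(x, U[k])
        x = f(x, U[k])
    return cost + terminal_cost(x)

x = np.array([1.0, 0.0])                       # initial condition x_0
for t in range(30):                            # closed-loop simulation
    res = minimize(mpc_objective, np.zeros(N * nu), args=(x,),
                   bounds=[(-2.0, 2.0)] * (N * nu))   # box input constraints g(x, u)
    u0 = res.x[:nu]                            # apply only the first input
    x = f(x, u0)                               # receding-horizon step
print("final state:", x)
```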
In this section, we review the LMPC, an iterative learning model predictive control method that can improve task performance over many trials [10]. The task is repeated at each iteration starting from the same initial state x_s. The task is considered complete when the system reaches a global terminal state x_F without violating constraints. During each task iteration, the system records the states and costs and uses the recorded data to ensure control convergence to a locally optimal solution. We denote the states and costs recorded during the j-th iteration as the j-th closed-loop trajectory. In the following, we first introduce preliminary concepts for LMPC, such as the safety set and the terminal cost function. Then we introduce the LMPC formulation.

1) Safety Set: The states and inputs of the trajectory at the j-th iteration are defined as

x^j = [x_0^j, x_1^j, ..., x_{T^j}^j],   u^j = [u_0^j, u_1^j, ..., u_{T^j - 1}^j],                (2)

where T^j is the time stamp when the task is completed at the j-th iteration. The recorded data points from each iteration generate a sampled safety set

SS^j = { ∪_{i ∈ M^j} ∪_{t=0}^{T^i} x_t^i },                (3)

where M^j is a set of indexes that represents the iterations that successfully completed the task. In short, SS^j is a set for the j-th iteration which contains the recorded state trajectories from previous successfully completed tasks.

2) Cost Function: The cost-to-go for the state x_t^j at time t of the j-th iteration in the safety set is defined as

J_t^j(x_t^j) = Σ_{k=t}^{T^j} L(x_k^j, u_k^j).                (4)

The cost-to-go for any state can be determined with eq. (4), which is needed to satisfy the LMPC formulation as discussed in Section II-B.3. Furthermore, the optimal cost of a given state is obtained from the minimum cost-to-go across all previous successful iterations

Q^j(x) = min_{i ∈ M^j, t ≥ 0} { J_t^i(x_t^i) : x_t^i = x, x ∈ SS^j }.                (5)

Eq. (5) assigns every state the respective minimum cost-to-go across all iterations.

3) LMPC Formulation: The LMPC builds upon the general MPC formulation as described in Section II-A by appending a safety set constraint on the terminal state and assigning eq. (5) as the terminal cost

min_{u_{t|t}, ..., u_{t+N-1|t}}  Σ_{k=t}^{t+N-1} L(x_{k|t}, u_{k|t}) + Q^{j-1}(x_{t+N|t})                (6a)
s.t.  x_{k+1|t} = f(x_{k|t}, u_{k|t}),                (6b)
      g(x_{k|t}, u_{k|t}) ≤ 0,                (6c)
      x_{t+N|t} ∈ SS^{j-1},                (6d)
      x_{t|t} = x_t^j.                (6e)

The optimal control problem in eq. (6) computes a solution for the j-th iteration at a given time stamp t of the task, over a finite horizon N. Eqs. (6b), (6c), (6e) define the system dynamics, the state and input constraints, and the initial condition of the system, respectively. The safety set constraint in eq. (6d) forces the terminal state x_{t+N|t} to visit a discrete safety set state in SS^{j-1}. In principle, this guarantees a closed-loop solution which pushes the system to the final state x_F.
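As a concrete illustration of eqs. (2)-(5), the following sketch shows one way the sampled safety set and its cost-to-go could be assembled from recorded closed-loop trajectories. The class layout, names, and the exact-match lookup in Q are illustrative assumptions of ours, not the authors' implementation.

```python
# Sketch of building SS^j and the cost-to-go of eqs. (2)-(5) from recorded data.
import numpy as np

class SafetySet:
    def __init__(self):
        self.states = []        # stacked states from all successful iterations
        self.cost_to_go = []    # J^i_t for each stored state

    def add_iteration(self, xs, us, stage_cost):
        """xs: (T+1, nx) states, us: (T, nu) inputs of one successful iteration."""
        costs = [stage_cost(xs[k], us[k]) for k in range(len(us))] + [0.0]
        ctg = np.cumsum(costs[::-1])[::-1]      # backward sum: cost-to-go of eq. (4)
        self.states.append(np.asarray(xs))
        self.cost_to_go.append(np.asarray(ctg))

    def all_points(self):
        """Return SS^j as stacked states with their associated cost-to-go."""
        return np.vstack(self.states), np.concatenate(self.cost_to_go)

    def Q(self, x, tol=1e-9):
        """Terminal cost Q^j(x): minimum cost-to-go over stored copies of x (eq. 5)."""
        pts, ctg = self.all_points()
        match = np.all(np.abs(pts - x) < tol, axis=1)
        return float(np.min(ctg[match])) if match.any() else np.inf
```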
In this section, we address the challenges related to designing an LMPC for quadrotors. These systems evolve on the complex nonlinear manifold configuration space SO(3) × R^3, thus making the formulation and solution of the LMPC substantially more complex than for the ground vehicles studied in [10]. This induces the need for appropriate numerical integration techniques of the system dynamics that preserve the structure of the configuration space over time. Furthermore, the controller should run at an acceptable rate for real-time control of the quadrotor. The discrete nature of the safety set SS^j makes the LMPC a Nonlinear Mixed Integer Programming (NMIP) problem due to eq. (6d). This formulation is computationally expensive and incompatible with real-time applications. Hence, in the following, we propose a convex approximation approach in Section III-C.1 in order to transform the safety set constraint into a linear constraint and speed up the optimization. In addition, the size of the safety set SS^j is n_x × (Σ_{i=0}^{j} f̄ T_i) and depends on the size of a state n_x, the sampling frequency f̄, and the completion time T_i of each successful i-th iteration. Based on this formula, we see that the size of the safety set can grow very quickly across iterations. Therefore, it is beneficial to create a sparser safety set for computational reasons. We propose to select a subset of SS^j as a local safety set to reduce the computational burden. Finally, we consider the limitations of the hardware platform: we incorporate actuator constraints in our formulation to guarantee task feasibility as well as the safety of the quadrotor platform.

The system overview is presented in Fig. 2. The relevant variables used in our paper are stated in Table I. We use two different coordinate frames with axes {e_x^I, e_y^I, e_z^I} and {e_x^B, e_y^B, e_z^B} to denote the inertial frame and the quadrotor's body frame, respectively. We consider x = [x_Q^T, ẋ_Q^T, q^T]^T and u = [f, Ω^T]^T as the quadrotor state and input vectors, respectively. The state vector includes the quaternion q = [q_w, q_x, q_y, q_z]^T to describe the orientation of the quadrotor in the I frame. We employ this orientation parameterization instead of a rotation matrix so that we can reduce the state space size and consequently speed up our LMPC, as mentioned in Section II. The system dynamics are

d/dt x_Q = ẋ_Q,   m_Q ẍ_Q = f R e_z^I − m_Q g e_z^I,                (7)
d/dt q = (1/2) Λ(Ω) q,                (8)

where Λ(Ω) is a skew-symmetric matrix of the quadrotor angular velocity [16] with Ω = [Ω_x, Ω_y, Ω_z]^T.

Table I: Notation.
I, C, B: inertial, camera, robot frames
m_Q ∈ R: mass of the robot
x_Q, ẋ_Q, ẍ_Q ∈ R^3: position, velocity, acceleration of the robot in I
R ∈ SO(3): rotation matrix of the robot with respect to I
Ω ∈ R^3: angular velocity of the robot in B
f ∈ R: total thrust
g ∈ R: gravity constant

Fig. 2: System convention with I and B denoting the inertial and body frames, respectively.

The total thrust is the lift generated by the rotors, f = f_1 + f_2 + f_3 + f_4, along e_z^B. Subsequently, it is necessary to apply the rotation induced by q onto e_z^I by using q ⊙ e_z^I = R e_z^I. Eqs. (7)-(8) represent the continuous system dynamics of a quadrotor, which can be written as ẋ = f(x, u). In order to incorporate them in the discrete-time LMPC formulation, we apply the 4th-order Runge-Kutta method to numerically integrate ẋ over the sampling time dt given the state x_k and input u_k, obtaining x_{k+1} = RK4(f, x_k, u_k, dt).

The key challenge here is the numerical integration and discretization over SO(3) for eq. (8). Techniques like the Runge-Kutta method do not necessarily adhere to the SO(3) structure [17]. As a result, the quaternion's norm can become non-unit because of round-off error from the numerical integration at each time step, which also induces non-orthogonal column vectors in the corresponding rotation matrix. Extra effort can be made to circumvent the numerical drift by applying a unit-norm constraint; however, this adds extra constraints to the optimization problem, causing additional computational complexity. Another way is to normalize the quaternion after each integration step. However, this cannot be done over a horizon in MPC, because the intermediate steps within the Runge-Kutta calculation are not guaranteed to produce unit quaternions representing SO(3) elements. To bypass these strategies, we employ a non-unit quaternion in eq. (8), whose mapping to a rotation matrix is

R = Q(q) / ‖q‖^2,                (11)

where Q is defined in [17]. The only remaining requirement is to avoid that the quaternion gets too close to zero norm in eq. (11). It can be shown that eq. (8), which maps the angular velocity to the derivative of a unit quaternion, is a particular solution of the minimum-norm problem for the derivative of a non-unit quaternion [17]. Therefore, to avoid that the quaternion magnitude gets close to 0, we can add to eq. (8) any linear combination of vectors in the nullspace of the weighting matrix of the aforementioned minimum-norm problem, which turns out to be q itself, and tune the coefficients. Alternatively, a one-time re-scaling step can also be applied.
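The discretization just described can be sketched as follows: a 4th-order Runge-Kutta step over eqs. (7)-(8) with a non-unit quaternion, together with the norm-scaled quaternion-to-rotation map of eq. (11). The mass, gravity constant, and function names below are placeholder assumptions of ours.

```python
# Sketch of one integration step of the quadrotor dynamics, eqs. (7)-(11).
import numpy as np

m_Q, g = 1.0, 9.81                       # hypothetical mass [kg] and gravity [m/s^2]
e_z = np.array([0.0, 0.0, 1.0])

def rot_from_quat(q):
    """R = Q(q) / ||q||^2: valid for non-unit quaternions q = [qw, qx, qy, qz]."""
    qw, qx, qy, qz = q
    Q = np.array([
        [qw**2 + qx**2 - qy**2 - qz**2, 2*(qx*qy - qw*qz),             2*(qx*qz + qw*qy)],
        [2*(qx*qy + qw*qz),             qw**2 - qx**2 + qy**2 - qz**2, 2*(qy*qz - qw*qx)],
        [2*(qx*qz - qw*qy),             2*(qy*qz + qw*qx),             qw**2 - qx**2 - qy**2 + qz**2]])
    return Q / float(q @ q)

def xdot(x, u):
    """Continuous dynamics of eqs. (7)-(8): x = [p(3), v(3), q(4)], u = [f, Omega(3)]."""
    v, q = x[3:6], x[6:10]
    f, om = u[0], u[1:]
    acc = (f / m_Q) * rot_from_quat(q) @ e_z - g * e_z
    # Lambda(Omega): skew-symmetric matrix driving the quaternion kinematics
    Lam = np.array([[0.0,   -om[0], -om[1], -om[2]],
                    [om[0],  0.0,    om[2], -om[1]],
                    [om[1], -om[2],  0.0,    om[0]],
                    [om[2],  om[1], -om[0],  0.0]])
    return np.concatenate([v, acc, 0.5 * Lam @ q])

def rk4_step(x, u, dt):
    """One 4th-order Runge-Kutta step; intermediate quaternions need not be unit."""
    k1 = xdot(x, u)
    k2 = xdot(x + 0.5 * dt * k1, u)
    k3 = xdot(x + 0.5 * dt * k2, u)
    k4 = xdot(x + dt * k3, u)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
```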
C. LMPC Relaxation

1) Convex Safety Set: Inspired by [11], we approximate the safety set SS^j by taking its convex hull CS^j as

CS^j = { Σ_{i=0}^{j} Σ_{t=0}^{T^i} λ_t^i x_t^i : x_t^i ∈ SS^j, λ ≥ 0, ‖λ‖_1 = 1 },                (12)

where λ = [λ_0^0, λ_1^0, ..., λ_{T^0}^0, ..., λ_0^j, λ_1^j, ..., λ_{T^j}^j]^T. Eq. (12) represents the barycentric approximation of SS^j. In this way, any state in the convex hull can be written as a convex combination of points in the safety set, where each element of λ is a non-negative weight associated with one state in the safety set. Similarly, an approximation of the terminal cost function can be derived. Therefore, we obtain

Q̃^j(x) = min_{λ ≥ 0, ‖λ‖_1 = 1}  Σ_{i=0}^{j} Σ_{t=0}^{T^i} λ_t^i J_t^i(x_t^i)   s.t.   Σ_{i=0}^{j} Σ_{t=0}^{T^i} λ_t^i x_t^i = x.                (13)

Using the convex hull of the safety set transforms the NMIP into a Quadratic Program with linear constraints defined by eq. (12) and eq. (13). It should be noted that, for the sake of simplicity, in eq. (12) we used the same notation to represent the weighted-average operation for both the state variables in Euclidean space and the variables in the rotation group SO(3). In fact, for SO(3) elements the average cannot be computed in the same way as for elements in Euclidean space. The notion of Karcher mean [18], choosing the L2-norm as the metric between two rotation elements, guarantees to find the convex set. However, the procedure does not have a closed-form solution and does not guarantee the existence of a unique mean [18], thus making it difficult to incorporate into the MPC. However, in our task the rotations are distributed around the identity element of the SO(3) group, and we can simply compute the convex safety set CS_{x_q}^j for the rotation part by lifting all the samples into the tangent space, since they lie in the same parameter subgroup [19], as

CS_{x_q}^j = { exp( Σ_{l=1}^{P} λ_l log(x_q^l) ) : λ_l ≥ 0, Σ_{l=1}^{P} λ_l = 1 },                (14)

where the exp and log operations are defined in [16], x_q refers to the quaternion vector as a subset of its corresponding state vector, and l indicates a selected state in CS^j among P states.

2) Local Safety Set: In order to further reduce the computational load, we reduce the size of the safety set by choosing a subset SS_t^j of SS^j at each time t,

SS_t^j = ∪_{i=p}^{j} { x_k^i : k ∈ K^i },                (15)

where K^i = {k_1^i, ..., k_n^i} is a set of time stamps for the corresponding states in the i-th iteration, i ∈ {p, ..., j}, and p ∈ [0, j−1] is an integer that determines how many previous iterations to consider in the safety set. All the sets K^i have an equal size of n. The time indexes in K^i correspond to the n nearest-neighbor states around the terminal state of the trajectory predicted at t − 1. Fig. 3 illustrates how the n nearest neighbors are selected for the local safety set. Furthermore, using eq. (12), we approximate the subset by taking its convex hull

CS_t^j = { Σ_{i=p}^{j} Σ_{k ∈ K^i} λ_k^i x_k^i : x_k^i ∈ SS_t^j, λ ≥ 0, ‖λ‖_1 = 1 },                (16)

and the corresponding local terminal cost

Q̃_t^j(x) = min_{λ ≥ 0, ‖λ‖_1 = 1}  Σ_{i=p}^{j} Σ_{k ∈ K^i} λ_k^i J_k^i(x_k^i)   s.t.   Σ_{i=p}^{j} Σ_{k ∈ K^i} λ_k^i x_k^i = x.                (17)

Eqs. (15)-(17) create a sparse convex hull with non-negative λ elements.

3) LMPC Problem Formulation: Finally, the relaxed LMPC for the quadrotor system can be fully defined as follows:

min_{u_{t|t}, ..., u_{t+N-1|t}}  Σ_{k=t}^{t+N-1} L(x_{k|t}, u_{k|t}) + Q̃_t^{j-1}(x_{t+N|t})                (18a)
s.t.  x_{k+1|t} = f(x_{k|t}, u_{k|t}),                (18b)
      g(x_{k|t}, u_{k|t}) ≤ 0,                (18c)
      x_{t+N|t} ∈ CS_t^{j-1},                (18d)
      x_{t|t} = x_t^j.                (18e)

Our system considers the hardware limitations by constraining the control inputs

f_min ≤ f ≤ f_max,   Ω_min ≤ Ω ≤ Ω_max,                (19)

where f_min and f_max are the minimum and maximum thrusts, whereas Ω_min and Ω_max are the minimum and maximum angular velocities, respectively.
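The following sketch illustrates the local safety-set selection of eq. (15) and the barycentric terminal cost of eqs. (13) and (17) evaluated at a candidate terminal state. It treats all state components as Euclidean (the rotation part would be lifted to the tangent space as in eq. (14)), and it solves for the weights λ with a standalone linear program purely for illustration; in the relaxed LMPC of eq. (18), the λ enter the optimization as decision variables alongside the inputs. Names and parameters are our own assumptions.

```python
# Sketch of the local safety set (eq. 15) and the barycentric terminal cost (eqs. 13, 17).
import numpy as np
from scipy.optimize import linprog

def local_safety_set(states, costs, x_term_prev, n=10):
    """Pick the n stored states closest to last step's predicted terminal state."""
    d = np.linalg.norm(states - x_term_prev, axis=1)
    idx = np.argsort(d)[:n]
    return states[idx], costs[idx]

def barycentric_terminal_cost(local_states, local_costs, x_terminal):
    """min_lambda sum(lambda*J) s.t. sum(lambda*x) = x_terminal, lambda >= 0, sum(lambda) = 1."""
    n = len(local_costs)
    A_eq = np.vstack([local_states.T, np.ones((1, n))])
    b_eq = np.concatenate([x_terminal, [1.0]])
    res = linprog(c=local_costs, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * n)
    return res.fun if res.success else np.inf   # inf: terminal state outside the hull
```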
IV. LEARNING OPTIMAL TIME CONTROL

In this section, we show a particular instance of the LMPC approach to learn optimal-time trajectories. This approach can be leveraged for autonomous racing tasks by naturally discovering the minimum lap time through reference-free iterations. Each task begins with the quadrotor at x_s, which maneuvers to a predefined goal x_G. A track is created by setting intermediate waypoints and corridors to the goal. An initial safety set, SS^0, is created by flying steadily and suboptimally through the track. The LMPC formulation in eq. (6) is relaxed as discussed in Section III with the following cost function

L(x_{k|t}, u_{k|t}) = 1_{x_{k|t} ≠ x_G} + u_{k|t}^T R_u u_{k|t}.                (22)

The running cost is defined by two main components. First, it includes a binary cost which depends on whether the quadrotor has reached the goal position; this cost represents the minimum-time objective. Second, there is a penalty applied on the inputs to minimize the control effort, with R_u a constant diagonal matrix used to tune the penalty on the control. It is important to note that lower values in R_u favor a minimum-time solution, but this may yield aggressive and possibly non-smooth control inputs. Therefore, there is a trade-off between these two objectives, and tuning the weights is useful to achieve the desired performance. While the binary cost can be implemented in simulation, it proves time-consuming in real-time applications. Thus, the binary term in eq. (22) is approximated with a sigmoid function.

Furthermore, a track must be defined for the racing case. This can be done with corridors, which are defined between two waypoints. A linear constraint is added to the LMPC formulation to guarantee that the quadrotor stays within the track. The positions of two distinct consecutive waypoints are defined as r_w and r_{w+1}, respectively. The direction from the first waypoint to the quadrotor is defined as r_i = x_Q − r_w. The normalized direction between the two waypoints is r̂_c = (r_{w+1} − r_w)/‖r_{w+1} − r_w‖_2. The vector r_n is the difference between r_i and the projection of r_i onto r̂_c, i.e., r_n = (I − r̂_c r̂_c^T)(x_Q − r_w). Physically, r_n represents the quadrotor's distance from the center axis of the corridor. Therefore, a single corridor constraint applies bounds on r_n,

−δ ≤ r_n ≤ δ.                (26)

Therefore, eq. (26) is a linear constraint in x_Q which matches the form of eq. (18c).
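To make the track and cost definitions of this section concrete, the sketch below evaluates the corridor residual and bound of eq. (26) and a smooth sigmoid surrogate of the binary goal cost in eq. (22). The sigmoid form and the parameters alpha and eps are our assumptions for illustration, not the paper's exact expressions.

```python
# Sketch of the corridor constraint (eq. 26) and a sigmoid-smoothed goal cost.
import numpy as np

def corridor_residual(x_Q, r_w, r_w1):
    """r_n = (I - r_c r_c^T)(x_Q - r_w): offset from the corridor center axis."""
    r_c = (r_w1 - r_w) / np.linalg.norm(r_w1 - r_w)
    return (np.eye(3) - np.outer(r_c, r_c)) @ (x_Q - r_w)

def corridor_violation(x_Q, r_w, r_w1, delta=0.8):
    """Element-wise check of -delta <= r_n <= delta (eq. 26)."""
    r_n = corridor_residual(x_Q, r_w, r_w1)
    return bool(np.any(np.abs(r_n) > delta))

def smoothed_goal_cost(x_Q, x_G, alpha=20.0, eps=0.1):
    """Sigmoid surrogate of the binary 'not yet at goal' term of eq. (22)."""
    d = np.linalg.norm(x_Q - x_G)
    return 1.0 / (1.0 + np.exp(-alpha * (d - eps)))
```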
V. EXPERIMENTS

We present several simulations and real-world experiments with a quadrotor to validate the proposed LMPC during a learning minimum-time trajectory task. We consider an optimal-time control problem as specified in Section IV. The task is successfully accomplished if the quadrotor is able to pass through a given gate while staying within a given track and stops after passing it. Fig. 1 illustrates an example of the proposed task. This is obtained considering an L-shaped track, which is constructed using the approach in Section IV, both in simulations and real-world experiments. We first provide a feasible and slow reference trajectory for the quadrotor MPC controller to track, like the one shown in black in Figs. 4 and 6. During the trajectory tracking, the quadrotor records its state history to build the initial safety set. Subsequently, we run the LMPC controller and repeat the same process by sending the quadrotor to the same start position at the end of each task iteration. The only information provided to the LMPC is the safety set and the corresponding cost-to-go along the recorded safety set. The LMPC then starts to find the optimal-time trajectory that minimizes the travel time while respecting the system dynamics, actuator constraints, and track constraints. The attached multimedia material provides several additional experiments as well.

We solve the proposed LMPC problem in eq. (18) with the cost of eq. (22) via Sequential Quadratic Programming (SQP) using ACADOS [20], [21] as a solver. For the LMPC controller parameters, we choose the discretization time step dt = 0.1 s, the horizon length N = 10 (i.e., a 1 s prediction time horizon), and the corridor width δ = 0.8 m.

B. Environments

1) Simulation: For simulation, we use a custom simulator available in the lab, developed in ROS, with the full system dynamics simulated using the 4th-order Runge-Kutta method.

2) Real World: The real-world experiments are performed in an indoor testbed with a flying space of 10 × 6 × 4 m^3 at the ARPL lab at New York University. We leverage a Vicon motion capture system at 100 Hz for control purposes. The quadrotor platform setup is similar to our previous work [22]. The control and estimation frameworks are developed in ROS. The proposed LMPC method can run on-board at 100 Hz on a common laptop.

We show the simulation and real-world results of a quadrotor traveling through an L-shaped track with the LMPC. The results of the simulation are shown in Figs. 4 and 5, whereas the real-world experiments are shown in Figs. 6 and 7. The travel time for each iteration is reported in Table II. As we can observe from the plots, both in the simulation and real-world experiments, the quadrotor can explore the track and find a locally optimal-time trajectory which ends up being much faster than the initial one. In Table II, we observe that, as the iterations continue, the LMPC converges to a similar and stable travel time along the track. This demonstrates that the LMPC can utilize the recorded states and costs from past iterations and explore new, faster trajectories for the quadrotor to accomplish the task. However, we notice that the final lap time varies around a given value after convergence. We believe this behavior is mostly due to a mismatch between the real and modeled dynamics, including our first-order approximation of the attitude dynamics in eq. (8).

In this paper, we presented an LMPC for quadrotors. We addressed the challenges associated with the system's evolution on a nonlinear manifold configuration space, which requires careful consideration in the LMPC problem formulation as well as in its forward numerical time integration. We showed how to reduce its computational complexity so as to make it compatible with the stringent requirements for real-time control of quadrotors, as well as its usefulness in a learning minimum-time trajectory problem. Future work will investigate the trade-off between incorporating second-order dynamics in the proposed approach, to improve system performance and agility, and the corresponding increase in computational complexity, which may affect real-time performance. We will leverage Bayesian machine learning techniques to refine the system dynamics by incorporating unmodeled dynamical effects across multiple runs, thus further pushing the system's performance and agility limits. We will also consider employing this method for drone racing tasks, extending the proposed experiments to a full racing track. Finally, we will investigate the use of different cost functions and how the availability of reference trajectories can potentially be exploited to improve task performance.
REFERENCES
[1] PAMPC: Perception-aware model predictive control for quadrotors.
[2] Nonlinear model predictive control with enhanced actuator model for multirotor aerial vehicles with generic designs.
[3] Motor and perception constrained NMPC for torque-controlled generic aerial vehicles.
[4] PCMPC: Perception-constrained model predictive control for quadrotors with suspended loads using a single camera and IMU.
[5] Learning-based model predictive control on a quadrotor: Onboard implementation and experimental results.
[6] Data-driven MPC for quadrotors.
[7] Information theoretic MPC for model-based reinforcement learning.
[8] A survey of iterative learning control.
[9] Iterative learning control applied to batch processes: An overview.
[10] Learning model predictive control for iterative tasks. A data-driven control framework.
[11] Learning model predictive control for iterative tasks: A computationally efficient approach for linear systems.
[12] Challenges and implemented technologies used in autonomous drone racing.
[13] FlightGoggles: Photorealistic sensor simulation for perception-driven robotics using photogrammetry and virtual reality.
[14] AlphaPilot: Autonomous drone racing.
[15] Fast, autonomous flight in GPS-denied and cluttered environments.
[16] Quaternion kinematics for the error-state Kalman filter.
[17] Integrating rotations using nonunit quaternions.
[18] Rotation averaging.
[19] Means and averaging in the group of rotations.
[20] Towards a modular software package for embedded optimization.
[21] acados: A modular open-source framework for fast embedded optimal control.
[22] Estimation, control, and planning for aggressive flight with a small quadrotor with a single camera and IMU.