key: cord-0192148-cjctn6dg
authors: Beik-Mohammadi, Hadi; Hauberg, Søren; Arvanitidis, Georgios; Neumann, Gerhard; Rozo, Leonel
title: Learning Riemannian Manifolds for Geodesic Motion Skills
date: 2021-06-08
journal: nan
DOI: nan
sha: 5a9cd3442fcc6b5271629e6f6f4fd4efd04ba354
doc_id: 192148
cord_uid: cjctn6dg

Abstract: For robots to work alongside humans and perform in unstructured environments, they must learn new motion skills and adapt them to unseen situations on the fly. This demands learning models that capture relevant motion patterns, while offering enough flexibility to adapt the encoded skills to new requirements, such as dynamic obstacle avoidance. We introduce a Riemannian manifold perspective on this problem, and propose to learn a Riemannian manifold from human demonstrations on which geodesics are natural motion skills. We realize this with a variational autoencoder (VAE) over the space of positions and orientations of the robot end-effector. Geodesic motion skills let a robot plan movements from and to arbitrary points on the data manifold. They also provide a straightforward method to avoid obstacles by redefining the ambient metric in an online fashion. Moreover, geodesics naturally exploit the manifold resulting from multiple-mode tasks to design motions that were not explicitly demonstrated previously. We test our learning framework using a 7-DoF robotic manipulator, where the robot satisfactorily learns and reproduces realistic skills featuring elaborate motion patterns, avoids previously unseen obstacles, and generates novel movements in multiple-mode settings.

Robot motion generation has been actively investigated over the last decades, where motion planners and movement primitives have led to significant advances. When a robot moves in an obstacle-free environment, the motion generation problem can be easily solved by off-the-shelf motion planners [13]. However, the problem is significantly more involved in unstructured environments where (static and dynamic) obstacles occupy the robot workspace [27]. Moreover, if the robot motion depends on variable targets, or must account for tasks with multiple solutions, the motion generation problem is further exacerbated. Some of the aforementioned problems have recently been addressed from a learning-from-demonstration (LfD) perspective, where a skill model is learned by extracting the relevant motion patterns from human examples [28].
LfD approaches are advantageous as they do not necessarily require a model of the environment, and can easily adapt to variable targets on the fly [28]. Three main lines of work stand out in the LfD field, namely: (1) dynamical-system-based approaches [20], which focus on capturing motion dynamics; (2) probabilistic methods [6, 29, 19], which exploit data variability and model uncertainty; and, more recently, (3) neural networks [37, 4], which address generalization problems. Despite their significant contributions (see Section II), several challenges remain open: encoding and reproduction of full-pose end-effector movements, skill adaptation to unseen or dynamic obstacles, handling multiple-solution (a.k.a. multiple-mode) tasks, and generalization to unseen situations, among others. In this paper, we provide an LfD approach that addresses several of these problems through a Riemannian perspective for learning robot motions from demonstrations. Unlike previous works [17, 26], where skill manifolds are built from locally smooth manifold learning [11], we leverage a Riemannian formulation. Specifically, we develop a variational autoencoder (VAE) that learns a Riemannian submanifold of R^3 × S^3 from human demonstrations. We exploit geodesics (i.e., shortest paths) on this learned manifold as the robot motion generation mechanism, from which full-pose end-effector trajectories are generated. These geodesics reproduce motions starting from arbitrary initial points on the manifold, and they can adapt to avoid dynamic obstacles in the robot environment by redefining the ambient metric (see Fig. 1 for an illustration). Our approach can learn manifolds encoding multiple-solution tasks, from which novel (unobserved) robot motions may naturally arise when computing geodesics. We illustrate our approach with a simple example, and we test our method in robotic experiments using a 7-DoF robotic manipulator.
The experiments show how our approach learns and reproduces realistic robot skills featuring complex motion patterns and obstacle avoidance. Moreover, we demonstrate how our approach can discover new solutions for unseen task setups in a multiple-mode setting. In summary, we contribute a new view on motion skill learning that navigates along geodesics on a manifold learned via a novel VAE over the position-orientation space. We show how this allows the robot to generate useful movements beyond the demonstration set, while avoiding dynamic obstacles that were not part of the learning procedure. We first briefly review relevant work on learning from demonstrations, variational autoencoders (VAEs), and Riemannian geometry. We also introduce recent connections between VAEs and Riemannian geometry, which form the backbone of our work. Learning from demonstration (LfD) provides a framework for a robot to quickly learn tasks by observing several demonstrations provided by a (human) expert [28]. The demonstrations are then used to learn a model of the task (e.g., a movement primitive, a reward, or a plan), which is in turn used to synthesize new robot motions [32]. In particular, movement primitives (MPs) describe complex motion skills, and represent an alternative to classic motion planners [13] for generating robot motions. We here exploit LfD to learn a Riemannian skill manifold, which we later employ to drive robot motion generation. LfD approaches can be categorized into: (1) dynamical-system-based approaches [20], which capture the demonstrated motion dynamics [36]; (2) probabilistic methods [6, 29, 19], which take advantage of data variability and model uncertainty; and (3) neural networks [37, 4], aimed at generalization problems. Our method leverages a neural network (a VAE) to learn a Riemannian metric that incorporates the network uncertainty. This metric allows us to generate motions that resemble the demonstrations.
Unlike the aforementioned approaches, our method allows for online obstacle avoidance by rescaling the learned metric. Although obstacle avoidance might still be possible by defining via-points in methods like [29, 37, 19], this problem was not explicitly considered in any of them. Finally, human demonstrations may show different solution trajectories for the same task [33, 37], which is often tackled through hierarchical approaches [24, 14]. In this context, our method encodes multiple-solution tasks into the learned Riemannian manifold, which is exploited not only to reproduce the demonstrated solutions, but also to come up with hybrid solutions built on a synergy of a subset of the demonstrations. These novel solutions naturally arise from our geodesic motion generator. Note that previous learning frameworks generate robot motions that are restricted to the provided solutions for the task at hand. A variational autoencoder (VAE) [23] is a generative model that captures the data density p(x) through a latent variable z that generally has a significantly lower dimension than x. In the interest of simplicity, we consider Gaussian VAEs corresponding to the generative model

  p(z) = N(z | 0, I_d),    (1)
  p_θ(x|z) = N(x | µ_θ(z), I_D σ²_θ(z)).    (2)

Here µ_θ: Z → X and σ_θ: Z → R^D_+ are deep neural networks with parameters θ that estimate the mean and the variance of the generative distribution p_θ(x|z). When the generative process is nonlinear, exact inference becomes intractable, and VAEs apply a variational approximation of the evidence (marginal likelihood). The corresponding evidence lower bound (ELBO) is then

  L(θ, φ) = E_{q_φ(z|x)}[log p_θ(x|z)] − KL(q_φ(z|x) ‖ p(z)),    (3)

where q_φ(z|x) = N(z | µ_φ(x), I_d σ²_φ(x)) approximates the posterior distribution p(z|x) through two deep neural networks µ_φ: X → Z and σ_φ: X → R^d_+. This approximate posterior is often called the inference or encoder distribution, while the generative process p_θ(x|z) is known as the decoder. As mentioned previously, we use VAEs to learn a robot skill.
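The ELBO above can be made concrete with a small NumPy sketch. It computes a one-sample Monte Carlo estimate of the bound using the closed-form KL divergence for a diagonal Gaussian and the reparametrization trick; the toy linear decoder and all dimensions here are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def elbo(x, mu_z, logvar_z, decode, n_samples=1):
    """Monte Carlo estimate of the ELBO with a Gaussian encoder/decoder.

    q(z|x) = N(mu_z, diag(exp(logvar_z))), prior p(z) = N(0, I).
    `decode` maps z -> (mu_x, var_x) of the Gaussian decoder.
    """
    d = mu_z.shape[0]
    # Closed-form KL( N(mu_z, diag(exp(logvar_z))) || N(0, I) ).
    kl = 0.5 * np.sum(np.exp(logvar_z) + mu_z**2 - 1.0 - logvar_z)
    rec = 0.0
    for _ in range(n_samples):
        eps = rng.standard_normal(d)
        z = mu_z + np.exp(0.5 * logvar_z) * eps   # reparametrization trick
        mu_x, var_x = decode(z)
        # Gaussian log-likelihood log N(x | mu_x, diag(var_x)).
        rec += -0.5 * np.sum(np.log(2 * np.pi * var_x) + (x - mu_x)**2 / var_x)
    return rec / n_samples - kl

# Toy linear decoder standing in for the deep networks mu_theta, sigma_theta.
W = rng.standard_normal((3, 2))
decode = lambda z: (W @ z, np.full(3, 0.1))

x = np.array([0.5, -0.2, 0.1])
value = elbo(x, mu_z=np.zeros(2), logvar_z=np.zeros(2), decode=decode)
```

In practice both `mu_z` and `logvar_z` would come from the encoder networks µ_φ(x), σ_φ(x), and the bound would be maximized by gradient ascent on θ and φ.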
Riemannian manifolds can be intuitively seen as curved d-dimensional surfaces described by smoothly-varying positive-definite inner products, characterized by the Riemannian metric M [25]. These manifolds locally resemble a Euclidean space R^d, and have a globally defined differential structure. For our purposes, it suffices to consider manifolds defined by a mapping function

  f: Z → X,    (4)

where both Z and X are open subsets of Euclidean spaces with dim Z < dim X. We then say that M = f(Z) is a manifold immersed in the ambient space X. Given a curve c: [0, 1] → Z, we can measure its length on the manifold as

  Length(c) = ∫₀¹ ‖∂_t f(c_t)‖ dt.    (5)

By applying the chain rule, this can be equivalently expressed as

  Length(c) = ∫₀¹ √( ċ_tᵀ M(c_t) ċ_t ) dt,    (6)

where ċ_t = ∂_t c_t is the curve derivative, and where we have introduced the Riemannian metric

  M(z) = J_f(z)ᵀ J_f(z),    (7)

with J_f the Jacobian of f evaluated at z ∈ Z. We may think of the metric as forming a local inner product on Z that informs us how to measure lengths locally. This construction relies on the Euclidean metric of X; we will later extend it to a more general Riemannian ambient metric. Having defined a notion of curve length (5), we can readily define shortest paths, or geodesics, as curves of minimal length. Geodesics are the generalization of straight lines in Euclidean space to Riemannian manifolds. They will serve as our motion generation mechanism, as explained in Section IV. Note that geodesics have recently been used as solutions of trajectory optimizers for quadrotor control [35]. To make the link between VAEs and Riemannian geometry [1], we may write the generative process of a VAE (2) as

  f_θ(z) = µ_θ(z) + σ_θ(z) ⊙ ε,    ε ∼ N(0, I_D).    (8)

This is also known as the reparametrization trick [23] and is illustrated in Fig. 2. We see that this is similar to the mapping function (4) that defined a manifold in the previous section, the difference being that f_θ is now stochastic. We can rewrite this stochastic mapping as [12]

  f_θ(z) = P g(z),    (9)

where P is a random matrix, and g(z) is the concatenation of µ_θ(z) and σ_θ(z).
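The pull-back metric M(z) = J_f(z)ᵀ J_f(z) and the resulting curve length can be sketched numerically with finite-difference Jacobians. The cylinder embedding below is a toy example chosen because its pull-back metric is exactly the identity, so the length of an arc is known in closed form; it is not part of the paper's setup.

```python
import numpy as np

def jacobian(f, z, eps=1e-6):
    # Finite-difference Jacobian of f: R^d -> R^D at z.
    z = np.asarray(z, float)
    f0 = f(z)
    J = np.zeros((f0.shape[0], z.shape[0]))
    for i in range(z.shape[0]):
        dz = np.zeros_like(z)
        dz[i] = eps
        J[:, i] = (f(z + dz) - f0) / eps
    return J

def metric(f, z):
    # Pull-back Riemannian metric M(z) = J_f(z)^T J_f(z).
    J = jacobian(f, z)
    return J.T @ J

def curve_length(f, curve):
    # Discretized length of a latent curve under the pull-back metric,
    # evaluating the metric at segment midpoints.
    L = 0.0
    for a, b in zip(curve[:-1], curve[1:]):
        dz = b - a
        M = metric(f, 0.5 * (a + b))
        L += np.sqrt(dz @ M @ dz)
    return L

# Example: embed the latent plane onto a cylinder in R^3.
f = lambda z: np.array([np.cos(z[0]), np.sin(z[0]), z[1]])
curve = np.stack([np.linspace(0, np.pi, 50), np.zeros(50)], axis=1)
L = curve_length(f, curve)  # arc along the unit circle
```

For the cylinder, J_fᵀJ_f = I, so the half-circle latent curve has Riemannian length π, which the discretization recovers.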
In this notation, we can view the VAE as a random projection of a deterministic manifold spanned by g, and the (expected) metric under this mapping is

  M̄(z) = J_µθ(z)ᵀ J_µθ(z) + J_σθ(z)ᵀ J_σθ(z).    (10)

Geodesics under this metric have been shown to be faithful to the data used for training the VAE [1]. Hauberg [15] argues that this is due to the contribution of σ to the metric, and that disregarding this term yields an almost flat manifold geometry. As mentioned, the definition of curve length relies on the Euclidean metric of the ambient space X, but this is not a strict requirement. Arvanitidis et al. [3] argue that there is value in giving the ambient space a manually defined Riemannian metric and including it in the definition of curve length. Curve length is then defined as

  Length(c) = ∫₀¹ √( ẋ_tᵀ M_X(x_t) ẋ_t ) dt,  with x_t = f(c_t),    (11)

where M_X is the ambient space metric, which can vary smoothly across X. The corresponding Riemannian metric on Z is then

  M̃(z) = J_f(z)ᵀ M_X(f(z)) J_f(z).    (12)

With this construction, it is straightforward to push geodesics away from certain regions of X by increasing M_X there. Note that geodesics generally do not admit a closed-form expression in these models, and numerical approximations are in order. This can be done by direct minimization of curve length [38, 22], A* search [9], integration of the associated ODE [2], or various heuristics [8]. Learning complex robot motion skills requires models that have enough capacity to learn and synthesize the relevant patterns of a motion, while being flexible enough to adapt to new conditions. In this section, we describe how we tackle this problem by bringing a Riemannian manifold perspective to the robot learning problem. First, we explain how we exploit VAEs to learn a low-dimensional manifold of the motion data, together with a Riemannian metric that accounts for an ambient-space metric. This metric will later be used to generate robot motion trajectories via geodesics, as detailed in Section IV.
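The uncertainty-aware metric (10) can be sketched with finite-difference Jacobians. The linear mean and quadratically growing variance below are toy stand-ins (not the paper's networks) chosen so that the metric visibly blows up away from the data, which is what repels geodesics from uncertain regions.

```python
import numpy as np

def jacobian(f, z, eps=1e-6):
    # Finite-difference Jacobian of f: R^d -> R^D at z.
    z = np.asarray(z, float)
    f0 = f(z)
    J = np.zeros((f0.shape[0], z.shape[0]))
    for i in range(z.shape[0]):
        dz = np.zeros_like(z)
        dz[i] = eps
        J[:, i] = (f(z + dz) - f0) / eps
    return J

def stochastic_metric(mu, sigma, z):
    # Expected pull-back metric of the stochastic decoder:
    # M(z) = J_mu(z)^T J_mu(z) + J_sigma(z)^T J_sigma(z).
    Jm, Js = jacobian(mu, z), jacobian(sigma, z)
    return Jm.T @ Jm + Js.T @ Js

# Toy decoder: linear mean, and a variance that grows away from the data
# (assumed to sit near the origin), inflating the metric in uncertain regions.
mu = lambda z: np.array([z[0], z[1], 0.0])
sigma = lambda z: np.full(3, 0.05) + 0.5 * np.dot(z, z) * np.ones(3)

M_near = stochastic_metric(mu, sigma, np.array([0.0, 0.0]))
M_far = stochastic_metric(mu, sigma, np.array([2.0, 0.0]))
```

Near the origin the variance term is flat and the metric is essentially J_µᵀJ_µ; far away the σ-Jacobian dominates and distances grow accordingly.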
In order to learn elaborate motion skills, which may display complex position and orientation trajectories, we represent the robot state as the full pose of the robot end-effector, i.e., its position x ∈ R^3 and orientation q ∈ S^3. We then seek a VAE that models a joint density of this state. We retain the usual Gaussian prior p(z) = N(z | 0, I_d), but alter the generative distribution p_{θ,ψ}(x, q|z) to suit our needs. We assume that position and orientation are conditionally independent, such that all correlations between the two must be captured by the latent variable z. To model the conditional distribution of end-effector positions x, we opt for simplicity and choose it to be Gaussian,

  p_θ(x|z) = N(x | µ_θ(z), I_3 σ²_θ(z)),    (13)

where µ_θ and σ_θ are neural networks parametrized by θ. One could argue that p_θ(x|z) should have zero probability mass outside the workspace of the robot, but we disregard such concerns as σ²_θ tends to take small values due to limited data noise. This implies that only a negligible fraction of the probability mass falls outside the robot workspace. Complex robot motions often involve elaborate orientation trajectories, which require a suitable representation for motion learning and control. There exist several orientation representations, such as rotation matrices, Euler angles, and unit quaternions. Euler angles and rotation matrices are commonly used for reasons of simplicity. Unfortunately, Euler angles suffer from gimbal lock [18], which makes them an inadequate representation of orientation in robotics, and rotation matrices are a redundant representation requiring a high number of parameters. Unit quaternions are a convenient way to represent an orientation since they are compact, not redundant, and prevent gimbal lock. Also, they provide strong stability guarantees in closed-loop orientation control [7], and they have recently been exploited in robot skill learning [34] and in data-efficient robot control tuning [21] under a Riemannian-manifold formulation.
We choose to represent orientations q as unit quaternions, such that q ∈ S^3 with the additional antipodal identification that q and −q correspond to the same orientation. Formally, a unit quaternion q lying on the surface of a 3-sphere S^3 can be represented by a 4-dimensional unit vector q = [q_w, q_x, q_y, q_z] ∈ S^3, where the scalar q_w and the vector (q_x, q_y, q_z) represent the real and imaginary parts of the quaternion, respectively. To cope with antipodality, one could opt to model q as a point in a projective space, but for reasons of simplicity we let q live on the unit sphere S^3. We then choose a generative distribution p_ψ(q|z) such that

  p_ψ(q|z) = p_ψ(−q|z).    (14)

To construct a suitable distribution p_ψ(q|z) over the unit sphere, we turn to the von Mises-Fisher (vMF) distribution, which is merely an isotropic Gaussian constrained to lie on the unit sphere [41]. This distribution is described by a mean direction µ with ‖µ‖ = 1 and a concentration parameter κ ≥ 0. Its density function takes the form

  vMF(q | µ, κ) = C_D(κ) exp(κ µᵀq),    (15)

where C_D is the normalization constant

  C_D(κ) = κ^{D/2−1} / ( (2π)^{D/2} I_{D/2−1}(κ) ),    (16)

with I_{D/2−1}(κ) being the modified Bessel function of the first kind. Like the Gaussian from which it is constructed, the von Mises-Fisher distribution is unimodal. To build a distribution that is antipodally symmetric, i.e., p_ψ(q|z) = p_ψ(−q|z), we simply form a mixture of antipodal von Mises-Fisher distributions [16],

  p_ψ(q|z) = ½ vMF(q | µ_ψ(z), κ_ψ(z)) + ½ vMF(q | −µ_ψ(z), κ_ψ(z)),    (17)

where µ_ψ and κ_ψ are parametrized as neural networks. This mixture model is conceptually similar to a Bingham distribution [41], but is easier to implement numerically. Our VAE model can be trained by maximizing the conventional evidence lower bound (ELBO) (3), which now reads

  L(θ, ψ, φ) = E_{q_φ(z|x,q)}[ α log p_θ(x|z) + β log p_ψ(q|z) ] − KL( q_φ(z|x, q) ‖ p(z) ),    (18)

where x ∈ R^3 and q ∈ S^3 represent the position and quaternion of the end-effector, respectively. To balance the log-likelihood of the position and orientation components, α > 0 and β > 0 are proportionally scaled.
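The antipodal vMF mixture can be sketched directly. For S^3 (D = 4) the normalization constant needs I_1, which the sketch computes via its power series to stay dependency-free; mean direction, query quaternion, and κ below are arbitrary illustrative values.

```python
import numpy as np
from math import factorial

def bessel_i1(x, terms=30):
    # Modified Bessel function of the first kind, I_1, via its power series:
    # I_1(x) = sum_k (x/2)^(2k+1) / (k! (k+1)!).
    return sum((x / 2.0) ** (2 * k + 1) / (factorial(k) * factorial(k + 1))
               for k in range(terms))

def vmf_density(q, mu, kappa):
    # von Mises-Fisher density on S^3 (D = 4), where the normalization is
    # C_4(kappa) = kappa / ((2*pi)^2 * I_1(kappa)).
    c4 = kappa / ((2.0 * np.pi) ** 2 * bessel_i1(kappa))
    return c4 * np.exp(kappa * np.dot(mu, q))

def antipodal_vmf(q, mu, kappa):
    # Antipodally symmetric mixture: p(q) = p(-q) by construction.
    return 0.5 * (vmf_density(q, mu, kappa) + vmf_density(q, -mu, kappa))

mu = np.array([1.0, 0.0, 0.0, 0.0])   # mean direction on S^3
q = np.array([0.5, 0.5, 0.5, 0.5])    # a unit quaternion
p_pos = antipodal_vmf(q, mu, kappa=5.0)
p_neg = antipodal_vmf(-q, mu, kappa=5.0)
```

Evaluating the mixture at q and −q gives identical densities, which is exactly the property that lets the model ingest raw, sign-inconsistent quaternion data.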
Note that due to the antipodal nature of quaternions, raw demonstration data may contain negative or positive values for the same orientation. We therefore avoid any data pre-processing step by considering two von Mises-Fisher distributions that encode the same orientation at both sides of the hypersphere. In practice, we double the training data by including both q_n and −q_n for all observations q_n. Our generative process is parametrized by a set of neural networks. Specifically, µ_θ and σ_θ are the position mean and variance networks parametrized by θ, while µ_ψ and κ_ψ are networks parametrized by ψ that represent the mean and concentration of the quaternion distribution. Following Sec. II-D, the Jacobians of these functions govern the induced Riemannian metric as

  M(z) = J_µθ(z)ᵀ J_µθ(z) + J_σθ(z)ᵀ J_σθ(z) + J_µψ(z)ᵀ J_µψ(z) + J_κψ(z)ᵀ J_κψ(z),    (21)

where J_µθ, J_σθ, J_µψ, and J_κψ are the Jacobians of the functions representing the position mean and variance, and the quaternion mean and concentration, respectively. In practice, we want this metric M(z) to take large values in regions with little or no data, so that geodesics avoid passing through them. Following Arvanitidis et al. [1], we achieve this by using radial basis function (RBF) networks as our variance representation, whose kernels reliably extrapolate over the whole space. Since one of the main differences between Gaussian and von Mises-Fisher distributions lies in the way they represent data dispersion (variance vs. concentration), the RBF network models the reciprocal behavior when estimating the position variance. In summary, the data uncertainty is encoded by the RBF networks representing σ_θ⁻¹(z) and κ_ψ(z), which affect the Riemannian metric through their corresponding Jacobians as in (21). As mentioned previously, geodesics follow the trend of the data, and they are here exploited to reconstruct motion skills that resemble human demonstrations. Moreover, we explain how new geodesic paths that avoid obstacles on the fly can be obtained by a metric scaling process.
In particular, we exploit ambient space metrics, defined as a function of the obstacle configuration, to locally deform the originally learned Riemannian metric. Last but not least, our approach can encode multiple-solution skills, from which new hybrid trajectories (not previously shown to the robot) can be synthesized. We elaborate on each of these features in the sequel. Geodesic curves generally follow the trend of the training data, due to the role of uncertainty in the metric. Specifically, Eq. (10) tells us that geodesics are penalized for crossing regions where the VAE predictive uncertainty grows. This implies that if a set of demonstrations follows a circular motion pattern, geodesics starting from arbitrary points on the learned manifold will also generate a circular motion (see Fig. 1). This behavior is due to the way the metric M is defined, as M is characterized by low values where data uncertainty is low (and vice versa). Since geodesics minimize the energy of a curve between two points on M, which is computed as a function of M, they tend to stay on the learned manifold and avoid outside regions. This property suggests that geodesics form a natural motion generation mechanism. Note that under a Euclidean metric (i.e., an identity matrix), geodesics correspond to straight lines; such geodesics certainly neglect the data manifold geometry. Formally, we compute geodesics on M by approximating them with cubic splines c ≈ ω_λ(z_c), with z_c = {z_c0, ..., z_cN}, where z_cn ∈ Z is a vector defining a control point of the spline over the latent space Z. Given N control points, N − 1 cubic polynomials ω_λi with coefficients λ_{i,0}, λ_{i,1}, λ_{i,2}, λ_{i,3} have to be estimated so as to minimize the Riemannian length of the spline,

  λ* = argmin_λ Length(ω_λ(z_c)),    (22)

with the length computed as in (6). Then, the final geodesic c computed in Z is used to generate the robot motion through the mean decoder networks µ_θ and µ_ψ.
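The length-minimization step can be sketched as follows. As a simplified stand-in for optimizing cubic-spline coefficients, this sketch optimizes discrete interior curve points directly by gradient descent on the discretized Riemannian length; the bump-shaped toy metric and all hyperparameters are illustrative assumptions.

```python
import numpy as np

def metric(z):
    # Toy latent metric: identity plus a high-cost bump at the origin,
    # mimicking large M(z) in regions of high predictive uncertainty.
    return np.eye(2) * (1.0 + 20.0 * np.exp(-10.0 * np.dot(z, z)))

def curve_length(pts):
    # Discretized Riemannian curve length, metric sampled at segment midpoints.
    L = 0.0
    for a, b in zip(pts[:-1], pts[1:]):
        dz = b - a
        M = metric(0.5 * (a + b))
        L += np.sqrt(dz @ M @ dz)
    return L

def minimize_curve(start, goal, n=20, iters=500, lr=0.02, h=1e-4):
    # Direct minimization of curve length over the interior points
    # (a stand-in for estimating the cubic-spline coefficients lambda).
    t = np.linspace(0.0, 1.0, n)[:, None]
    pts = (1 - t) * start + t * goal            # init: straight line
    pts[:, 1] += 0.1 * np.sin(np.pi * t[:, 0])  # break the bump's symmetry
    for _ in range(iters):
        grad = np.zeros_like(pts)
        for i in range(1, n - 1):               # endpoints stay fixed
            for j in range(2):
                e = np.zeros_like(pts)
                e[i, j] = h
                grad[i, j] = (curve_length(pts + e) - curve_length(pts - e)) / (2 * h)
        pts -= lr * grad
    return pts

start, goal = np.array([-1.0, 0.0]), np.array([1.0, 0.0])
geodesic = minimize_curve(start, goal)
straight = np.stack([np.linspace(-1.0, 1.0, 20), np.zeros(20)], axis=1)
```

The optimized curve bows around the high-cost region and ends up shorter, in the Riemannian sense, than the straight Euclidean line through it.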
The resulting trajectory can be executed on the robot arm to reproduce the required skill. To illustrate, we consider a simple experiment where the demonstration data at each time point is confined to R^2 × S^2, i.e., only two-dimensional positions and orientations are considered. We create synthetic position data that follows a J-shape, and orientation data that follows a C-shape projected on the sphere (see Fig. 3). We fit our VAE model to this data, and visualize the corresponding latent space in Fig. 4. Here, the top panel shows the latent mean embeddings of the training data with a background color corresponding to the predictive uncertainty. We see low uncertainty near the data, and high uncertainty otherwise. The bottom panel of Fig. 4 shows the same embedding, but with a background color proportional to log √det M(z). This quantity, known as the magnification factor [5], generally takes large values in regions where distances are large, implying that geodesics will try to avoid such regions. In the figure, we notice that the magnification factor is generally low, except on the 'boundary' of the data manifold, i.e., in regions where the predictive variance grows. Consequently, we observe that Riemannian geodesics (yellow curves in the figure) stay within the 'boundary' and are hence near the training data. In contrast, Euclidean geodesics (red curves in the figure) fail to stay on the data manifold. Our proposal is to use Riemannian geodesics to generate new motions for the robot. Often, human demonstrations do not include any notion of obstacles in the environment. As a result, obstacle avoidance is usually treated as an independent problem when generating robot motions in unstructured environments. A possible solution to integrate both problems is to provide obstacle-aware demonstrations, where the robot is explicitly taught how to avoid known obstacles. The main drawback here is that the robot is still unable to avoid unseen obstacles on the fly.
The Riemannian approach provides a natural solution to this problem. The natural metric in latent space (10) measures the length of a latent curve under the Euclidean metric of the ambient space X. We can easily modify this to take obstacles into account. Intuitively, we can increase the length of curves that intersect obstacles, such that geodesics are less likely to go near them. Formally, we propose to alter the ambient metric of the end-effector position so that it grows near the obstacle,

  M^x_X(x) = ( 1 + η exp( −‖x − o‖² / r² ) ) I_3,    (24)

where η > 0 scales the cost, and o ∈ R^3 and r > 0 represent the position and radius of the obstacle, respectively. For the orientation component, we use M^q_X(x) = I_4. Under this ambient metric, geodesics will generally avoid the object, though we emphasize that this is only a soft constraint. This approach is similar in spirit to CHOMP [31], except that our formulation works along a low-dimensional learned manifold, whose solution is then projected to the task space of the robot. Under this ambient metric, the associated Riemannian metric of the latent space Z becomes

  M̃(z) = J_f(z)ᵀ blkdiag( M^x_X, M^q_X ) J_f(z),    (25)

where M^x_X and M^q_X represent the position and orientation components of the obstacle-avoidance metric M_X, respectively. Here we emphasize that as the object changes position, the VAE does not need to be re-trained, as the change is only in the ambient metric. Having phrased motion generation as the computation of geodesics, we evidently need a fast and robust algorithm for computing them. As we work with low-dimensional latent spaces, we propose to simply discretize the latent space on a regular grid and use a graph-based algorithm for computing shortest paths. Specifically, we create a uniform grid over the latent space, and assign a weight to each edge in the graph corresponding to the Riemannian distance between neighboring nodes (see Fig. 5). Geodesics are then found using Dijkstra's algorithm [10].
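A sketch of the obstacle-aware position metric, assuming a Gaussian-bump form consistent with the roles of η, o, and r described above (the exact functional form is an assumption here):

```python
import numpy as np

def ambient_metric(x, o, r, eta):
    # Obstacle-aware ambient metric on end-effector positions, assumed here
    # to be a Gaussian bump of radius r around the obstacle center o,
    # scaled by eta. It inflates curve length near the obstacle, acting as
    # a soft constraint that pushes geodesics away.
    bump = eta * np.exp(-np.sum((x - o) ** 2) / r ** 2)
    return (1.0 + bump) * np.eye(3)

# For orientations, the ambient metric stays Euclidean: M_q = I_4.
obstacle = np.zeros(3)
M_near = ambient_metric(np.array([0.1, 0.0, 0.0]), obstacle, r=0.2, eta=10.0)
M_far = ambient_metric(np.array([1.0, 0.0, 0.0]), obstacle, r=0.2, eta=10.0)
```

Because only the ambient metric changes when the obstacle moves, recomputing `ambient_metric` at query points is all that is needed; the VAE itself is untouched.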
This algorithm selects a set of graph nodes G_c, where g_0 and g_N represent the start and target of the geodesic in the graph, respectively. To select these nodes, the shortest path on the graph is computed by minimizing the accumulated weight (cost) of the edges connecting consecutive nodes, each calculated as in (6). To ensure a smooth trajectory, we fit a cubic spline ω_λ to the resulting set G_c by minimizing the mean squared error. The spline computed in Z is finally used to generate the robot motion through the mean decoder networks µ_θ and µ_ψ, and the resulting trajectory can be executed on the robot arm to reproduce the required skill. One issue with this approach is that dynamic obstacles imply that geodesic distances between nodes may change over time. To avoid recomputing all edge weights every time an obstacle moves, we proceed as follows. Since the learned manifold does not change, we keep a decoded version of the latent graph in memory (Fig. 5), so we need not query the decoders at run-time. We can then find points on the decoded graph that are near obstacles and rescale their weights to take the ambient metric into account. Once the obstacle moves, we reset the weights of points that are now sufficiently far from the obstacle. The center panel of Fig. 9 provides an example showing how the metric on the left panel is represented as a discrete graph.

V. EXPERIMENTS

We evaluate the performance and capabilities of our method in two different scenarios: a simulated pouring task and a real-world grasping scenario, both in R^3 × S^3. In particular, the pouring task showcases a multiple-solution setting. For each experiment, we discuss its design regarding manifold learning and geodesic computation. We also provide a visualization of the learned Riemannian metrics and the geodesic representation in the latent space Z. Furthermore, the code is available at: https://sites.google.com/view/geodesicmotion.
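The grid-graph geodesic computation described above can be sketched with a stdlib Dijkstra. The band-shaped toy metric stands in for a high-uncertainty region; grid size, bounds, and metric parameters are illustrative assumptions.

```python
import heapq
import numpy as np

def grid_geodesic(metric, lo, hi, n, start, goal):
    # Shortest path on a uniform n-by-n latent grid (Dijkstra), with edge
    # weights given by the local Riemannian length of each edge.
    xs = np.linspace(lo, hi, n)

    def neighbors(i, j):  # 4-connected grid
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            a, b = i + di, j + dj
            if 0 <= a < n and 0 <= b < n:
                yield a, b

    def weight(p, q):
        za = np.array([xs[p[0]], xs[p[1]]])
        zb = np.array([xs[q[0]], xs[q[1]]])
        dz = zb - za
        M = metric(0.5 * (za + zb))  # metric at the edge midpoint
        return float(np.sqrt(dz @ M @ dz))

    dist, prev = {start: 0.0}, {}
    pq = [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        if d > dist.get(u, np.inf):
            continue
        for v in neighbors(*u):
            nd = d + weight(u, v)
            if nd < dist.get(v, np.inf):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    path, u = [goal], goal
    while u != start:       # backtrack from goal to start
        u = prev[u]
        path.append(u)
    return path[::-1], dist[goal]

# Toy metric: an expensive band around z2 = 0, mimicking a region of high
# model uncertainty that the geodesic should detour around.
band_metric = lambda z: np.eye(2) * (1.0 + 50.0 * np.exp(-50.0 * z[1] ** 2))
path, length = grid_geodesic(band_metric, -1.0, 1.0, 21,
                             start=(0, 10), goal=(20, 10))
```

Even though both endpoints sit on the expensive band, the shortest path leaves it, travels alongside, and returns, exactly the behavior expected of the learned-metric geodesics; the node sequence would then be smoothed with a cubic spline before decoding.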
We consider simulated and real robot demonstrations involving a 7-DoF Franka Emika Panda robot arm with a two-finger gripper. The demonstrations were recorded using kinesthetic teaching in the real grasping scenario, while the simulated pouring dataset was collected using the Franka ROS Interface [40] on Gazebo [39]. In both scenarios, the robot is controlled by an impedance controller. We calculate geodesics on a 100 × 100 grid graph, and our straightforward Python implementation runs at 100 Hz on ordinary PC hardware; the approach readily runs in real time. Our VAE architecture is implemented using PyTorch [30]. The decoder and encoder networks have two hidden layers with 200 and 100 neurons, respectively. The same architecture is used for all the experiments. The RBF variance/concentration networks use 500 kernels, computed by k-means over the training dataset [1], with a predefined bandwidth. The latent space of the VAE is 2-dimensional, while the ambient space is 7-dimensional, corresponding to R^3 × S^3. We employ a single neural network to represent both the position and orientation decoder means, meaning that our final metric is defined as

  M(z) = J_µθ(z)ᵀ J_µθ(z) + J_σθ(z)ᵀ J_σθ(z) + J_κψ(z)ᵀ J_κψ(z),

where J_µθ ∈ R^{(D_X + D_Q)×d} is the Jacobian of the joint decoder mean network (position and quaternion), and J_σθ ∈ R^{D_X×d} and J_κψ ∈ R^{D_Q×d} are the Jacobians of the decoder variance and concentration RBF networks. Since the position and quaternion share the same decoder mean network, the output vector is split into two parts accordingly. The quaternion part of the decoder mean is projected onto S^3 to then define the corresponding von Mises-Fisher distribution (15). The ELBO parameters α and β in (18) are found experimentally to guarantee good reconstruction of both position and quaternion data. It is worth pointing out that we manually provided antipodal quaternions during training, which leads to better latent space structures and reconstruction accuracy.
The first set of experiments is based on a dataset collected while a human operator performs kinesthetic demonstrations of a grasping skill. This particular grasping motion includes a 90° rotation when approaching the object to perform a side grasp [34]. The demonstration trajectories start from different end-effector poses, and they reach the same target position with slightly different orientations. To reproduce the grasping skill, we computed a geodesic in Z, which leads to a continuous trajectory in X that closely reproduces the rotation pattern observed during the demonstrations. Figure 6 depicts the magnification factor for the learned manifold. The semi-transparent white points correspond to the latent representation of the training set, and the yellow curves are geodesics between points assigned from the start and end points of the demonstrations. The left panel in Fig. 6 shows geodesics in two different scenarios: those in the top cluster start from different poses and end at the same target, while the geodesics in the bottom cluster start and end at different random poses. The target points on the rightmost side of each cluster represent the same position, but due to their slightly different orientations, they are encoded at different latent points. The results show that the method successfully generates geodesics that respect the geometry of the manifold learned from demonstrations. Interestingly, as shown by the magnification factor plot (Fig. 6-left), the resulting manifold is composed of two similar clusters, much like the illustrative example of Fig. 4. We observed that this behavior emerged due to the antipodal encoding of the quaternions, where each cluster represents one side of the hypersphere. It is worth highlighting that this encoding alleviates any kind of post-processing of raw quaternion data during the training or reproduction phases. Figure 6-middle shows one of the demonstrated trajectories on the real robot, and Fig.
6-right displays the reconstructed geodesic obtained via the decoder network and applied to a simulated robot arm. From the results, it is clear that the motion generated by the geodesic leads to a motion pattern similar to the demonstrations. Note how the end-effector orientation evolves along the decoded geodesic in the ambient space, showing that the 90° rotation is properly encoded and reproduced. To evaluate our model in a more complicated scenario, we collected a dataset of pouring-task demonstrations in simulation. The task involves grasping 3 cans from 3 different positions and then pouring into 3 different cups placed at different locations. The demonstrated trajectories cross each other, thus providing a multiple-solution setting. Figure 7 shows how the geodesic, depicted as a yellow curve, is constructed in the latent space. This geodesic starts from a point in the second demonstration group (blue dots) and switches to the first group (white dots) to reach the given target located in the third group (green dots). As a result, with 3 sets of demonstrations, all 9 permutations of grasping any can from the table and then pouring into any cup are feasible. To evaluate the method in the presence of an unseen obstacle, the start and target points of the geodesic path are selected such that it follows one of the demonstration groups. To ensure that the obstacle is in the way, we select its position from the training set. Figure 8-left shows the geodesic performing obstacle avoidance while following the geometry of the manifold. The circular obstacle representation, depicted as red and yellow circles in the latent space, is only for the sake of visualization. The red and yellow curves represent geodesics avoiding the red and yellow obstacles, respectively. These curves correspond to one time frame of the adapted geodesics, showing how our method can deal with dynamic obstacles. The middle and right plots in Fig.
8 show the decoded geodesics executed by the simulated robot in the ambient space. The black trajectory shows the decoded geodesic; the dot clusters in red, green, and blue depict the demonstration sets; and the blue sphere depicts the obstacle. The middle plot shows the robot configuration 15 frames earlier than the right plot, displaying the obstacle dynamics during task execution. Figure 9-middle illustrates the graph computed from the corresponding data manifold in Fig. 9-left. The graph-based geodesic (red curve in Fig. 9-middle) is then decoded and executed on the simulated robot (see Fig. 9-right). Although gradient-based and graph-based geodesic computation are both viable options, the latter is faster and thus more suitable for real-time motion generation.

We have proposed a novel LfD approach that learns a Riemannian manifold from human demonstrations and computes geodesics to recover and synthesize the learned motion skills. Our geodesic motion generation can plan movements from and to arbitrary points on the data manifold while avoiding obstacles on the fly. We realize this idea with a variational autoencoder (VAE) over the space of positions and orientations of the robot end-effector, and generate motion with graph-based geodesic computation for real-time motion generation and adaptation. Through extensive evaluation in simulation, we showed that geodesic motion generation performs well in different scenarios such as grasping and pouring. The proposed methodology can be extended and improved in several directions. A consequence of learning a skill manifold with VAEs is that data lying outside the manifold may be arbitrarily misrepresented in the latent space Z; consequently, any reconstruction in the ambient space X may be inaccurate. This may give rise to problems when conditioning on points, e.g., new targets, that are located outside the learned manifold. We did not explore this setting in the present paper.
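As a rough illustration of the graph-based alternative, the sketch below builds a k-nearest-neighbour graph over the latent codes, weights each edge by the ambient-space distance between the decoded endpoints (a first-order approximation of Riemannian curve length under the pull-back metric), and runs Dijkstra's algorithm to obtain a discrete geodesic. The function names, the choice of `k`, and the decoder interface are illustrative assumptions, not the paper's implementation.

```python
import heapq
import numpy as np

def build_latent_graph(z, decode, k=8):
    """k-NN graph over latent codes z (N x d); each edge is weighted by the
    ambient distance between the decoded endpoints, approximating the
    Riemannian length of the connecting latent curve segment."""
    x = np.asarray(decode(z))                       # decoded means, N x D
    d2 = ((z[:, None, :] - z[None, :, :]) ** 2).sum(-1)
    adj = {i: [] for i in range(len(z))}
    for i in range(len(z)):
        for j in np.argsort(d2[i])[1:k + 1]:        # k nearest latent codes
            w = float(np.linalg.norm(x[i] - x[int(j)]))
            adj[i].append((int(j), w))              # undirected graph
            adj[int(j)].append((i, w))
    return adj

def graph_geodesic(adj, start, goal):
    """Dijkstra shortest path: a discrete approximation of the geodesic."""
    dist, prev = {start: 0.0}, {}
    pq = [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        if d > dist.get(u, np.inf):
            continue                                # stale queue entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, np.inf):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    path = [goal]                                   # backtrack to the start
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1], dist[goal]
```

The resulting node sequence can then be mapped through the decoder to produce the end-effector trajectory; because the graph is precomputed, only the shortest-path query runs at motion-generation time.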
One possible solution may involve learning a bijective mapping between old demonstrations and new conditions, and then using this mapping to transform the learned manifold (e.g., by expanding or rotating it) to fit another region of the space. We introduced dynamic obstacle avoidance as a soft constraint through the ambient metric. If an approach based on hard constraints is preferred, one may instead remove nodes near an obstacle from the graph rather than re-weighting its edges. This could provide computational savings, but one would lose the 'complete' Riemannian picture that we find elegant. It would also be straightforward to direct geodesics through selected via-points by slightly modifying the graph algorithm. We did not explore these approaches here. Our obstacle avoidance formulation only considered simple obstacles at this point, but the strategy can be extended to multiple dynamic obstacles. Instead of working with single Gaussian balls, one can imagine extending the approach to complex obstacle shapes represented as point clouds. This may increase the implementation effort required to remain real-time, but such an extension seems entirely reasonable. Finally, to execute a motion safely, obstacle avoidance should be considered for all robot links and not just the end-effector. This requires a more informative manifold embedded in the joint space of the robot, together with an ambient-space metric that combines obstacle information in the Euclidean and joint spaces simultaneously.
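To make the soft-versus-hard distinction concrete, the following sketch shows both options on a geodesic graph stored as an adjacency dict mapping a node index to `(neighbour, weight)` pairs, with `x` holding the decoded ambient-space point of each node. The Gaussian bump, the `penalty` value, and the helper names are hypothetical illustrations of the idea, not the paper's implementation.

```python
import numpy as np

def reweight_for_obstacle(adj, x, center, radius, penalty=50.0):
    """Soft constraint: scale edge weights by a Gaussian bump centered on the
    obstacle, mimicking an obstacle-aware rescaling of the ambient metric."""
    def bump(p):
        return 1.0 + penalty * np.exp(-np.sum((p - center) ** 2)
                                      / (2.0 * radius ** 2))
    new = {u: [] for u in adj}
    for u in adj:
        for v, w in adj[u]:
            mid = 0.5 * (x[u] + x[v])       # evaluate the bump at the midpoint
            new[u].append((v, w * bump(mid)))
    return new

def remove_obstructed_nodes(adj, x, center, radius):
    """Hard constraint: delete graph nodes whose decoded point lies inside
    the obstacle ball, so no geodesic can pass through them."""
    keep = {u for u in adj if np.linalg.norm(x[u] - center) > radius}
    return {u: [(v, w) for v, w in adj[u] if v in keep] for u in keep}
```

Re-weighting keeps all nodes reachable (geodesics merely become expensive near the obstacle and bend around it), whereas node removal forbids the region outright; when the obstacle moves, either update only touches the affected edges, which is what makes the dynamic setting tractable.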
[1] Latent space oddity: on the curvature of deep generative models
[2] Fast and robust shortest paths on manifolds learned from data
[3] Geometrically enriched latent spaces
[4] Neural dynamic policies for end-to-end sensorimotor learning
[5] Magnification factors for the SOM and GTM algorithms
[6] A tutorial on task-parameterized movement learning and retrieval
[7] Unit quaternions: A mathematical tool for modeling, path planning and control of robot manipulators
[8] Metrics for deep generative models
[9] Fast approximate geodesics for deep generative models
[10] Introduction to algorithms
[11] Nonisometric manifold learning: Analysis and an algorithm
[12] Expected path length on random manifolds
[13] Sampling-based robot motion planning: A review
[14] Learning multiple collaborative tasks with a mixture of interaction primitives
[15] Only Bayes should learn a manifold
[16] Scalable robust principal component analysis using Grassmann averages
[17] Motion planning and reactive control on learnt skill manifolds
[18] Perspectives on Euler angle singularities, gimbal lock, and the orthogonality of applied forces and applied moments
[19] Kernelized movement primitives
[20] Dynamical movement primitives: Learning attractor models for motor behaviors
[21] Bayesian optimization meets Riemannian manifolds in robot learning
[22] Variational autoencoders with Riemannian Brownian motion priors
[23] Auto-encoding variational Bayes
[24] Robot learning from demonstration by constructing skill trees
[25] Introduction to Riemannian manifolds
[26] Learning task manifolds for constrained object manipulation
[27] A survey of robotic motion planning in dynamic environments
[28] An algorithmic perspective on imitation learning
[29] Using probabilistic movement primitives in robotics
[30] PyTorch: An imperative style, high-performance deep learning library
[31] CHOMP: Gradient optimization techniques for efficient motion planning
[32] Recent advances in robot learning from demonstration (Annual Review of Control, Robotics, and Autonomous Systems)
[33] Robot learning from demonstration of force-based tasks with multiple solution trajectories
[34] Learning and sequencing of object-centric manipulation skills for industrial tasks
[35] Trajectory optimisation in learned multimodal dynamical systems via latent-ODE collocation
[36] Dynamic movement primitives: A framework for motor control in humans and humanoid robotics
[37] Conditional neural movement primitives
[38] The Riemannian geometry of deep generative models
[39] panda_simulator: Gazebo simulator for Franka Emika Panda robot supporting sim-to-real code transfer
[40] justagist/franka_ros_interface: Franka ROS Interface version for franka-ros v0.7.1-release
[41] Directional statistics in machine learning: A brief review