title: Action Planning for Packing Long Linear Elastic Objects into Compact Boxes with Bimanual Robotic Manipulation
authors: Ma, Wanyu; Zhang, Bin; Han, Lijun; Huo, Shengzeng; Wang, Hesheng; Navarro-Alarcon, David
date: 2021-10-22

Automatic packing of objects is a critical component of efficient shipping in the Industry 4.0 era. Although robots have shown great success in pick-and-place operations with rigid products, the autonomous shaping and packing of elastic materials into compact boxes remains one of the most challenging problems in robotics. The automation of packing tasks is crucial at this moment given the accelerating shift towards e-commerce (which requires manipulating multiple types of materials). In this paper, we propose a new action planning approach to automatically pack long linear elastic objects into common-size boxes with a bimanual robotic system. For that, we developed an efficient vision-based method to compute the object's geometry and track its deformation in real time without special markers; the algorithm filters and orders the feedback point cloud captured by a depth sensor. A reference object model is introduced to plan the manipulation targets and to complete occluded parts of the object. Action primitives are used to construct high-level behaviors, which enable the execution of all packing steps. To validate the proposed theory, we conduct a detailed experimental study with multiple types and lengths of objects and packing boxes. The proposed methodology is original and its demonstrated manipulation capabilities have not (to the best of the authors' knowledge) been previously reported in the literature.

These considerations motivate the development of efficient vision-based strategies for robots to automatically manipulate and pack elastic objects into compact boxes. To advance in the development of these types of manipulation skills, we consider the challenging task where a (long) linear elastic object needs to be autonomously grasped, shaped/deformed, and placed within a compact box that optimizes packing space, as depicted in Fig. 1. There are two main challenges that arise with the automation of this task: (i) Due to the complexity of the shaping task and the object's intrinsic elasticity, dual-arm collaboration is required to effectively bend and place the object within the box (this operation may need several precisely coordinated actions by the arms); (ii) The typically occluded view from a vision sensor leads to partial observations of the manipulated object and packing environment (this results in incomplete geometric information that complicates the real-time guidance of the task).

Researchers have conducted various studies and built theories for supply chains [2]-[4], such as pre-shipment package preparation in order fulfillment systems [5], a knowledge-based logistics operations planning system to maintain the quality of service for small-quantity customer orders [6], and so on. Although there has been significant progress in the study of packing rigid objects [7], [8], packing elastic linear objects into compact containers has received little attention. For example, the Amazon Picking Challenge is a well-known competition focusing on automatic picking and placing with robotic manipulators [9], and participating systems have succeeded in various real scenarios [10], [11]; however, most of these works neglect the object's deformation.
For robotic manipulation, the components of a behavior can have various granularities, ranging from low-level motor torques to high-level action primitives. Since they are assumed to be accurately executed without failure [12], action primitives are widely used in robotic manipulation tasks that demand intelligence and robustness, such as grasp control [13], humanoid soccer [14], human-robot collaboration [15], [16], and so on. Therefore, action planning with action primitives is a good choice when built on top of fine motor control. The modeling of elastic objects is another key problem in our work, considering the challenges that sensing, planning, and control pose in real tasks [17]. Traditionally, model-based methods (e.g., the mass-spring-damping model [18], [19] and the finite element method [20], [21]) are employed to describe elastic objects; these require accurate physical parameters of the objects, which are often unavailable. Recently, the community has seen the rise of vision-based shape servoing [22]-[24], where visually perceived shapes serve as feedback for the controller to deform the soft object. Among those algorithms, point clouds have shown great advantages over 2D images [25] by offering much richer information, and thanks to the development of depth cameras and the associated processing algorithms, they are also easy to obtain. Many researchers have tried to construct 3D surface models [26] and topological geometric meshes [27] of soft objects. However, these algorithms demand vast computational resources and sometimes provide redundant information when deformations cause only slight changes of the mesh. In this paper, we simplify the shape perception to point cloud extraction for geometric reconstruction, without requiring manually designed markers. Meanwhile, to deal with the heavy occlusions during manipulation, we design a strategy to balance the online perception and the offline model, in which we replace the inside-the-box part of the object with a suitably defined shape and perform planning only on the part of the object outside the box.

The original contributions of this paper are as follows:
1) A complete methodology to automatically pack long linear elastic objects into boxes with bimanual robot manipulators.
2) An action planner for soft packing tasks based on action primitives, together with a target planning algorithm to command the system with high-level behaviors.
3) A hybrid geometric model of the object constructed from online 3D perception and an offline reference model to deal with occlusions.
4) A detailed experimental study with bimanual robotic manipulators to validate the proposed theory.

The remainder of this paper is organized as follows. Sec. II presents the proposed methodology. Sec. III reports the conducted experiments. Sec. IV concludes the paper and gives future directions.

High-level behaviors are intuitive for humans, but they must be decomposed and translated for robots to achieve similar intelligence. Hence, in our approach, action primitives are designed and an action planner is modeled to compose and generate actions for the packing task. There is an inherent conflict in this task: the robot needs a dedicated action to fix the object against its elasticity, yet this very situation causes heavy occlusion at all times. A hybrid model combining online 3D perception and an offline reference model is proposed to tackle this problem.

Fig. 2. The framework of the proposed approach for packing long linear elastic objects into common-size boxes.
The corresponding reference target of the action planner is explained in detail based on the proposed hybrid model. In this section, we present the complete pipeline of the proposed method and emphasize two important parts, namely: (i) the hybrid model that handles obstructions during the manipulation, and (ii) the action planner based on action primitives and target planning. The whole framework of the proposed method is shown in Fig. 2.

As the robot conducts the manipulation, the view of the depth camera is inevitably obstructed. Therefore, the camera cannot perceive the entire object most of the time, which compromises the effectiveness of the feedback point cloud. To handle this problem, we propose a hybrid geometric model of the object composed of an online extraction of the geometric information and an offline reference object model.

The reference object model of the elastic object is a prerequisite for planning the packing manipulation: it estimates the target shape of the object and replaces the obstructed visual feedback. To maximize the space utilization of the box, we design a target shape, called Helix, composed of straight segments and two groups of concentric semicircles, as shown in Fig. 3. Since only planar manipulations are considered (i.e., the motion of the object has 4 DOFs: three for translation and one for rotation), the object's maximum allowed number of folds is N_F = ⌊w/(2R)⌋, where ⌊·⌋ denotes the floor operator, and the corresponding maximum allowed length L_H is determined accordingly, where l and w are the length and width of the box, respectively. Therefore, L_H defines the maximum capacity of the given box. For the target shape of a given elastic object (L, R), we parameterize its center line (see Fig. 3(b)) with a bounded parameter u ∈ [0, 1] and denote the parameterized center line as P_M(u), which is defined as the reference object model. The length of P_M(u) is computed numerically according to Eq. (2), where δ = L/N_M is the step size of the computation; clearly, the boundary conditions L(1) = L and L(0) = 0 hold. The process for generating the target shape Helix is presented in Algorithm 1, which takes the box dimensions (l, w, h) and the object parameters (L, R) as input and outputs the analytical formula of P_M(u); within a semicircular portion (i.e., when uL − l_count < l_circle), it computes φ = (uL − l_count)/r and sets P_M(u) = P_1 − (r sin φ, r cos φ) if j is odd and P_M(u) = P_2 + (r sin φ, r cos φ) if j is even. For the 4-DOF motion, Z_M is set to always point vertically toward the ground, and Y_M is generated by the right-hand rule.

In the perception stage, we employ point clouds to extract the shape information of the object. We assume that the objects in this study are uniform elastic rods and that their color has a high contrast with the background in our experiments. Therefore, we can easily extract the region of interest from the RGB image and then obtain the corresponding depth information from the aligned depth image. Combining the RGB and depth information, we generate the point cloud of the surface of the object and denote it as P_OS. Note that during the packing process, the inside-the-box part of P_OS is removed and the outside part of P_OS is kept for the subsequent operations. To utilize the point cloud in the following steps, we first need to process it (as shown in Fig. 4). Firstly, we smooth P_OS with a weighted filter and downsample it to accelerate the updating and computation.
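As a concrete illustration of this preprocessing step, the short Python sketch below downsamples (and implicitly smooths) the raw cloud by averaging the points that fall into each cell of a regular voxel grid. It is only a minimal stand-in for the weighted filter and downsampling used in our pipeline; the NumPy-only implementation and the 5 mm voxel size are illustrative assumptions.

```python
import numpy as np

def voxel_downsample(P_OS, voxel=0.005):
    """Downsample (and implicitly smooth) an unordered point cloud by
    averaging the points that fall into each cell of a regular voxel grid.

    P_OS:  (N, 3) raw surface points of the object, in meters.
    voxel: edge length of the voxel grid; 5 mm is an illustrative value.
    Returns an (M, 3) array of voxel centroids (M <= N).
    """
    keys = np.floor(P_OS / voxel).astype(np.int64)        # voxel index of each point
    _, inv, counts = np.unique(keys, axis=0,
                               return_inverse=True, return_counts=True)
    inv = np.asarray(inv).reshape(-1)                      # one voxel id per point
    sums = np.zeros((counts.size, 3))
    np.add.at(sums, inv, P_OS)                             # accumulate points per voxel
    return sums / counts[:, None]                          # centroid of each voxel
```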
Next, we extract the boundaries of the object from the point cloud. For that, we introduce a polar coordinate system with its origin at O_B and its axis along X_B (as defined in Fig. 1), and segment the object counterclockwise into N_O sections using half-lines originating from O_B at a uniform angle interval (except at the two ends). Next, for each half-line, we find its neighboring points in the point cloud and label the one nearest to O_B as p_i^in (i = 1, ..., N_O) and the one farthest from O_B as p_i^out (i = 1, ..., N_O); these form the inner and outer boundaries of the object, respectively. Lastly, we calculate the mean of the raw feedback points between each pair of adjacent half-lines; the resulting sorted skeleton P_O provides the object's geometric information and facilitates the target planning. The length L and radius R of the cylindrical object are then calculated from the skeleton and the two boundaries, so the given object can be represented by (L, R). We denote the object body frame by {F_O}.

Unlike human beings, robots usually find it difficult to understand complicated manipulations. Therefore, we decompose the manipulations into several action primitives that guide the robot through the packing task. Before giving the definitions of the action primitives, we clarify some important concepts. Firstly, we denote the main robotic arm that executes the actions as r, with r ∈ {Left, Right}, where Left and Right represent the left and the right robotic arm with respect to the axis Y_B, respectively; the assistant robotic arm is denoted as −r. Secondly, we denote the 4-DOF reference pose as X_R = [x_r, y_r, z_r, θ_r]^⊤ and the target pose as X_T. Note that the target pose for different actions may be generated from P_M (e.g., to put the object into the box), from P_O (e.g., to grasp the object), or from the feedback pose of a robot X_E = [x_e, y_e, z_e, θ_e]^⊤ (e.g., to change the height). Finally, before each manipulation loop, both robotic arms should move to suitable initial poses.

Now we give the definitions of the behaviors and action primitives. First, we categorize the behaviors of the gripper into three types, whose corresponding gripper ratios g_1, g_2, and g_3 range from 0 to 1. Next, we define five action primitives; the first two, Hover and Approach, are defined as follows.

a_1 Hover: r moves towards the target according to a reference pose X_R from {F_O} or {F_M}. This action has two modes. The first mode is activated when z_r ≤ z_e: the robot moves at a constant height until it reaches the target position. The second mode is activated when z_r > z_e, which usually happens when part of the object is inside the box; in this mode, the robot first moves to the nearest point among P_O and then moves along the object at a constant height Δh towards the target, so the reference of the Hover action becomes a path p(a_1) rather than a single pose.

a_2 Approach: r descends to the target height after the Hover action. The target pose of this action is X_T(a_2) = [x_e, y_e, z_r + δ_1, θ_e]^⊤, where δ_1 is a small bias selected according to the robot's height relative to the object or the model.

Combining Hover and Approach, we can compose the grasp-the-object and put-object-into-box manipulations. To pack a long object into a box, the robot may need multiple grasping manipulations. Therefore, it is necessary to plan reference targets for the robot's grasping and regrasping.
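The target planning below operates on the ordered skeleton P_O produced by the perception stage. The following Python sketch illustrates that extraction: it sweeps angular sectors around O_B, keeps the point nearest to and farthest from O_B in each sector as the inner and outer boundaries, and averages each sector to obtain an ordered skeleton. It is a hedged sketch rather than the exact implementation; the sector count N_O = 60 and the purely planar angle computation are illustrative assumptions.

```python
import numpy as np

def extract_skeleton(P_OS, O_B, N_O=60):
    """Order the filtered object points into a skeleton P_O by sweeping
    half-lines from the box origin O_B.

    P_OS: (N, 3) filtered surface points of the object (outside the box).
    O_B:  (2,) origin of the polar frame in the X_B-Y_B plane.
    N_O:  number of angular sectors; 60 is an illustrative value.
    Returns (P_O, inner, outer): per-sector centroids ordered
    counterclockwise plus the inner/outer boundary points.
    """
    rel = P_OS[:, :2] - O_B                        # planar offsets from O_B
    ang = np.arctan2(rel[:, 1], rel[:, 0])         # angle of each point
    rad = np.linalg.norm(rel, axis=1)              # radial distance to O_B
    bins = np.linspace(ang.min(), ang.max(), N_O + 1)
    sector = np.clip(np.digitize(ang, bins) - 1, 0, N_O - 1)

    P_O, inner, outer = [], [], []
    for s in range(N_O):                           # counterclockwise sweep
        idx = np.where(sector == s)[0]
        if idx.size == 0:
            continue
        inner.append(P_OS[idx[np.argmin(rad[idx])]])   # nearest point  (p_in)
        outer.append(P_OS[idx[np.argmax(rad[idx])]])   # farthest point (p_out)
        P_O.append(P_OS[idx].mean(axis=0))             # sector centroid
    # The object length L can then be approximated by summing the distances
    # between consecutive skeleton points (the radius estimate is analogous).
    return np.array(P_O), np.array(inner), np.array(outer)
```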
In this work, we plan two types of target points, i.e., grasp points and fix points, from which the target poses are generated in the corresponding frames.

1) Grasp Points Planning: In this part, we present the algorithm for generating suitable grasp points. We denote the indices of the planned grasp points as n_G = {n_i, i = 1, ..., N_G}, where N_G is the number of grasping manipulations required to pack the entire object into the box. The points from the reference object model, P_MG = {p_M^{n_i}, n_i ∈ n_G}, represent the positions at which the object is placed into the box. The points from the object, P_OG = {p_O(p_M^{n_i}), n_i ∈ n_G}, represent the positions at which the object is grasped. Note that if n_i > 0, the grasp points are indexed from the beginning of P_M and traced forward, whereas if n_i < 0, they are indexed from the end of P_M and traced backward. The computation process is shown in Algorithm 2, which returns the corresponding position p_O(p_M^n) on the object with respect to the reference object model.

There are various possible choices of n_G; here we present a specific case as an example (see Fig. 5). Firstly, we extract two trapezoidal areas α_1 and α_2 (pink areas in Fig. 5(a)) in the box, bounded by a diagonal line and the axis Y_B. Then, we divide P_M into a periodic part and a non-periodic part, where the periodic part consists of semicircles and straight segments, and the non-periodic part consists of the beginning and end segments of the object. Next, for the periodic part and the beginning segment, we define the points inside the two trapezoidal areas that lie at a distance δ_5 from Y_B as the grasp points (see Fig. 5(a)). Note that the reference object model starts from a corner of the box and the point index increases clockwise. Lastly, we plan the grasp point for the end segment. Since the length of the object is arbitrary, the end segment is typically not a complete period, and the target shape may end at a semicircle or a straight segment. The grasp points should be carefully selected to ensure the successful packing of the entire object. Therefore, we set the final grasp point as n_{N_G} = −δ_4, where δ_4 > 0 denotes the δ_4-th point from the end of the object (see Fig. 5(b)).

2) Fix Points Planning: Now, based on the grasp points, we plan the fix points for the Fix action. During the packing manipulation, the elasticity of the object makes it press against the grippers and hinder their release. Therefore, in our study, a behavior assist-to-change-hand is designed to deal with this problem. Given the sequence of the main robotic arm, the corresponding assistant robot is naturally determined. Since a robot can serve as the main robot in two consecutive loops, both robots may have to help fix the object. The fix points for the main robot and the assistant robot in the current loop are denoted as p_MF(r) and p_MF(−r), respectively. For the main robot, the fix point p_MF(r) is the point on P_M nearest to the main robot. For the assistant robot, the fix point is designed to keep a minimum distance δ_3 from p_MF(r) (see Fig. 5(c)); δ_3 should satisfy δ_3 < δ_4 · δ (where δ is the step size of the shape model) to guarantee that the first and final Fix actions are located on the object. We can then obtain two candidate points, with increasing and decreasing indices relative to p_MF(r). Therefore, −r detects its relative direction with respect to the main robot and selects the candidate on its own side as the fix point. The whole algorithm is presented in Algorithm 3.
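As a hedged sketch (not the paper's Algorithm 3 itself), the snippet below shows one possible realization of this fix-point selection: the main arm fixes the nearest reference-model point, two candidates at distance at least δ_3 along the model are generated, and the assistant arm keeps the candidate on its own side, which is approximated here by the candidate closest to the assistant. The function name, signature, and this simplified side test are illustrative assumptions.

```python
import numpy as np

def plan_fix_points(P_M, x_main, x_assist, delta_3, delta):
    """Select the fix points for the main arm (r) and the assistant arm (-r).

    P_M:      (N, 2) planar points of the reference object model.
    x_main:   (2,) planar position of the main arm's gripper.
    x_assist: (2,) planar position of the assistant arm's gripper.
    delta_3:  minimum spacing between the two fix points (delta_3 < delta_4 * delta).
    delta:    step size of the reference object model.
    Returns (index of p_MF(r), index of p_MF(-r)) on P_M.
    """
    # The main arm fixes the reference-model point closest to it.
    i_main = int(np.argmin(np.linalg.norm(P_M - x_main, axis=1)))

    # Two candidates at least delta_3 away along the model, one on each side.
    offset = int(np.ceil(delta_3 / delta))
    candidates = [i for i in (i_main - offset, i_main + offset) if 0 <= i < len(P_M)]
    if not candidates:            # degenerate case: model shorter than the spacing
        return i_main, i_main

    # The assistant keeps the candidate on its own side; here the side test is
    # approximated by taking the candidate closest to the assistant arm.
    i_assist = min(candidates, key=lambda i: float(np.linalg.norm(P_M[i] - x_assist)))
    return i_main, i_assist
```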
The given sequence of the main robotic arm r, together with the planned grasp points and fix points, is passed to the action planner to generate the robots' action sequence for packing the object. In this work, we employ a state machine (see Fig. 6) as the action planner. The state machine consists of two levels. The first level comprises the high-level behaviors of the packing manipulation, i.e., grasp-the-object, put-object-into-box, assist-to-change-hand, and prepare-to-grasp/finish-the-task. Since the indices of the grasp points were obtained in Sec. II-C, the reference poses of grasp-the-object and put-object-into-box can be generated from P_OG and P_MG, respectively. The reference poses of assist-to-change-hand are determined by Algorithm 3. We assume that once the robot starts moving, it can accomplish the whole grasp loop following the designed procedures. During the packing process, we denote the remaining unpacked length of the object as l_rest. If l_rest > 0 after a grasp loop, the robots conduct the prepare-to-grasp behavior and the state machine enters the next loop. If l_rest = 0, the robots return to their initial poses and the state machine conducts the finish-the-task behavior. The states of the second level of the state machine are the movements of the robot, i.e., m(r, g, a), where r is the main robotic arm, g is the gripper behavior, and a is the selected action primitive. The result of a movement is represented as a piecewise function:

m(r, g, a) = { 1, if the action is successfully achieved; 0, otherwise.  (12)

When the current state reaches m(·) = 1, the state machine transitions to the next state.
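A minimal sketch of this two-level structure is given below; the loop layout follows the description above, but the function names, type signatures, and failure handling are illustrative assumptions rather than the actual implementation.

```python
from typing import Callable, List, Tuple

# A movement is a triple (main arm r, gripper behavior g, action primitive a);
# executing it returns 1 if the action is achieved and 0 otherwise (Eq. (12)).
Movement = Tuple[str, str, str]

def run_action_planner(grasp_loops: List[List[Movement]],
                       execute: Callable[[Movement], int],
                       remaining_length: Callable[[], float]) -> bool:
    """Two-level planner sketch: the outer loop iterates over high-level
    grasp loops (grasp-the-object, put-object-into-box, assist-to-change-hand,
    prepare-to-grasp / finish-the-task); the inner loop steps through the
    movements m(r, g, a) and only advances when the current one succeeds.
    Returns True when the whole object has been packed (l_rest = 0).
    """
    for loop in grasp_loops:
        for movement in loop:           # second level: states m(r, g, a)
            if execute(movement) != 1:  # transition only on m(.) = 1
                return False            # a failed primitive aborts the task
        if remaining_length() <= 0:     # l_rest = 0: finish-the-task
            return True
        # l_rest > 0: prepare-to-grasp and continue with the next grasp loop
    return remaining_length() <= 0
```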
To validate the proposed approach, we conduct several experiments with objects of different lengths. The setup of the experiments is shown in Fig. 1. Two 6-DOF UR3 robots form a dual-arm manipulator to conduct the manipulations, and each UR3 robot is equipped with a Robotiq 2F-85 two-finger gripper. To improve the flexibility of the grippers, we design extension parts and install them on the grippers to extend their length. An Intel RealSense L515 LiDAR camera is mounted above the operation space for 3D perception. We place a table (serving as the manipulation plane) between the UR3s and fix the box on it. Before the packing manipulations, the camera, the box (table), and the robots are calibrated. We program the project in Python and C++ and run it on Ubuntu 16.04; the communication is built on ROS and the data are visualized in RViz.

In the conducted experiments (a video is available at https://youtu.be/dU8l6eBJpfs), we select four objects with different lengths (558 mm, 600 mm, 830 mm, 972 mm) and pack them into boxes of two sizes (270×207×80 mm^3 and 314×232×80 mm^3). Note that for case 3 (830 mm) and case 4 (972 mm), we put an additional board on the side of the table to limit the object's motion and keep it within the camera's view range. The generated point clouds of the reference object models and the packing results are presented in Fig. 7. Thanks to the distributed communication of ROS and the downsampling of the point cloud, the time cost of every update of the program is less than 30 ms. From the packing results, we can see that our approach is capable of packing long linear elastic objects into boxes of different sizes.

To verify whether the designed shape Helix is able to estimate the object in the box, we need to measure the similarity between the target shape and the packed object. Since we have obtained the point cloud P_OS of the packed object and represented the target shape Helix by the reference object model P_M, we measure the similarity by calculating the Euclidean distances d_PC from the feedback points to the reference object model (Eq. (13)). Note that P_OS is extracted from the surface of the object. Therefore, if the shape of the packed object matches the target shape well, the average of d_PC should be R (the radius of the cross sections of the object), which is 18 mm in our experiments. Fig. 8 presents the raw feedback point clouds and the reference object models. By employing Eq. (13), we calculate the similarities in the different cases, as shown in Fig. 9. From the data, we can see that the mean distances are close to 18 mm and the variances are negligible. Therefore, the shape of the packed object matches the target shape well in each case.

The proposed hybrid geometric model, which deals with the frequent occlusions, does not need to perceive the object at all times thanks to the offline reference object model. Since the manipulation targets are planned beforehand, the sensing system only needs to work at a few key moments, such as the beginning of every grasp loop. We take case 4 as an example to explain the process of extracting the object's geometric information (as shown in Fig. 10). In case 4, the robots accomplish three grasp loops to pack the entire object into the box, so the object's geometry is measured three times in total, at the beginnings of the three grasp loops. Before the first action (see Fig. 10(b-1)), the length and radius of the object are initialized with the values listed in Fig. 9. In the following actions, the feedback points inside the box are removed, and the remaining points outside the box are reordered into a new P_OS and used to compute the length of the outside-the-box part of the object. Based on the new visual measurement, the grasp points and fix points are updated. The process repeats until the entire object is packed into the box, i.e., until no feedback points remain outside the box.

In this section, we again take case 4 as an example. For case 4, we planned three grasp loops to pack the object. The planned indices of the grasp points are computed as n_G = {−94, −56, −10}. As mentioned in Sec. II-C, we assign the roles of the two UR3s for each grasp point, and the sequence of main robots conducting the grasps is empirically determined as r = {Left, Left, Right} = {r_1, r_2, r_3}. Fig. 10 presents the grasp points (orange points) and the corresponding fix points (light-blue points). The main robot grasps the object at the orange point and moves it to the corresponding orange point on the reference object model. After that, the assistant robot fixes the object by pushing it at the corresponding light-blue point on the reference object model, and the main robot is released to conduct the next action.

In most situations, it is sufficient for the robots to move at a constant height to avoid collisions with the box, since the box is a cuboid. However, the part of the object outside the box rises above the box during the manipulation, which increases the possibility of collisions between the robots and the object and thus the failure of grasping (as shown in Fig. 12(a-1)-(a-3)). Fig. 11 shows the second Hover mode. When the grasp point is higher than the robot gripper, the robot first moves the gripper to the nearest point above the object and then moves along the curvature of the object, avoiding collisions with the object and the box, until the gripper reaches the target grasp point. This method guarantees the success of grasping (as shown in Fig. 12(b-1)-(b-3)).
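The waypoint generation of this second Hover mode can be sketched as follows. Following the object's curvature is approximated here by stepping along the ordered skeleton P_O at a constant clearance above it; the clearance value and the function signature are illustrative assumptions, not the exact implementation.

```python
import numpy as np

def hover_path(P_O, gripper_xy, grasp_idx, delta_h=0.05):
    """Second Hover mode (sketch): approach the object from above and follow
    its curvature at a constant clearance until reaching the grasp point.

    P_O:        (N, 3) ordered skeleton points of the object.
    gripper_xy: (2,) current planar position of the gripper.
    grasp_idx:  index of the target grasp point in P_O.
    delta_h:    clearance above the object (5 cm is illustrative).
    Returns an (M, 3) sequence of waypoints for the gripper.
    """
    # 1) Move to the skeleton point nearest to the gripper (seen from above).
    start_idx = int(np.argmin(np.linalg.norm(P_O[:, :2] - gripper_xy, axis=1)))

    # 2) Follow the skeleton towards the grasp point, keeping delta_h above it.
    step = 1 if grasp_idx >= start_idx else -1
    indices = list(range(start_idx, grasp_idx + step, step))
    waypoints = P_O[indices].copy()
    waypoints[:, 2] += delta_h          # constant height above the object
    return waypoints
```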
As mentioned in Sec. III-A, we conduct four experiments with objects of different lengths. In this section, we take case 4 as a representative example to demonstrate the working process of the action planner. Fig. 13 presents the whole process of the packing manipulation, where each thumbnail showcases a movement of the robot (i.e., a state of the state machine). The second grasp loop is almost the same as the first grasp loop, except that the assist-to-change-hand behavior involves changing the main robot from Left (the second loop) to Right (the third loop) as planned in Sec. III-D, whereas the main robot is the same in the first and second loops (i.e., there is no changing of hands). Since the third grasp loop finishes the packing process, its last behavior is finish-the-task, while the last behaviors of the first and second grasp loops are prepare-to-grasp.

In this work, we propose a complete method to pack a long linear elastic object into a compact box. Firstly, we design a real-time perception method to extract the physical information and track the deformations of the object. Secondly, we define a target shape Helix for reference planning; the similarity between the target shape and the shape of the packed object is examined in the experimental analyses. Then, a hybrid model is defined to account for occlusions. Next, we propose an action planner that composes the defined action primitives into high-level behaviors and fulfills the packing of the object. Finally, extensive experiments are conducted to verify the proposed method. Although the method is designed for packing tasks, the defined action primitives and the target planning method can be used in other manipulation tasks (e.g., object sorting, multi-object assembly, etc.). Also, note that the proposed perception method works without markers and reduces computation time by extracting only the minimum physical information of the object, which brings generality to the proposed method.

There are several limitations to our method. For example, the perception method does not consider situations where the object is outside the camera's view range, and occlusions caused by the robot itself are still not handled; a possible solution is to employ multi-view vision to perceive the object. Besides, the maximum capacity of the box is usually not reachable because of the elasticity of the object. Therefore, there is a trade-off between the length of the objects and the size of the boxes.

Fig. 13. Experimental process and action primitives of case 4. The rows represent the three manipulation loops, the columns represent the behaviors of the robots, and the thumbnails show the conducted robotic movements. The first two loops are mainly executed by the left arm (grasp and put) and assisted by the right arm (fix); the third loop is mainly executed by the right arm (grasp and put) and assisted by the left arm (fix).
For future work, we plan to explore multi-view vision and to extend the framework to other, more comprehensive types of objects (e.g., rigid, elastic, articulated), as well as to optimize the packing to save space. Our team is currently working along this challenging direction.

References:
[1] The role of packaging in omni-channel fashion retail supply chains - how can packaging contribute to logistics efficiency?
[2] Supply chain transformation.
[3] E-fulfillment: the strategy and operational requirements.
[4] Packaging and logistics interactions in retail supply chains.
[5] Managing shipment release from a storage area to a packing station in a materials handling facility.
[6] A knowledge-based logistics operations planning system for mitigating risk in warehouse order fulfillment.
[7] Hybrid heuristic algorithm based on improved rules & reinforcement learning for 2D strip packing problem.
[8] JamPacker: An efficient and reliable robotic bin packing system for cuboid objects.
[9] A summary of Team MIT's approach to the Amazon Picking Challenge.
[10] NimbRo Picking: Versatile part handling for warehouse automation.
[11] Packing planning and execution considering arrangement rules.
[12] Interactive perception: Leveraging action in perception and perception in action.
[13] Robust sensor-based grasp primitive for a three-finger robot hand.
[14] Hierarchical and state-based architectures for robot behavior planning and control.
[15] Robot action planning by online optimization in human-robot collaborative tasks.
[16] On the manipulation of articulated objects in human-robot cooperation scenarios.
[17] A primal-dual active set method for solving multi-rigid-body dynamic contact problems.
[18] Constructing rheologically deformable virtual objects.
[19] Soft material modeling for robotic manipulation.
[20] Using physical modeling and RGB-D registration for contact force sensing on deformable objects.
[21] Flexible simulation of deformable models using discontinuous Galerkin FEM.
[22] Visual servo control, part I: Basic approaches.
[23] Visual servo control, part II: Advanced approaches.
[24] Fourier-based shape servoing: a new feedback method to actively deform soft objects into desired 2-D image contours.
[25] Estimating the deformability of elastic materials using optical flow and position-based dynamics.
[26] 3-D deformable object manipulation using deep neural networks.
[27] Combinatorial manifold mesh reconstruction and optimization from unorganized points with arbitrary topology.