key: cord-0464902-cxcfh869 authors: Bauer, Stefan; Widmaier, Felix; Wuthrich, Manuel; Funk, Niklas; Jesus, Julen Urain De; Peters, Jan; Watson, Joe; Chen, Claire; Srinivasan, Krishnan; Zhang, Junwu; Zhang, Jeffrey; Walter, Matthew R.; Madan, Rishabh; Schaff, Charles; Maeda, Takahiro; Yoneda, Takuma; Yarats, Denis; Allshire, Arthur; Gordon, Ethan K.; Bhattacharjee, Tapomayukh; Srinivasa, Siddhartha S.; Garg, Animesh; Buchholz, Annika; Stark, Sebastian; Steinbrenner, Thomas; Akpo, Joel; Joshi, Shruti; Agrawal, Vaibhav; Scholkopf, Bernhard title: A Robot Cluster for Reproducible Research in Dexterous Manipulation date: 2021-09-22 journal: nan DOI: nan sha: a74060d7afa611feb5efce500a1fc1920b80e058 doc_id: 464902 cord_uid: cxcfh869 Dexterous manipulation remains an open problem in robotics. To coordinate efforts of the research community towards tackling this problem, we propose a shared benchmark. We designed and built robotic platforms that are hosted at the MPI-IS and can be accessed remotely. Each platform consists of three robotic fingers that are capable of dexterous object manipulation. Users are able to control the platforms remotely by submitting code that is executed automatically, akin to a computational cluster. Using this setup, i) we host robotics competitions, where teams from anywhere in the world access our platforms to tackle challenging tasks, ii) we publish the datasets collected during these competitions (consisting of hundreds of robot hours), and iii) we give researchers access to these platforms for their own projects. Dexterous manipulation is humans' interface to the physical world. Our ability to manipulate objects around us in a creative and precise manner is one of the most apparent distinctions between human and animal intelligence. The impact robots with a similar level of dexterity would have on our society cannot be overstated. 
They would likely replace humans in most tasks that are primarily physical, such as working at production lines, packaging, constructing houses, agriculture, cooking, and cleaning. Yet, robotic manipulation is still far from the level of dexterity attained by humans, as witnessed by the fact that these tasks are still mostly carried out by humans. This problem has been remarkably resistant to the rapid progress of machine learning over the past years.

A factor that has been crucial for progress in machine learning, but nonexistent in real-world robotic manipulation, is a shared benchmark. Benchmarks allow different labs to coordinate efforts, reproduce results and measure progress. Most notably, in the area of image processing, such benchmarks were crucial for the rapid progress of deep learning. More recently, simulation benchmarks have been proposed in reinforcement learning (RL) [8], [25]. However, methods that are successful in simulators transfer only to a limited degree to real robots. Therefore, the robotics community has recently proposed a number of open-source platforms for robotic manipulation [28], [1], [27]. These robots can be built by any lab to reproduce results of other labs. While this is a large step towards a shared benchmark, it requires effort by the researchers to set up and maintain the system, and it is nontrivial to ensure a fully standardized setup.

Therefore, we provide remote access to dexterous manipulation platforms hosted at MPI-IS (see Figure 1; interested researchers can contact us to request access). This allows for an objective evaluation of robot-learning algorithms on real-world platforms with minimal effort for the researchers. In addition, we publish a large dataset of these platforms interacting with objects, which we collected during a recent competition we hosted.
During this competition, teams from across the world developed algorithms for challenging object manipulation tasks, which yielded a very diverse dataset containing meaningful interactions. Finally, we also provide a simulation of the robotic setup, allowing for research into sim-to-real transfer. All the code is open-source.

In the rest of the paper, we describe the robotic hardware and the software interface which allows easy robot access, similarly to a computational cluster. We also describe the robot competition we hosted and the data we collected in the process.

In the past years, a large part of the RL community has focused on simulation benchmarks, such as the DeepMind Control Suite [25] or OpenAI Gym [8] and extensions thereof [30]. These benchmarks internally use physics simulators, typically MuJoCo [26] or PyBullet [12]. These commonly accepted benchmarks allowed researchers from different labs to compare their methods, reproduce results, and hence build on each other's work. Very impressive results have been obtained through this coordinated effort [17], [14], [24], [20], [18], [13], [19]. In contrast, no such coordinated effort has been possible on real robotic systems, since there is no shared benchmark. This lack of standardized real-world benchmarks has been recognized by the robotics and RL community [6], [7], [9], [10], [4], [21]. Recently, there have been renewed efforts to alleviate this problem:

a) Affordable Open-Source Platforms: The robotics community recently proposed affordable open-source robotic platforms that can be built by users. For instance, [28] propose Replab, a simple, low-cost manipulation platform that is suitable for benchmarking RL algorithms. Similarly, [1] propose a simple robotic hand and quadruped that can be built from commercially available modules. CMU designed LoCoBot, a low-cost open-source platform for mobile manipulation.
[16] propose an open-source quadruped consisting of off-the-shelf parts and 3D-printed shells. Based on this design, [27] developed an open-source manipulation platform consisting of three fingers capable of complex dexterous manipulation (here, we use an industrial-grade adaptation of this design). Such platforms are beneficial for collaboration and reproducibility across labs. However, setting up and maintaining such platforms often requires hardware experts and is time-intensive. Furthermore, there are necessarily small variations across labs that may harm reproducibility. To overcome these limitations, the robotics community has proposed a number of remote robotics benchmarks.

b) Remote Benchmarks: For mobile robotics, [23] propose the Robotarium, a remotely accessible swarm robotics research platform. Similarly, Duckietown [22] hosts the AI Driving Olympics [2] twice per year. However, a remote benchmark for robotic manipulation accessible to researchers around the world is still missing. Therefore, we propose such a system herein.

We host 8 robotic platforms at MPI-IS (see Figure 1). Remote users can submit code, which is then assigned to a platform and executed automatically, akin to a computational cluster. Users have access to the data collected during execution of their code. Submission and data retrieval can be automated to allow for RL methods that alternate between policy evaluation and policy improvement.

The platforms we use here are based on an open-source design that was published recently [27]. The benefits of this design are:
• Dexterity: The robot design consists of three fingers and has the mechanical and sensory capabilities necessary for complex object manipulation beyond grasping.
• Safe Unsupervised Operation: The combination of robust hardware and safety checks in the software allows users to run even unpredictable algorithms without supervision. This enables, for instance, training of deep neural networks directly on the real robot.
• Ease of Use: The C++ and Python interfaces are simple and well-suited for RL as well as optimal control at rates up to 1 kHz. For convenience, we also provide a simulation (PyBullet) environment of the robot.

Here, we use an industrial-grade adaptation of this hardware (see Figure 1) to guarantee an even longer lifetime and higher reproducibility. In addition, we developed a submission system to allow researchers from anywhere in the world to submit code with ease.

Actions: This platform consists of 3 fingers, each with 3 joints, yielding a total of 9 degrees of freedom (and 9 corresponding motors). There are two ways of controlling the robot: One can send 9-dimensional torque actions, which are directly executed by the motors. Alternatively, we provide the option of using position actions (9-dimensional as well), which are then translated to torques by an internal controller. The native control rate of the system is 1 kHz, but one can control at a lower rate, if so desired.

Observations: An observation consists of proprioceptive measurements, images and the object pose inferred using an object tracker. The proprioceptive measurements are provided at a rate of 1 kHz and contain the joint angles, joint velocities, joint torques (each 9-dimensional) and fingertip forces (3-dimensional). There are three cameras placed around the robot. The camera images are provided at 10 Hz and have a resolution of 270x270 pixels. In addition, our system also contains an object tracker, which provides the pose of the object being manipulated along with the images.

We use HTCondor to provide a cluster-like system where users can submit jobs, which are then automatically executed on a randomly selected robot. During execution, a backend process automatically starts the robot, monitors execution and records all actions and observations. The users only use the simple frontend interface to send actions to and retrieve observations from the robot (see [27] for more details).
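To make the position-action mode described above more concrete, the following is a minimal sketch of how a 9-dimensional position action could be turned into 9 joint torques. It assumes a simple PD law; the function name `pd_torques`, the gains `kp` and `kd`, and the torque limit `max_torque` are hypothetical values chosen for illustration and are not taken from the platform's software.

```python
from typing import List, Sequence

NUM_JOINTS = 9  # 3 fingers x 3 joints, as stated in the text


def pd_torques(
    target_pos: Sequence[float],
    pos: Sequence[float],
    vel: Sequence[float],
    kp: float = 10.0,          # hypothetical proportional gain
    kd: float = 0.1,           # hypothetical damping gain
    max_torque: float = 0.36,  # hypothetical per-joint torque limit
) -> List[float]:
    """Map a position action to torques with a PD law (an assumed controller)."""
    if not (len(target_pos) == len(pos) == len(vel) == NUM_JOINTS):
        raise ValueError("expected 9-dimensional inputs")
    torques = []
    for q_des, q, dq in zip(target_pos, pos, vel):
        # Pull towards the target angle and damp the joint velocity.
        tau = kp * (q_des - q) - kd * dq
        # Clip to the allowed range, mimicking a software safety check.
        torques.append(max(-max_torque, min(max_torque, tau)))
    return torques
```

Evaluated once per control step, such a law yields one torque per motor; a user who prefers full control can bypass it by sending torque actions directly.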
Below is a minimal example of actual user code in Python. It creates a frontend interface to the robot and uses it to send torque commands that are computed by some control policy based on the observations:

import robot_fingers
import robot_interfaces

# Initialise front end to interact with the robot.
robot = robot_fingers.TriFingerPlatformFrontend()

# Create a zero-torque action to start with
action = robot_interfaces.trifinger.Action()

At the end of each job, the recorded data is stored and provided to the user, who can then analyse it and use it, for example, to train a better policy. Users can automate submissions and data retrieval to run RL algorithms directly on the robots.

Using the robotic platforms described in Section III, we organized the "Real Robot Challenge 2020", a competition which ran from August to December 2020. The participants were able to access the robots remotely via the submission system described in Section III-B. Thus, teams from all around the world were able to participate and work with real robots, even if they did not have access to one locally (or were locked out of their labs due to the Covid pandemic). Their algorithms generated a large dataset rich in contact interactions, which we make publicly available. Currently, we are hosting another edition of this challenge.

The challenge was split into three phases:
• Phase 1: In the first phase, the participants had to manipulate a 65 mm cube in simulation. There was no restriction on who could participate; this phase served as a qualification round before giving the participants access to the real robots.
• Phase 2: The teams that achieved promising results in the first phase were given access to the real robots, where the same task had to be solved.
• Phase 3: For the last phase, the cube was replaced by a smaller, elongated cuboid (20x20x80 mm) that is more difficult to grasp and manipulate.
Figure 2 shows pictures of the different phases and the objects that were used.
In all three phases, the task was to move the object from its initial position at the center of the workspace to a randomly sampled goal. In each phase, there were four levels of difficulty corresponding to different goal distributions:
• Level 1: The goal is randomly sampled on the table. The orientation is not considered. For this level it is not necessary to lift the object, so it can be solved by pushing.
• Level 2: The object has to be lifted to a fixed goal position 8 cm above the table center. The orientation is not considered.
• Level 3: The goal is randomly sampled somewhere within the arena with a height of up to 10 cm. The orientation is not considered.
• Level 4: As level 3, but in addition to the position, a goal orientation is sampled uniformly.

In phase 1, we used an episode length of 15 s (3750 steps at 250 Hz). In phases 2 and 3, this was increased to 2 min (120000 steps at 1 kHz) so that the users had enough time on the real robots. We defined the performance of an episode as the cumulative reward, i.e. the sum of rewards across all time steps, $R = \sum_t r_t$. In the following, we describe the reward functions (for a single time step) used in the different levels.

1) Reward for Levels 1-3: For difficulty levels 1-3, we only considered position error (i.e. orientation is ignored). We used a weighted sum of the Euclidean distance on the x/y-plane and the absolute distance along the z-axis. Both components are scaled based on their expected range. Since the z-range is smaller, this means that the height has a higher weight. The sum is again rescaled so that the total error is in the interval [0, 1]. Given goal position $p_g = (x_g, y_g, z_g)$, actual position $p_a = (x_a, y_a, z_a)$, arena diameter $d$ and maximum expected height $h$, the position error $e_{\mathrm{pos}}$ is computed as

$$e_{\mathrm{pos}} = \frac{1}{2}\left(\frac{\sqrt{(x_g - x_a)^2 + (y_g - y_a)^2}}{d} + \frac{|z_g - z_a|}{h}\right) \qquad (1)$$

The arena diameter is $d = 0.39$ m, matching the inner diameter of the arena boundary, and the maximum height is $h = 0.1$ m. We compute the reward $r$ by simply negating the error, $r = -e_{\mathrm{pos}}$.
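The position-based reward described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the challenge's evaluation code: it assumes that the two range-scaled components are averaged with equal weight (our reading of "rescaled so that the total error is in the interval [0, 1]") and that all lengths are given in metres. The function names are hypothetical.

```python
import math
from typing import Iterable, Sequence

ARENA_DIAMETER = 0.39  # d, from the text (assumed to be in metres)
MAX_HEIGHT = 0.1       # h, from the text (assumed to be in metres)


def position_error(
    goal: Sequence[float],
    actual: Sequence[float],
    d: float = ARENA_DIAMETER,
    h: float = MAX_HEIGHT,
) -> float:
    """Range-scaled position error; assumes equal weighting of both parts."""
    # Euclidean distance in the x/y-plane, scaled by the arena diameter.
    xy_dist = math.hypot(goal[0] - actual[0], goal[1] - actual[1])
    # Absolute distance along z, scaled by the maximum expected height.
    z_dist = abs(goal[2] - actual[2])
    return 0.5 * (xy_dist / d + z_dist / h)


def reward_levels_1_to_3(goal: Sequence[float], actual: Sequence[float]) -> float:
    """Per-step reward for levels 1-3: the negated position error."""
    return -position_error(goal, actual)


def cumulative_reward(
    goal: Sequence[float], trajectory: Iterable[Sequence[float]]
) -> float:
    """Episode performance: the sum of per-step rewards."""
    return sum(reward_levels_1_to_3(goal, p) for p in trajectory)
```

With these constants, an object resting on the table directly below a goal at 10 cm height has an error of 0.5, the same as an object at the correct height but a full arena diameter away, which illustrates why the height carries a higher weight per unit distance.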
2) Reward for Level 4:

a) Phases 1 and 2: For level 4, we considered both position and orientation. We compute the position error $e_{\mathrm{pos}}$ as in the previous levels, according to (1). For the orientation, we first compute the rotation $q = (q_x, q_y, q_z, q_w)$ (given as quaternion) that would have to be applied to the actual orientation to obtain the goal orientation. We then compute the angle of this rotation, divided by $\pi$ to normalize to the interval [0, 1]:

$$e_{\mathrm{rot}} = \frac{2 \operatorname{atan2}\left(\lVert (q_x, q_y, q_z) \rVert, |q_w|\right)}{\pi} \qquad (2)$$

For the total reward $r$, we sum the two errors, rescale again to the interval [0, 1] and negate so that a higher reward means lower error:

$$r = -\frac{e_{\mathrm{pos}} + e_{\mathrm{rot}}}{2} \qquad (3)$$

b) Phase 3: We found that for the narrow cuboid used in phase 3, our object tracking method was unreliable with respect to the rotation around the long axis of the cuboid. To prevent this from affecting the reward computation, we changed the computation of the rotation error to use only the absolute angle between the long axes of the cuboid in goal and actual pose. This is again divided by $\pi$ for normalisation. The reward is then computed in the same way as above, using (3).

Since there is randomness in each execution due to the randomly sampled goal and small changes in the initial configuration of the object and robot, we need to average a number of episodes to approximate the expected reward. At the end of each real-world phase, the code of each team was executed for multiple goals of each difficulty level (we use the same goals across all users to reduce variance). The average cumulative rewards $R_i$ for the different levels $i$ are then aggregated by $\sum_{i=1}^{4} i \cdot R_i$. This weighted sum gives a higher weight to higher (more difficult) levels, encouraging teams to solve them.

After the simulation stage, seven teams with excellent performance qualified for the real-robot phases. These teams made thousands of submissions to the robots, corresponding to approximately 250 hours of robot run time (see Figure 3 for submission statistics).
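As a rough illustration of the scoring described above (the level-4 reward for phases 1 and 2 and the level-weighted aggregation), consider the following sketch. It assumes that the rotation angle is recovered from the error quaternion with the standard identity angle = 2·atan2(‖(q_x, q_y, q_z)‖, |q_w|), and that the position and rotation errors are averaged with equal weight. The function names are hypothetical and this is not the official evaluation code.

```python
import math
from typing import Sequence


def rotation_error(q: Sequence[float]) -> float:
    """Normalised rotation angle of an error quaternion q = (qx, qy, qz, qw)."""
    qx, qy, qz, qw = q
    # Angle of the rotation represented by q; |qw| selects the shorter rotation.
    angle = 2.0 * math.atan2(math.sqrt(qx * qx + qy * qy + qz * qz), abs(qw))
    return angle / math.pi  # lies in [0, 1]


def reward_level_4(e_pos: float, e_rot: float) -> float:
    """Negated mean of position and rotation error (assumed equal weights)."""
    return -(e_pos + e_rot) / 2.0


def aggregate_score(avg_rewards: Sequence[float]) -> float:
    """Weighted sum over levels; avg_rewards[i - 1] holds R_i for level i."""
    return sum(level * r for level, r in enumerate(avg_rewards, start=1))
```

Because rewards are negative errors, the aggregated score is at most zero, and a given error on level 4 costs four times as much as the same error on level 1.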
Tables I and II show the final evaluation results of phases 2 and 3 (the scores in these tables are computed as described in Section IV-B). The top teams in both phases found solutions that successfully grasp the object and move it to the goal position. Videos published by the winning teams are available on YouTube. They also published reports describing their methods [29], [5], [11] and open-sourced their code. We collected all the data produced during the challenge and aggregated it into a dataset, which is described in Section IV-D.

1) The Winning Policies: The winning teams used similar methods for solving the task: They made use of motion primitives that are sequenced using state machines. For difficulty level 4, where orientation is important, they typically first perform a sequence of motions to rotate the object to be roughly in the right orientation before lifting it to the goal position. For details regarding their implementations, please refer to [29], [5], [11], [3]. Furthermore, [15] contains a more detailed description of the solutions of some of the teams.

The dataset contains the recorded data of all jobs that were executed during phases 2 and 3 of the challenge. Combined with the runs from the weekly evaluation rounds, this results in 2856 episodes of phase 2 and 7422 episodes of phase 3. The data of each episode can be downloaded individually. The compressed file size of one episode is about 250 MB. We expect that users of the dataset will typically not use all episodes (which would be a large amount of data), but select those that are interesting for their project. To help with this, we provide a database containing the metadata and metrics of all episodes. This allows users to filter the data before downloading. The dataset itself, the tools for filtering the episodes as well as a more technical description of the data format can be found at https://people.tuebingen.mpg.de/mpi-is-software/data/rrc2020.
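To give a feeling for why filtering before downloading matters, the sketch below estimates download sizes from the episode counts and the approximate per-episode size stated above, and selects episodes from a list of metadata records. The record type `EpisodeMeta` and its field names are hypothetical; the actual schema of the metadata database is documented on the dataset website.

```python
from dataclasses import dataclass
from typing import Iterable, List

EPISODE_SIZE_MB = 250                     # approximate compressed size (from the text)
EPISODES_PER_PHASE = {2: 2856, 3: 7422}   # episode counts (from the text)


@dataclass
class EpisodeMeta:
    """Hypothetical metadata record; the real database schema may differ."""
    episode_id: int
    phase: int
    cumulative_reward: float
    baseline_reward: float  # reward if the object had not been moved


def select_episodes(
    records: Iterable[EpisodeMeta], phase: int, min_gain: float = 0.0
) -> List[EpisodeMeta]:
    """Keep episodes of one phase that beat the no-motion baseline by min_gain."""
    return [
        r
        for r in records
        if r.phase == phase and r.cumulative_reward - r.baseline_reward > min_gain
    ]


def download_size_gb(num_episodes: int) -> float:
    """Rough download size in GB (1 GB = 1000 MB) for a number of episodes."""
    return num_episodes * EPISODE_SIZE_MB / 1000.0


total_episodes = sum(EPISODES_PER_PHASE.values())
print(total_episodes, download_size_gb(total_episodes))
```

Under these figures, downloading all 10278 episodes would amount to roughly 2.6 TB, whereas a filtered subset of a few hundred episodes stays below 100 GB.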
1) Data Description: For each episode, the following information is stored:
• all robot and camera observations (including object pose information) as well as all actions (see Section III-A)
• camera calibration parameters
• the goal pose that was used in this episode
• metadata such as timestamp, challenge phase, etc.
• several metrics that can be used to filter the dataset:
  - cumulative reward
  - baseline reward obtained if the object were not moved during the whole episode
  - initial distance to goal
  - closest distance to goal throughout the episode
  - maximum height of the object throughout the episode
  - largest distance to initial position throughout the episode
User-specific information, such as names or custom output of the user code, is not included in the dataset.

2) Data Format: The data is provided in two different formats:
1) The original log files as they were generated by the robot. This includes files in a custom binary format that requires the use of our software to read them (available for C++ and Python).
2) Zarr storages created from the original files. These can be easily read in Python using the Zarr package.
For a more technical description, please refer to the dataset website.

We designed and built a robot cluster to facilitate reproducible research into dexterous robotic manipulation. We believe that this cluster can greatly enhance coordination and collaboration between researchers all across the world. We hosted competitions on these robots to advance the state of the art and to produce publicly available datasets that contain rich interactions between the robots and external objects. In addition, we provide scientists with access to the platforms for their own research projects. This is especially beneficial for researchers who would otherwise not have access to such robots.
Robotics benchmarks for learning with low-cost robots
Transferring dexterous manipulation from gpu simulation to a remote real-world trifinger
Competitions for Benchmarking: Task and Functionality Scoring Complete Performance Assessment. IEEE robotics & automation magazine
Real robot challenge phase 2: Manipulating objects using high-level coordination of motion primitives
Robot competitions-ideal benchmarks for robotics research
Toward Replicable and Measurable Robotics Research
Benchmarking in Manipulation Research: Using the Yale-CMU-Berkeley Object and Model Set
Benchmarking in Manipulation Research: The YCB Object and Model Set and Benchmarking Protocols
Dexterous manipulation primitives for the real robot challenge
Pybullet, a python module for physics simulation for games, robotics and machine learning
Benchmarking deep reinforcement learning for continuous control
Addressing Function Approximation Error in Actor-Critic Methods
Benchmarking structured policies and policy optimization for real-world dexterous object manipulation
An Open Torque-Controlled Modular Robot Architecture for Legged Locomotion Research
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Emergence of Locomotion Behaviours in Rich Environments
Deep reinforcement learning that matters
Asynchronous Methods for Deep Reinforcement Learning
PyRobot: An Open-source Robotics Framework for Research and Benchmarking
Duckietown: An open, inexpensive and flexible platform for autonomy education and research
The robotarium: A remotely accessible swarm robotics research testbed
Data-efficient Deep Reinforcement Learning for Dexterous Manipulation
Mujoco: A physics engine for model-based control
TriFinger: An Open-Source Robot for Learning Dexterity
Replab: A reproducible low-cost arm benchmark platform for robotic learning
Grasp and motion planning for dexterous manipulation for the real robot challenge
Extending the openai gym for robotics: a toolkit for reinforcement learning using ros and gazebo