Object Handovers: a Review for Robotics
Valerio Ortenzi, Akansel Cosgun, Tommaso Pardi, Wesley Chan, Elizabeth Croft, Dana Kulic
Date: 2020-07-25

This article surveys the literature on human-robot object handovers. A handover is a collaborative joint action where an agent, the giver, gives an object to another agent, the receiver. The physical exchange starts when the receiver first contacts the object held by the giver and ends when the giver fully releases the object to the receiver. However, important cognitive and physical processes begin before the physical exchange, including initiating implicit agreement with respect to the location and timing of the exchange. From this perspective, we structure our review into the two main phases delimited by the aforementioned events: 1) a pre-handover phase, and 2) the physical exchange. We focus our analysis on the two actors (giver and receiver) and report the state of the art of robotic givers (robot-to-human handovers) and the robotic receivers (human-to-robot handovers). We report a comprehensive list of qualitative and quantitative metrics commonly used to assess the interaction. While focusing our review on the cognitive level (e.g., prediction, perception, motion planning, learning) and the physical level (e.g., motion, grasping, grip release) of the handover, we briefly discuss also the concepts of safety, social context, and ergonomics. We compare the behaviours displayed during human-to-human handovers to the state of the art of robotic assistants, and identify the major areas of improvement for robotic assistants to reach performance comparable to human interactions. Finally, we propose a minimal set of metrics that should be used in order to enable a fair comparison among the approaches.

Recent years have witnessed a progression towards a more direct collaboration between humans and robots. The current trend of Industry 4.0 envisions completely shared environments, where robots act on, and interact with, their surroundings and other agents such as human workers and robots [1], [2], enabled by technological advances in robot hardware [3]. The recent COVID-19 pandemic has increased the demand for autonomous and collaborative robotics in environments such as care homes and hospitals [4], [5]. Accordingly, Human Robot Interaction (HRI) is featured prominently in the robotics roadmaps of Europe, Australia, Japan and the US [6]-[9]. The advantages of human-robot teams are multifaceted and include the better deployment of workers to focus on high manipulation and cognitive skill tasks, while transferring repetitive, low skill, and ergonomically unfavourable tasks to robot assistants. Effective deployment of robotic assistants can improve both the work quality and the experience of human workers. The structured nature of traditional industrial settings has facilitated the use of robots in work cells. However, a similarly successful presence of robots is yet to occur in unstructured environments (i.e., in factories without work cells, in households, in hospitals). For such environments, robots need a better understanding of the tasks to perform, a robust perception system to detect and track changes in the surrounding dynamic environment and smart, adaptive action and motion planning that accounts for the changes in the environment [10].
Human-robot collaboration and human-robot interaction are frequent keywords in our research community. We refer the reader to [3], [11] for reviews on physical collaboration and to [12], [13] for an overview of the cognitive aspects. Our community has seen an increasing focus on collaborative manipulation tasks [14]-[16]. In this context, robots must be capable of exchanging objects for successful cooperation and collaboration in manipulation tasks, as in Fig. 1. For example, consider an assembly task where a human operator has to assemble a complex piece of furniture and requires a tool. The robot assistant should be able to fetch and pass the tool to the human operator. Or consider a service robot handing out flyers to passersby [17] or serving drinks [18]. A further example can be a mechanic asking for a tool while under a car: in this scenario, the motion range of the mechanic is extremely limited and extra care is needed to pass the tool [19]. The action of passing objects is usually referred to as an object handover. More formally, an object handover is defined as the joint action of a giver transferring an object to a receiver. This frequent collaborative action among humans requires a concerted effort of prediction, perception, action, learning, and adjustment by both parties. The implementation of a human-robot handover that is as efficient and fluent as the exchanges among humans is an open challenge for our community. In this paper, we review the state of the art of robotic object handovers. In particular, we investigate the aspects of the handover interaction that require the most effort to enable a more useful and successful collaboration with robots, particularly in unstructured environments. We start this paper with a review of the main findings about human-human handovers in Section II. Then, in the following two sections, we refer to each of the two phases of a handover: pre-handover, and physical handover. In Section III we focus on the reasoning and actions of the giver and receiver before the physical exchange of the object, analysing aspects such as communication, grasping, and motion planning and control. Section IV describes the physical exchange of the object, focusing on aspects such as grip modulation. Section V analyses safety in preparation for and during an object exchange. Section VI reports a comprehensive list of quantitative and qualitative metrics that are commonly used for assessing handovers. We conclude this review with a discussion identifying open challenges and directions for future work in Section VII. We further propose a minimal set of metrics to adopt in experimental protocols in order to enable a fair comparison among the different approaches. Formally, a handover is a joint action between a human giver and a human receiver. Joint actions are defined as "any form of social interaction whereby two or more individuals coordinate their actions in space and time to bring about a change in the environment ... successful joint action depends on the abilities (i) to share representations, (ii) to predict actions, and (iii) to integrate predicted effects of own and others' actions" [20]. Joint actions are typically more complicated than individual actions. Social context is shown to modify the plans of actions of an agent [21].
While there is still much to understand and learn about how humans coordinate to meet their final goals, a number of scientific results shed some light on how humans behave during such actions. A minimal architecture for a joint action should include representations, processes like monitoring (feedback) and prediction (feedforward), and coordination [22] . Humans tend to form representations of their own goals and tasks, and potentially also of their partners' goals and tasks. Then, two processes use those representations: monitoring and prediction. Monitoring is a process to check the advancement of those tasks and goals. Such feedback can be on one's own task, on the task of the other agent [23] and on the overall goal. Predicting the outcome of one's own actions and possibly, the other agent's actions, helps the coordination between the agents. Agents are interested in predicting: the what, i.e., the actions of the other and their goal; the when, i.e., the temporal coordination [24] ; and the where, i.e., the spatial distribution of common space [25] . Shared representations help to predict the other's actions and achieve higher coordination, integrating the what, when and where. Coordination is also increased through joint attention (thus sharing perceptual inputs) [20] . In particular, research has shown that there seem to be similar eye motor programs when performing and observing the same scene [26] , [27] , thus reinforcing the link between perception and action. More recently, a Dyadic Motor Plan was proposed in [28] . This plan highlights the possibility that joint actions are based not only on active prediction of the actions of the partner, but also prediction of the effects of the actions of the partner, in a deeper effort of prediction. To summarise, during joint actions humans tend to plan their motions considering the partner's needs and representing and predicting the partner's actions and their outcomes [28] , [29] . For this reason, scientists argue that humans form shared representations of the task to better predict each other's movements and to act accordingly [30] . Efficiency and social cohesion are also listed as reasons to adopt such shared representations. Coordination is extremely important for the success of a joint action. There are two types of coordination [31] : planned and emergent. Planned coordination emerges from the representations of the desired outcomes and one's own tasks and goals. Emergent coordination is independent of joint plans, and emerges from perception-action couplings. Considering these two types of coordination mechanisms, a joint action such as a handover requires the synergetic harmony of planned coordination for the final goal, and emergent coordination for the real-time aspects of the interaction. From this perspective, an object handover is a joint action where two agents collaborate to accomplish the transition of the object from one agent, referred to as giver, to a second agent, referred to as receiver. While the two agents share the overall goal of the object transfer, the objectives of the two agents differ during the interaction [32] . The giver aims to: most appropriately present the object to the partner; hold the object stably till the completion of the physical handover; and finally, release the object to the receiver as safely as possible. Conversely, the receiver aims to: acquire the object by grasping; stabilise the grasp on the object; and finally, following the handover, perform the task the object was required for. 
It is crucial to remember that, in most cases, the object is passed in order to have the receiver perform a certain task. This task might be as simple as to place the object on a table (thus imposing loose constraints on the use of the object); or it might be more complicated, such as turning a key in a keyhole or cutting a piece of paper with a pair of scissors. While these tasks are frequently actualised in our everyday life, they require an appropriate utilisation of the object, i.e., they impose severe constraints on the use of the object. The giver should consider the subsequent task that the receiver would perform with the handed over object, in order to facilitate the task of the receiver [33] . A handover can be divided into two phases [32] , [34] , [35] . We use the tactile events, control discontinuities, and transitions that characterise any manipulation, to detail each phase [34] . Pre-handover phase includes the explicit and implicit communication between agents, as well as the grasping and transport of the object by the giver. The first contact of the receiver's hand on the object begins the physical handover. This phase comes to an end when the giver removes their hand from the object and the object is fully in the hold of the receiver. Therefore, we divide a handover into two phases: a pre-handover phase, and the physical handover phase. During these phases, the agents display different levels of activity, with respect to their own tasks and objectives, Fig. 2 . Two conditions define the start and the end of a handover. A handover can be initiated by the need of an agent to obtain an object to perform a certain task (handover by object request). This agent becomes the receiver and requests the object from the giver. The mechanic under a car asking for a tool is a typical example of this type of initiation. Another example is a cook that asks the sous chef for a kitchen tool. Alternatively, a handover can be initiated by an agent asking another to perform a certain task with an object (handover by task request). This agent becomes the giver and gives the object to the receiver. For example, while tidying up a room, an agent can pass an object to another agent in order for the latter to place the object in a certain location; another example is a chef asking the sous chef to stir some sauce on a pan by offering the appropriate kitchen tool. Once the exchange is initiated, the giver offers the object to the receiver. The physical exchange of the object can be direct or indirect. The object is passed from the hand of the giver to the hand of the receiver during a direct handover. In a number of situations, as in the example of the mechanic located under a car asking for a tool, a direct handover is also the most immediate solution to pass the requested tool. Alternatively, the object might be placed by the giver on a surface, e.g., on a table, during an indirect handover. Indirect handovers allow a greater flexibility to the receiver in terms of the timing and of the grasp used to obtain the object. However, direct handovers can reduce the effort of the receiver in terms of motions required to obtain the object [36] . In this paper, we focus on direct handovers because almost all the works in the robotics field belong to this kind of exchange type. The physical phase of the handover terminates when the receiver has fully obtained the control of the object. At this stage, the receiver progresses to performing the task that initiated the handover. 
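As a compact illustration of the two-phase structure described above, the sketch below encodes the phases and the two delimiting tactile events (first contact by the receiver, full release by the giver) as a minimal state machine; the class and event names are our own illustrative choices, not taken from any of the cited implementations.

```python
from enum import Enum, auto


class HandoverPhase(Enum):
    PRE_HANDOVER = auto()       # communication, grasping and transport by the giver
    PHYSICAL_HANDOVER = auto()  # joint hold: both agents are in contact with the object
    COMPLETED = auto()          # giver has fully released; receiver proceeds to the task


class HandoverStateMachine:
    """Tracks the phase of a single handover from the two delimiting tactile events."""

    def __init__(self):
        self.phase = HandoverPhase.PRE_HANDOVER

    def on_receiver_contact(self):
        # First contact of the receiver's hand on the object starts the physical exchange.
        if self.phase is HandoverPhase.PRE_HANDOVER:
            self.phase = HandoverPhase.PHYSICAL_HANDOVER

    def on_giver_release(self):
        # Full release by the giver ends the physical exchange.
        if self.phase is HandoverPhase.PHYSICAL_HANDOVER:
            self.phase = HandoverPhase.COMPLETED
```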
The next two sections focus on action and cognition during each of the two phases of the exchange: the pre-handover, and the physical handover phases. In particular, we will bring attention to aspects such as motion planning and control, prediction and communication, object grasping and offering, and modulation of grip forces. As we discussed in the previous section, a handover is initiated either by the request for an object or by the request for a task. In both circumstances, the request for an object or the request for the task must be communicated to the other agent. Communication is a foundation for every joint action, and it can occur in various manners. Humans display a wide array of communication skills to help coordinate the what, when and where of a handover. Gaze, pose and oral cues are common ways for agents to communicate during this phase. Communication does not happen only directly, as in for example voicing the intent to pass an object; but also during the action, e.g., in motions or gestures during the pre-handover phase, where a giver clearly displays their intent to hand over the object. Similarly, the way an object is grasped and offered often presents cues on the intent to hand over. Once initiated, a handover enters the preparation phase that leads to the physical exchange of the object. In preparing to offer the object, the giver predicts how the receiver would perform the task the object is being passed for, and given these predictions, how the receiver would want to grasp the object. Using these predictions, the giver plans motions to obtain (grasp) the object if not yet grasped, or (if needed) to re-grasp the object to best prepare for the exchange, and then to offer it to the receiver. The giver relies on visual and tactile feedback to perceive and track the object as well as the state of the receiver, i.e., both the position and whether they are ready to receive. During this time, communication signals are constantly exchanged between giver and receiver. The giver then uses this sensed feedback to adjust their motion plans, coupling this feedback with updated predictions of the receiver's behaviour. These updates and adaptations aided by prediction, perception and learning are used to control the motions realised to grasp the object and to offer the object to the receiver. The receiver shows lower activity in this phase. However, the receiver's actions and communication are perceived by the giver, therefore influencing the giver's actions. Attention and state of preparedness of the receiver are important as they communicate the readiness to receive. Similar to the giver, the receiver also predicts the behaviour of the giver, and forms a plan of action. The receiver may move their hand towards the predicted handover location in anticipation of the handover. The receiver's plan and actions are updated using sensor feedback such as vision, touching and hearing. The receiver's plan and actions are also dependent on the subsequent task that the receiver would perform with the object. At the end of the pre-handover phase, the receiver has reached for and made contact with the object. Communication is crucial in any joint action. Signalling strategies (i.e., communication) aid coordination by improving the partner's prediction of one's actions (thus minimising uncertainty) [37] . In particular, communication is used to initiate the action, i.e., to show the intent to start with the action; and then to coordinate the action once it has started [22] . 
Humans are extremely skilful in communicating their intent (the what, i.e., the action to perform and the object to pass) and expressing cues about the when and where of a handover [38]. Communication is so important that a handover can be thought of as a physical process (approach, reach, transfer) and a cognitive process to establish what, when and where to pass [38]. These findings indicate that robots also require such communication skills and adaptation capacity in order to match human performance during interaction with a human partner. Speech can be used to express the intent to hand over an object as well as to coordinate the actions during the exchange. (Interestingly, there is evidence for the embodiment of language, i.e., that the motor system is activated during the comprehension of language [39]. Moreover, there is further evidence of the involvement of the motor system in processing action words such as "kick" and "pick". However, it is not yet clear whether this activation is due to the real processing of the action words or rather a by-product of imagining the action [40].) Speech can be used to initiate the action by either one or both of the agents, and language use could be considered as a form of joint action per se [41]. However, the use of speech can also degrade coordination during a joint action when the partners' attention is divided between multiple modalities of sensory communication (visual and auditory in [42]). Similarly to human-human conversation, in HRI a robot and a human could have a dialogue to decide their roles during an interaction, and then to coordinate actions [43]. Gaze is also a very powerful tool for communicating the intent to act and for coordinating the action. Gaze is the ensemble of eyes, head, and body orientation that reacts to the joint action [44]. Human gaze supports the planning of actions of object manipulation, spotting positions (contact points) to which to direct a grasp [45]. Furthermore, there seems to be a link between action perception and execution. In other words, humans are able to read other people's action intentions by observing their gaze [46]. Analysing implementations of gaze in human-robot handovers, it is not surprising that during a handover, the use of gaze by a robot positively impacts the interaction, resulting in faster object reaching and a more natural perception of the interaction by the human receivers [47]-[49]. Similarly, gaze can have an effect on cooperation also in terms of faster human response times [50]. Interestingly, a deliberate delay in releasing the object by the robot results in an increase of attention to the robot's head, and also an increase of the compliance with the robot's suggestions (actualised with the robot's head motions) [51]. A closely related concept is turn-taking, which helps humans communicate their understanding and control of the turn structure to a conversation partner by using speech, eye gaze, and body language. Turn-taking has been explored in human-robot interaction [52]; it can be beneficial for handovers in both directions, robot-to-human and human-to-robot. In addition to speech and gaze, humans use a number of other ways such as body stance and position, arm pose, and gestures (with arm and/or hand) to communicate their intent to hand over an object and when/where the handover will take place.
The presentation of an object, such as an extended arm and offering the object such that the free part is towards the receiver and tilting the object towards the receiver, are configurations that convey intent to pass an object [53] , [54] . Cakmak et al. [54] claim that such anticipation in the behaviour of the agents makes the interactions more fluent. In the robotics community, some aspects of such communication methods have been investigated. An analysis of kinematic features could lead to an automatic detection of the intent to hand over an object, for example using machine learning classifiers [55] . A learning-based approach presented in [56] posits that the orientation of a person and joint attention (on the object or on the position where the handover will happen) are important cues for physical interaction. Similarly, statistical models were used to model the physical aspects of a handover, and endowed with a higher-level cognitive layer that uses non-verbal cues (head orientation) to better understand the intent of a human receiver to grasp an object [57] . Alterations of more common movements and arm trajectories can also be used by humans to communicate during joint actions. Trajectories of motion can be altered in order to communicate to one's partner [37] . Taking this to the extreme, some movements can be coordinated in order to mislead one's opponent in a competitive joint action, e.g., a footballer's feint move [58] . Similarly, robots can devise deceptive motions too [59] . Moreover, the initial pose of a robot receiver can inform the human giver about the geometry of a handover [60] . Recently, projection methods have been used for communicating the robot's intent to humans. Visualising the object pose and robot's intended grasp pose for human-to-robot handovers is shown to substantially improve the subjective experience of the users [61] . We have previously considered that during a handover, the giver plans their motions considering the task of the receiver. In particular, the giver considers how to grasp the object so as to offer it to the receiver in the best way possible, e.g., whenever possible, to minimise object manipulation by the receiver before using the object for its intended use [62] . This is an example of second-order planning for object manipulation, which is defined as: ... altering one's object manipulation behaviour not just on the basis of immediate task demands but also on the basis of the next task to be performed [63] . If the planning takes into account more than two steps, then it is termed higher-order planning. In the case of a handover, the grasp of the giver could also account for the task to be performed by the receiver [33] . In effect, the grasp of the giver influences the grasp of the receiver, as the latter can only grasp the object on the unencumbered portion of the object. The grasp choice of the giver can influence whether a receiver can directly use the object for their task or must re-manipulate the object to be able to use it. The grasping adaptation performed by the human giver is in line with theories that consider grasping an inherently task-oriented or purposive action in humans [64] - [66] , that involves both sensory and motor control systems [67] , [68] . A human study shows that when participants took hold of a vertical cylinder to move it to a new position, grasp heights on the cylinder were inversely related to the height of the target position [69] , which is a clear example of adaptation of the grasp to the task. 
There is further evidence that the reaching movement of the arm and the grasping movement of the fingers may also be influenced by the grasper's goal [70] - [73] . From this perspective, it is not surprising that the intention to cooperate influences the grasp choice during an interaction like an object handover. As already established, givers do reason about how to grasp the object and where to place their hand on the object. Givers consider which area of the object relate to the receiver's subsequent task and adapt their grasp strategy accordingly. Indeed, when the task of the receiver has fewer performance constraints, i.e., when the task of the receiver is as simple as placing the object on a table, there are less stringent constraints to perform the task and thus the exchange of the object can be more relaxed [74] . However, when the task of the receiver requires the use of the object in a very specific way (i.e., cutting a sheet of paper with a pair of scissors), then the grasp of the giver usually accounts for the constraints of the task of the receiver [74] . Similarly to the considerations about the grasp of the giver, different tasks and objects elicit different levels of constraints on the grasp that the receiver has to use. Humans display a wide range of grasps, and several taxonomies have been proposed to categorise human grasps based on specific aspects such as hand shape on the object, contact points, and pressure [75] - [77] . Humans choose their grasp considering many factors [75] , [78] - [80] : object constraints (e.g., shape, size, and function), gripper constraints (e.g., the human hand or gripper kinematics and the hand or gripper size relative to the object to be grasped), habits of the grasper (e.g., experience and social convention), and environmental factors (e.g., the initial position of the object and environmental constraints [81] ). For all these reasons, factors such as object shape, object function and safety are important to consider when planning a grasp for a human-robot handover [82] . In a human user study, it was shown that when participants are handing over objects to each other, they tend to orient some objects differently when they were explicitly asked to consider the presentation that is most convenient to the receiver [83] . Similarly, object constraints and the receiver's task are highlighted to be key factors in the choice of grasp by the giver [74] . In particular, grasp type and grasp location change to facilitate the grasp of the receiver on the object. Similar reasoning was already adopted for robot to human handovers in [84] , [85] . However, the robotic giver acted knowing a priori the 'appropriate' parts of the objects and the human receiver did not have to perform any subsequent task with the objects. Learning by demonstration was proposed by the same authors as a possible method to further explore the semantic segmentation of objects for grasping [86] . Similar to this work, learning handover grasp configurations through observation of human behaviour has been shown to be a viable solution [87] . Using the concept of affordance axis, a method has been proposed for selecting good handover observation sets to learn grasp configurations [88] ; however, while this works well with objects with one main grasp configuration, it is a more challenging problem when the object can be presented in multiple orientations, as the robot needs to see a larger set of possible configurations and then decide which is best in a given situation. 
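To make the giver-side reasoning above concrete, the sketch below scores candidate giver grasps by how well they keep clear of the object region the receiver needs (e.g., a tool handle) while retaining grasp stability; the point-based object representation, weights and helper names are assumptions for illustration rather than the method of any single cited work.

```python
import numpy as np


def score_giver_grasp(grasp_points, functional_region, stability_score,
                      w_task=1.0, w_stab=0.5):
    """Score a candidate giver grasp for a handover (higher is better).

    grasp_points: (N, 3) contact points of the giver's grasp on the object.
    functional_region: (M, 3) points covering the part the receiver needs free
                       (e.g., a handle), expressed in the same object frame.
    stability_score: scalar in [0, 1] from any grasp-quality metric.
    """
    # Distance from each grasp contact to the closest point of the functional region.
    d = np.linalg.norm(grasp_points[:, None, :] - functional_region[None, :, :], axis=-1)
    clearance = d.min(axis=1).mean()  # average clearance of the grasp contacts
    # Stable grasps that stay clear of the functional region score highest.
    return w_task * clearance + w_stab * stability_score

# Illustrative usage: pick the best grasp among pre-computed candidates.
# best = max(candidates, key=lambda g: score_giver_grasp(g.points, handle_points, g.quality))
```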
While a successful robot grasp is usually characterised by stability [89] and/or speed [90] , one aspect of robotic grasping that is often overlooked is the task to perform [91] and its requirements in terms of force and mobility [92] . Findings in [93] suggest that a grasping strategy by a robot that accounts for the subsequent task of the human receiver improves substantially the performance of the human receiver in executing the following task, reducing the time to complete the task by eliminating post-handover re-adjustments of the object. Moreover, human perceptions of the interaction improve especially when the constraints induced by the object's functional parts become more restrictive. A planner for interactive manipulation tasks between robots could potentially account for both the grasp of the robotic giver and the grasp of the robotic receiver, thus enabling both robots to grasp successfully [94] . This approach is hardly extendable to human-robot handovers, as the human behaviour is more difficult to model with certainty. To overcome this problem, one option is to probabilistically model the behaviour of the human receiver, accounting for the ergonomic cost of the receiver, and thus influencing the grasp of the receiver [95] . Finally, the functionality of the object being handed over is an important consideration [96] - [102] . Gibson [96] coined the term "affordances" to define the possibilities for action offered by objects and their environment. Norman [103] added a perceptual dimension to the concept of affordance, associating it not only to the agent's capabilities, but also to their tasks to perform. However, a clear functional part of an object, such as a handle of a screwdriver, can elicit different behaviours in single-agent scenarios and cooperative tasks [74] , [104] , [105] . For example, a single agent having to tighten a screw will grasp a screwdriver from the handle, whereas a giver wanting to hand over the screwdriver, should grasp it from the metal rod, thus offering the handle to the receiver. While this adaptation is natural to humans (having developed it through understanding and the repetitive use of the object), such understanding is still to be achieved in robots. A concerted effort in perception and action [106] is needed in order to endow robots with such capabilities. Learning from human demonstration and learning about physical properties of the objects that afford specific actions seem very promising approaches [107] - [112] . An optimisation-based approach over affordances, task to perform and mobility constraints of the human receiver is presented in [113] . A big challenge in human-robot handovers is a reliable perception of the object, the hand (self and partner) and the partner's full-body motion. In this phase, vision is commonly used as the main perception channel. Some approaches try to track object and hand to plan for the grasp [114] - [116] , leveraging large datasets for training and physical relationships between hand and object. While grasping, the hand and objects can become severely occluded, thus harder to track with vision sensors. Alternatively, this problem can be addressed as a grasp classification problem [117] , in which common human grasps for the task of human-robot handover are divided into categories such as "waiting" or "lifting", inspired by the human grasp taxonomy [77] . The grasp class information can then be used by a planner to devise the most appropriate approach and grasp strategy for the robot receiver. 
However, the classification of grasps suffers the drawback of detecting only a relatively small subset of grasps, thus failing to detect the richness of behaviours displayed by humans. The human body can also be tracked in addition to the object and the human hand in order to improve safety [118] . A real-time implementation of grasp planning and re-planning for H2R handovers based on vision can be found in [119] . While the perception of the human partner's hand and body is critical real-time feedback, there have been efforts also in predicting the human partner's motion. Dynamic Movement Primitives (DMPs) [120] , [121] have been used successfully to predict human motion (point attractor and time scale, which mean handover location and time), coupled with an Extended Kalman Filter [122] . Real-time estimation of human motion can also leverage the concept of minimum jerk trajectories [123] . The minimum jerk model can be used in conjunction with regressors to predict when and where a human giver will transfer an object [124] . The minimum jerk model is used with a Semi-Adaptable Neural Network to predict human arm motion in [125] . Gaussian Processes can also be used for proactively estimating human motion for handovers [126] . Luo et al. [127] propose a 2-layer framework using Gaussian Mixture Models and Gaussian Mixture Regressor to represent and predict human reaching motions. The handover must occur in a location that is reachable by both agents. Thereby, an aspect that deserves thorough analysis is the handover location. Human-human handovers have been shown to occur roughly midway between giver and receiver [128] . Thus, the interpersonal distance between the agents has a fundamental influence on the location of the handover, and on the height of the point of exchange [129] . Conversely, the object mass seems not to affect the location of the exchange, but rather the duration of the exchange. Leveraging on this notion for HRI, a task-specific interaction workspace can be built as the intersection of the spaces that can be accessed by robot and human [130] . Information such as the effort needed by the human to reach a certain location can be used in an on-line manner to shape the interaction workspace, in order to plan the robot's movements. Similarly, handover locations can account for biomechanical properties of the human receiver, such as height, weight, strength and range of motion [131] . These considerations of the biomechanical properties of the human partners are especially critical when there are environmental or task constraints to limit the motion of the human (like in the case of the mechanic under the car) and when the human is motor-impaired. Furthermore, optimising the robot's motions over safety, acceptability and task constraints could help improve the posture of the human receiver [132] , thus decreasing the chances of musculoskeletal disorders and discomfort [133] . The human mobility could also be accounted for while planning, to devise different paths for the robot to the handover location [134] , [135] . Incorporating models of the kinematics and the dynamics of the body of the human receiver can effectively devise handover locations that are more acceptable to the human partner [136] . Finally, the human arm manipulability could also be embedded in an optimisation framework to reduce muscular strain [137] , [138] . 
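The criteria surveyed above for choosing where to hand over (a point roughly midway between the agents, the human's reaching effort or ergonomic cost, and the robot's manipulability) can be folded into a simple cost over candidate locations. The sketch below shows one illustrative weighting and assumes that the effort and manipulability functions are supplied by separate perception and kinematics modules; it does not reproduce any specific cited formulation.

```python
import numpy as np


def choose_handover_location(candidates, giver_pos, receiver_pos,
                             human_effort, robot_manipulability,
                             w_mid=1.0, w_eff=1.0, w_man=0.5):
    """Pick a handover point from candidate 3D positions in the shared workspace.

    candidates: (N, 3) candidate points inside the interaction workspace.
    human_effort(p): callable returning an ergonomic/effort cost for the human to reach p.
    robot_manipulability(p): callable returning a manipulability measure for the robot at p.
    """
    midpoint = 0.5 * (np.asarray(giver_pos) + np.asarray(receiver_pos))
    costs = []
    for p in candidates:
        c = (w_mid * np.linalg.norm(p - midpoint)   # prefer points near the midway location
             + w_eff * human_effort(p)              # penalise human reaching effort
             - w_man * robot_manipulability(p))     # reward robot dexterity at the point
        costs.append(c)
    return candidates[int(np.argmin(costs))]
```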
During a joint action, the movements of the agents simultaneously actualise the physical joint action and signal important information for the coordination. Movements during human-human handovers are generally smooth rather than being separate and successive phases [139]. For example, receivers usually start the reaching movement toward the givers while the giver reaches out for the receiver (in a concurrent motion), as implemented in [140], [141]. As such, the dominant aspects of successful movements in the context of a joint action like a human-robot handover are: legibility, predictability, safety, robustness, reactivity, and context awareness. We will cover safety specifically in Sect. V. 1) Legibility and Predictability: Legibility and predictability relate to how easy it is for one agent to understand and predict the other agent's movements. Albeit similar, legibility and predictability are not synonyms [142]. Using a psychological interpretation of actions, legibility is a characteristic of motion that enables an observer to infer the goal (action-to-goal). On the other hand, predictability is a characteristic of motion that matches what an observer expects given the knowledge of the goal (goal-to-action). By this definition, motions of collaborative robots must be legible, thus allowing the partner to quickly and reliably predict the goal of the actions of the robot. Interestingly, humans prefer robot configurations that are more natural or human-like as they are more readable [143]. Inverse kinematics algorithms mapping Cartesian motions to the robot's joint space can also aim at devising overall movements for the robot that are legible to the human partner [144], [145]. 2) Robustness, Reactivity and Context Awareness: The robot's motions should be flexible to accommodate changes in the environment, and to accommodate behaviours of different partners [146], [147]. To this end, principles such as robustness, reactivity, and context awareness should guide the design of human-robot interaction systems [148]. From this perspective, a fully pre-planned motion falls short of general adaptability. In other words, a fully deterministic approach to planning is only possible if the environment is fully known, as in the case of robot-to-robot handovers [149]. Instead, a mixture of planned motions and control over sensory feedback helps to modify the motions and adapt to the partner. A switching planning mechanism that mixes global and local planning can help to overcome the drawbacks of fixed planned motions [150]. Fast responsiveness of the robot giver is particularly important as it increases the positive impression of the interaction [19]. Interestingly, a human study suggests that the speed of the interaction might be more important than the spatial accuracy of the robot for the subjective experience of a human receiver [151]. When the robot acts as a receiver, adaptive reaching displays better performance compared to a fully pre-planned reaching motion in terms of predictability and aggressiveness [152]. Humans adapt their actions to account for the workload of their partner [153]. Similarly, a robot should be aware of the task status [154]. For example, a more proactive robot giver could increase the speed of the handovers, negatively impacting the user experience. In contrast, coordinating a reactive robot could be perceived as a better user experience, even if the performance deteriorates [153].
A lower speed motion also decreases the stress induced on the human receiver [155]. An attempt at combining trajectories planned in the Cartesian space (emulating human movements) and joint limits was presented in [156]. While pure planning usually devises a feedforward trajectory to follow, control architectures provide the means to use sensorial feedback and change the behaviours of the robot. Impedance control and admittance control are two common strategies to use in physical human-robot interaction [157]-[159]. Variants of classical approaches include using redundancy and null space [160], [161], modelling the interactive forces [162] and parameter adaptation [163]. Early work on control proposes to use fuzzy logic on three aspects: relevance, confidence and effect [164]. Human-human handovers show a smooth and fluid continuum of motion. For this reason, rather than switching control paradigms between handover phases, a phaseless controller (no distinction between reaching, passing and retracting) could be based on insights about the human behaviour, e.g., the existence of motion during the passing and the existence of coupling between the movements of the giver and those of the receiver [141]. However, one specific implementation of such a controller in [141] assumes that the object mass is known, in order to best modulate the grip forces. Alternatively, a controller could use high-level desired behaviours (such as proactivity or timings) using Signal Temporal Logic, as in [165]. Dynamic Movement Primitives (DMPs) [120], [121] represent an alternative to both pure feedforward and pure feedback control during an interaction. To specifically target a handover, the feedforward part can be weighted more at the start of the motion (shape-attraction), and subsequently the feedback (goal-attraction) can be weighted more as the interaction nears the physical exchange of the object [166]. In order to generate a wider range of behaviours during interactions, Interaction Primitives (IPs) build on the framework of DMPs and maintain a distribution over their parameters [167]. Probabilistic motion primitives [168] are shown to allow a robot to recognise human intent (task) and, at the same time, generate commands for the robot according to the observed human motions, achieving coordination [169]. In this way, planning is replaced by inference on the probabilistic model. Learning from human feedback might also improve the adaptability of handovers. For example, in a contextual policy search, a robot could learn a reward function from human preference feedback [170]. Alternatively, GMMs and mirroring are proposed in [171]. The physical handover phase encompasses the physical interaction between giver and receiver and the object transfer. During this phase, both players are physically and cognitively engaged. Entering this phase, the giver possesses the object and thus controls its stability. After the occurrence of physical contact, the giver can couple vision and force feedback to understand to what extent the receiver has grasped the object. At this point, the giver starts releasing the object in order to allow the full transition of the object to the receiver. The timing must be coordinated, as an early release can cause the object to fall and a late release can cause higher interaction forces [172]. The receiver approaches this phase by planning a grasp on the object given the visual feedback from the actions of the giver.
Given the presentation of the object, the receiver then acts and places their hand on the object to maximise the stability of the grasp and, at the same time, in the most appropriate way to be able to perform their task afterwards. The transition ends when the giver entirely releases the object to the receiver, who then acquires the object in full. In this phase, the success of a handover is dictated by the coordination of the when and where of the joint action. For this reason, the most crucial aspect of this phase is the modulation of the grip force to complete a safe transfer of the object. While during the pre-handover phase the main avenue of perception is vision, during the physical exchange force sensors are generally used to perceive the contacts. Other modalities of perception include tactile sensing, optical force sensing and vision. In line with literature in neuroscience and psychology, the joint action of the physical handover is an interplay of anticipatory control and somatosensory feedback control [32]. Visual feedback augments the anticipatory control in starting the release of the object, by predicting and detecting the collision created by the hand of the receiver on the object [173]. Visual feedback is also used to adapt predictions to different speeds of the receiver's reaching out movements. From this perspective, the speed of the grip force release seems to be correlated with the reaching velocity of the receiver (i.e., the faster the approach, the faster the giver releases the object) [173]. Giver and receiver show similar strategies for controlling their grip forces with respect to the evolution of the load forces generated by the object and the exchange. All of these findings point to the fact that the giver is in charge of the safety of the object, while the receiver modulates the efficiency of the object exchange [32], [172]. During a human-to-robot handover, the forces arising during the release are different when a robot acts as the receiver. In fact, the faster the retraction of the robot after grasping the object (still in the partial hold of the human giver), the larger the interaction forces. This might be explained by the giver not having enough time to withdraw [60]. The task of the giver is shown to resemble the evolution of a picking-up task [172], [174] in that the giver, like the picker, typically will use excess grip force to ensure that the object does not slip or drop. Moreover, in [172] a linear relationship between grip force and load force is observed, except when either actor is supporting very little of the object load. An analysis of these grip forces reinforces the idea that the giver is responsible for the safety of the object during the transfer, while the receiver is responsible for the timing of the transfer. A release control strategy for a robot using these insights was presented in [174]. The same control strategy can also be applied to an under-actuated hand, using linear models leveraging force readings from the elbow of the robot [175]. Moreover, the feedback from a force sensor mounted on the robot's wrist can be robustly used to modulate the release of an object [176]. Finally, it was shown that a proactive release improves the fluency and the subjective perception of a handover with respect to a fixed release strategy [177].
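A minimal sketch of a giver-side release policy in the spirit of these findings is given below: the commanded grip force scales roughly linearly with the load the giver still supports (measured, for instance, by a wrist force/torque sensor), and the gripper opens once the receiver carries nearly all of the load. The thresholds, friction coefficient and two-finger grasp model are illustrative assumptions, not values taken from the cited studies.

```python
def giver_grip_command(measured_load, object_weight, mu=0.8, margin=1.3, f_min=0.5):
    """Grip force (N) for a robotic giver during the physical handover.

    measured_load: vertical load (N) currently supported by the giver, e.g. from a
                   wrist force/torque sensor (drops as the receiver takes the object).
    object_weight: total object weight (N), assumed known or estimated at pick-up.
    mu: assumed friction coefficient between gripper and object.
    margin: safety factor over the minimum grip force needed to avoid slip.
    f_min: residual grip force kept until the receiver has taken (almost) all the load.
    """
    load_fraction = max(0.0, min(1.0, measured_load / object_weight))
    # Grip force roughly proportional to the load still carried (linear grip/load coupling),
    # assuming a two-finger antipodal grasp: 2 * mu * grip >= supported load.
    required = margin * measured_load / (2.0 * mu)
    if load_fraction < 0.05:   # receiver holds essentially all of the load
        return 0.0             # open the gripper: complete the release
    return max(f_min, required)
```

A receiver-side analogue would modulate the pulling force symmetrically, increasing it only as the giver's grip is sensed to relax.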
Another task for both the giver and the receiver is the handling of errors and disturbances during the handover. There might be cases where the receiver makes unwanted contact with the object and the giver should not release the object. The contact forces exerted by the receiver should then be recognised as disturbances and should be compensated to maintain a stable grasp on the object. In human-robot handovers, the tactile information from a Shadow Robot hand is used in [178] to build probabilistic models to detect these disturbances and feed them back to an effort controller. Machine learning can also be used to disambiguate among pulls, pushes, inadvertent collisions and holds performed by a human receiver on an object still in the robot's hold, as in [179]. Another threat to safety is a potential fall of the object. It has been found that human givers tend to primarily rely on vision rather than haptic sensing to detect the fall of the object during handovers [180]. Thus, the object acceleration measured with an optical sensor at the gripper can be used as an indicator of handover failure (object dropping) [181]. Recently, force control and fuzzy control were similarly used [182], [183]. Safety is a pivotal topic in human-robot interaction [184]. In the context of a robot handover, safety is a multi-faceted concept that prioritises the physical safety of the human partner, but also includes the safe transfer of the object and the safety of the robot itself. Safety can be ensured (or achieved) through software and/or hardware [185], [186]. Research has led to the standard ISO/TS 15066:2016, which regulates collaborative robots and contains the norms of appropriate behaviour during physical human-robot interaction. The safe planning of motions while approaching a human partner is a critical aspect during a joint action. Motion planning and control can be framed to explicitly minimise safety risk during the interaction. For instance, in [187] the robot is kept in low inertia configurations in case of unanticipated collisions; moreover, the chance of collision is reduced by distancing the robot's centre of mass from the human. Similarly, a metric of distance from the operator is used in the optimisation in [188], and safety barrier functions are built around the robot links to allow collision-free planning [189]. Similarly, a safety index is used in planning augmented by human motion prediction in [190]. Motion planning should devise safe, reliable, effective and socially acceptable motions [191], [192]. A frontal approach versus a lateral approach by the robot towards the human receiver is discussed, with somewhat contrasting findings, in [193], [194]. Such considerations are further used to develop the planner in [191], which is composed of three components: spatial reasoning to account for the human receiver (perspective placement [195]), path planning optimising over costs that account for safety, visibility and human arm comfort (human-aware manipulation planner [196]), and trajectory control to ensure minimum-jerk motions at the end effector (soft motion trajectory planning [197]). Humans minimise jerk in order to realise well-behaved trajectories for arm movements [198]. Minimum-jerk motions by a robotic giver also result in shorter reaction time and faster adaptation for human receivers [199]. Further, to better match the human trajectories of minimum jerk, a decoupled minimum jerk trajectory could be used, with different time constants in the gravity axis z (thus decoupling the motion in the x-y plane from the motion in the z axis) [200].
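For reference, the minimum-jerk profile underlying these approaches is the standard quintic polynomial between a start point x_0 and a goal x_f over a duration T:

\[
x(t) = x_0 + (x_f - x_0)\left[ 10\left(\tfrac{t}{T}\right)^{3} - 15\left(\tfrac{t}{T}\right)^{4} + 6\left(\tfrac{t}{T}\right)^{5} \right], \qquad 0 \le t \le T,
\]

with zero velocity and acceleration at both endpoints. In the decoupled variant of [200], the x-y components share one duration while the z component follows the same polynomial with its own duration.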
The two paradigms R2H and H2R involve different aspects of safety. In the H2R paradigm, the robot aims to make contact with and grasp only the object, avoiding any contact with the human partner. This is usually achieved by leveraging vision, e.g., [117]-[119]. In order to avoid any contact with the human partner, most of the approaches are over-conservative in the attempt to compensate for potential noise in the perception (e.g., building enlarged bounding boxes around the hands and body of the human partner, thus allowing a greater distance between robot and human). Another aspect that is critical to safety is the grasping/pulling force exerted on the object, and its timing. Erroneous timing and/or a pulling force that is too high or too low could generate highly unsafe behaviours such as pulling the human partner along with the object, or allowing the object to drop [172], [174]. In the R2H paradigm, the robot must (i) approach the human safely (without contacting/hitting the partner) and orient the object appropriately (such as pointing the tip of a knife away, or presenting the handle of a cup of hot coffee, or not spilling any of the contents of the object, such as the coffee in the cup) [85] and (ii) safely release the object when the human partner has grasped it [118]. When handing an unknown object during R2H, it may be challenging for the robot to accurately assess the danger of the object to the human receiver. One last aspect of safety is social conventions [21], [72]. Behaviours such as handing over a knife by offering its handle are not only safer per se, but they are regarded as socially more acceptable than thrusting a blade towards one's partner, which can convey an erroneous intent (not to mention the inherent risk of harming one's partner) [82], [201]. Such social conventions offer interesting insights in order to produce safer and more readable behaviours in robots [202]. There is a general consensus on the need for standardised measurement tools and metrics in the human-robot interaction and collaboration communities [203], [204]. However, the spectrum of aspects to cover is so broad that finding a set of metrics and tools to adopt in every situation is very difficult. Nevertheless, such common and codified metrics would allow for an easier and fairer comparison among the proposed techniques, and would possibly help to build new frameworks. Metrics should aim to assess a handover qualitatively and quantitatively [203]. Along the same lines, a survey on metrics for human-robot interaction [205] reports productivity, efficiency, reliability, safety and co-activity to be the areas to assess for an interaction. Furthermore, there is a wide range of literature analysing metrics for human-robot interaction and collaboration, such as for human-robot teams [206]-[209] and for social and physical interaction [210]-[212]. In this section we analyse three different types of metrics: 1) task performance metrics, which provide a measure of success; 2) psycho-physiological metrics, to measure the human partner's physiological responses; and 3) subjective metrics, in the form of user questionnaires. These metrics are represented graphically in Figure 3. We also analyse the variety of the test objects used in handover experiments. Task performance metrics are often used in HRI experiments to evaluate success quantitatively, and the choice of such metrics is highly dependent on the task.
The performance of a handover can be coarsely described using the success rate: the number of successful handovers divided by the total number of trials. Success rate is the most popular task performance metric for human-robot handovers. Even though the overall success rate of an implementation is important, it only reports a statistical view of the handovers rather than the quality of the interaction, and by itself, it does not explain why and how the errors have occurred. Besides, different experimental protocols make it difficult to compare the success rate metrics directly. The interaction force is another measure that has been commonly used to evaluate the success of the interaction. Considerations of performance also include the task completion time. From this perspective, fluency is an important characteristic of an interaction such as the handover. To evaluate fluency, objective metrics should include the percentage of concurrent activity, human idle time, robot idle time and robot functional delay [213]-[216]. These concepts are also related to task effectiveness and interaction effort [217]. Moreover, time considerations can include the reaction time of the human, task completion time and overall handover time. Among the surveyed handover papers, time-related metrics include: waiting time of the robot and the human, total handover time and timing of different phases of the handover. Other task performance metrics used in handovers include defining and minimising a cost function related either to the trajectory or to the interaction. Another way to gather quantitative data from user studies is to measure the physiological responses of the human partner during the interaction. In HRI, psycho-physiological measures can be used to identify and evaluate the human partner's responses to the interaction with the robot [218]. Physiological signals such as electromyography (EMG) can be used to measure the human's motor activity during the handover. Physiological signals can also be used to estimate the affective state of a human partner during an interaction. Furthermore, physiological responses can be exploited when evaluating responses to a safe planner (participants showed less anxiety and surprise, and reported feeling more calm) [219]. Another example is Heart Rate Variability (HRV), which can be used as a quantitative index to assess mental fatigue [220]. The psycho-physiological measures to assess anxiety and stress in response to the interaction include, but are not limited to: eye movement; heart rate and heart rate variability; blood pressure; electroencephalography; skin conductance response; pupillary dilation; respiratory rate and amplitude; muscular activity; corrugator muscle activity; electromyography. Subjective metrics assess aspects such as the subjective perception of the human regarding the perceived difficulty of the task, the cooperation and alliance of the robot, trust in the robot and the contribution of the robot [216]. Additional concepts that are recurrent in a qualitative evaluation of an interaction include anthropomorphism, animacy, likeability, perceived intelligence, and perceived safety [221]. The Robotic Social Attribute Scale (RoSAS) framework proposes measuring the subjective and social perception of robots using three dimensions: warmth, competence and discomfort [222].
(Fig. 3: Metrics assess the overall performance of a handover, with measures such as timing and success rate, but also the user experience, with psycho-physiological measures and subjective measures.)
Legibility, safety and physical comfort are also key criteria to consider [223]. Furthermore, ad-hoc questionnaires and the NASA-TLX (https://humansystems.arc.nasa.gov/groups/TLX/) can be utilised to provide additional instruments to assess the cognitive workload of humans. The most common vehicle for user studies in the reviewed papers was post-study surveys, in which the participants rated different aspects of their interaction on a Likert scale. The most commonly asked questions in the questionnaires relate to the fluency of the interaction (i.e., natural, legible, predictable robot motions), how safe and comfortable the participants felt during the interaction, whether participants were satisfied with the experience, the ease of use of the interface, the competence of the robot, the appropriateness of the robot's timing, the perceived aggressiveness of the robot, the trust in the robot, and whether the robot acted in a human-like manner. In addition, for some papers the main subjective evaluation was the indication of preference and/or subjective opinions and comments from the participants. There have been recent efforts in the grasping community to create physical benchmarks and experimental protocols in order to facilitate the replication of research results [90], [224], [225]. Towards the same goal, object datasets have been generated for grasping, such as YCB: an object dataset [226]; and DexNet: a synthetic dataset of 6.7 million point clouds, grasps, and analytic grasp metrics [227]. The choice of objects used in human-robot handovers usually depends on the target application; for example, it differs for industrial and domestic environments. The last column of Tables I and II shows how many test objects were used for the experiments in the corresponding papers. We found that the vast majority used only a single object class for the experiments. The most commonly used objects were cylindrical objects such as bottles, followed by rectangular objects such as boxes. While some researchers opted for custom-designed objects with sensors mainly for measuring grip and load forces, some have chosen application-specific objects such as flyers [17]. We discuss human-robot handover papers that present experiments with a real robot, as depicted in Fig. 4. An overview of these contributions can be found in Tables I and II. For each paper we report: the paradigm (R2H or H2R); what the authors investigated (communication, grasping, motion planning and control, and perception during the pre-handover phase; grip force and error handling during the physical handover); the sensors used; whether the handover location was fixed, pre-planned or adapted online to the human partner; whether the experimental protocol included a post-handover task for the receiver; the metrics used to assess the task performance and the user experience; and finally the number of different objects used in the real robot experiments. There are a few observations emerging from Tables I and II. In general, the paradigm R2H has been investigated more frequently than H2R. The handover location is usually either fixed or pre-planned; on the other hand, online adaptation is much less frequent. The physical handover phase has not been studied as frequently as the pre-handover phase. Furthermore, there is a general lack of uniformity in the experimental protocols, especially in terms of: the presence of a post-handover task, the metrics to assess the results, and the number of test objects.
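As a concrete illustration of the objective fluency measures listed above (percentage of concurrent activity, idle times) and of the success rate, the sketch below computes them from timestamped activity intervals of the two agents; the interval-based logging format is an assumption made for illustration, following the definitions in [213]-[216].

```python
def _total(intervals):
    """Total duration of a list of (start, end) activity intervals (assumed non-overlapping)."""
    return sum(e - s for s, e in intervals)


def _overlap(a, b):
    """Total time during which both interval lists are simultaneously active."""
    return sum(max(0.0, min(e1, e2) - max(s1, s2)) for s1, e1 in a for s2, e2 in b)


def fluency_metrics(human_active, robot_active, task_start, task_end):
    """Common objective fluency measures for one handover trial.

    human_active / robot_active: lists of (start, end) times when each agent is acting.
    """
    total = task_end - task_start
    concurrent = _overlap(human_active, robot_active)
    return {
        "task_time": total,
        "pct_concurrent_activity": concurrent / total,
        "human_idle_time": total - _total(human_active),
        "robot_idle_time": total - _total(robot_active),
    }

# Success rate over a set of trials: successes divided by total trials, e.g.
# success_rate = sum(trial.success for trial in trials) / len(trials)
```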
In order to bridge the gap between human-human handovers and human-robot handovers, we identify two open challenges. First, the interaction should become more fluid and fluent. In most current work, robots present predefined behaviours that force the human partner to comply. This not only decreases the perceived alliance of the robot, but also makes the joint action less natural. Second, we believe that experimental protocols should be more standardised, in order to allow a fairer comparison among the proposed algorithms, methods and approaches.
A. Open challenge 1: Adaptability
1) Adaptability and Handover Location: Studies in neuroscience, physiology and psychology highlight that a handover is an intricate joint action that requires physical and cognitive coordination. In particular, the cognitive level of the interaction is as important as the physical level [228], for a robot to be considered a partner and not only a tool [229]. To match the human skills of understanding and adaptation [230], it is preferable that robots also display adaptation and understanding. In fact, human givers can control the object's position and orientation to facilitate the robotic receiver's grasping of the object [230]. During extended interaction, fatigue can become an issue if the human worker has to accommodate the robot repeatedly and over a long time [138]. From this perspective, robots that are able to adopt different behaviours, adapting to their partner, could better assist their human counterpart [19], [153]. Moreover, different users generally interact with a robot in different ways during a handover [233]. In other words, it is crucial to account for the feedback coming from the human partner when controlling the robot during the interaction. However, as can be seen in the Tables, most approaches focus on fixed or pre-planned handover locations, with far fewer attempts at adapting to the human partner online. In many human-robot handover scenarios, the handover location is either kept fixed (the robot always moves to the same position for the object transfer), or it is pre-planned based on several criteria (including ergonomics, safety, etc.) and not updated in real time using perceptual feedback. This is far from ideal, as the human potentially has to adapt to the robot and could incur cognitive and physical fatigue. The ergonomics of the interaction should be accounted for, as the transfer should happen in the comfort zone of the human, i.e., the range of positions that can be reached (and tasks that can be performed) with little or no compensatory movements [236]. In humans, an optimisation principle over a muscle stress index has been shown to determine the arm motions and postures (selected among the infinite possibilities of motion) and also the perceived comfort [237]. We believe that while pre-planning such a location to account for the ergonomics and the physical characteristics of the human partner is a reasonable starting point, the handover location should not remain fixed or pre-planned, but should be adapted online to the human partner. More effort is needed to adjust online to changing circumstances, adapting in real time to the needs of the human partner (a minimal illustrative sketch of such online adaptation is given below).
2) Communication: Communication is a key factor in achieving successful coordination during a joint action. Humans use speech, gaze, and body movements to communicate intent and to coordinate during the execution of the joint action. We observe that robots have displayed a general lack of communication skills for object handovers.
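Returning to the point above about adapting the handover location online, the sketch below selects a transfer point at every perception cycle by trading off the human's comfort zone against robot reachability. It is a minimal illustration under assumed geometry: the cost terms, thresholds and weights are stand-ins for the ergonomic criteria discussed in the text and in [236], [237], not a published planner.

```python
import numpy as np

def choose_transfer_point(candidates, human_chest, robot_base,
                          comfort_radius=0.45, robot_reach=0.85,
                          w_comfort=1.0, w_reach=0.5):
    """Pick the candidate transfer point that best balances human comfort and
    robot reachability. Intended to be called every perception cycle so the
    goal tracks the human; all constants here are illustrative, and a real
    planner would also consider posture models, obstacles and safety."""
    best, best_cost = None, np.inf
    for p in candidates:
        d_robot = np.linalg.norm(p - robot_base)
        if d_robot > robot_reach:                             # outside robot workspace
            continue
        d_human = np.linalg.norm(p - human_chest)
        comfort_penalty = max(0.0, d_human - comfort_radius)  # beyond the comfort zone
        cost = w_comfort * comfort_penalty + w_reach * d_robot
        if cost < best_cost:
            best, best_cost = p, cost
    return best

# One perception cycle: candidate points sampled in front of the tracked human.
human_chest = np.array([0.60, 0.00, 1.10])
robot_base = np.array([0.00, 0.00, 0.80])
candidates = [human_chest + np.array([-0.30, dy, -0.15]) for dy in (-0.2, 0.0, 0.2)]
print(choose_transfer_point(candidates, human_chest, robot_base))
```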
Most of the effort in the literature so far has been devoted to the physical aspects of the interaction, focusing on motion planning and control, grasping and perception. Effort on communication is less prevalent (this can also be noticed in the Tables, as only a minority of the papers include an element of communication in their implementation). We believe that improving the communication cues provided to the human partner by the robot is a key factor in increasing the naturalness and fluency of human-robot handovers.
3) Grip Release: Only a few papers focus on grip release and on how to handle potential falls of the object. While the literature in human studies continues to investigate how both agents modulate their grip force on the object and how the different sensory modalities (vision and tactile) come into play, most of the reviewed work has adopted a simplistic approach, i.e., robotic givers completely release the object whenever a pull by the receiver is detected. Conversely, robotic receivers need to modulate their pulling force, as too little force could make the object transfer unsafe and too much force could be dangerous for the human partner. We believe that grip force modulation is a key component that needs further investigation and effort (see the sketch below). There are many additional open research directions, such as: (i) the use of different hardware (under-actuated vs fully actuated, soft vs rigid, parallel-jaw grippers vs multi-fingered hands vs suction), as in [234]; (ii) the use of different grasping strategies (grasp type and location on the object); and (iii) the use of objects of varying size and weight. Such exploration could give rise to various options for modulating the grip.
B. Open challenge 2: Standardisation of experimental protocols
In order to enable a fair comparison among the contributions and to improve human-robot handovers, a standardised experimental protocol should be developed and adopted, to generate results that are easy to interpret and easy to compare against. We propose to focus on three aspects: the post-handover task, the objects used in the experiments, and the metrics to assess the results.
1) Role of the post-handover task: From the standpoint of the robot's higher-level behaviour, there is a critical need to improve the integration of cognitive and physical reasoning [10] in both paradigms (H2R and R2H). In other words, robots currently lack a vision and understanding of the general goal of such an action. Such understanding is the key contributor to enabling higher-order planning [33], [62], [63], [74]. For example, robotic grasping has reached impressive levels of performance [238]-[240]; however, the ultimate goal of the grasp is rarely taken into account [92]. As a result, robots can manage to grasp objects, but these grasps seldom allow the execution of a task with the object. During a handover, a successful grasp should account for the interaction partner. In [235], a benchmark for H2R handovers is proposed to promote a fairer comparison among algorithms, offering sub-scores for each sub-action of the handover phases. Following a similar reasoning, we believe that any experimental protocol should include a task for the receiver to perform with the handed-over object, as proposed in [74]. This is a critical consideration because the object exchange is normally initiated in order for the receiver to perform a task with the object. A complete experimental procedure should consider the capability of the receiver to use the object directly following the handover.
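As an illustration of the grip-force modulation discussed above (as opposed to releasing the object outright on the first detected pull), the sketch below couples the giver's grip force to the share of the object's load already supported by the receiver. It is a hedged sketch only: the function, constants and the linear release rule are assumptions inspired by the grip/load-force coupling reported in the cited human studies, not a reproduction of any specific published controller.

```python
def giver_grip_command(load_taken_by_receiver, object_weight,
                       grip_min=0.5, safety_factor=1.8):
    """Grip force (N) the robotic giver commands, given how much of the
    object's load the receiver currently supports (estimated, e.g., from a
    wrist force/torque sensor). As the receiver takes up the load, the grip
    decays towards zero instead of being dropped abruptly on a detected pull.
    All constants are illustrative."""
    remaining_load = max(0.0, object_weight - load_taken_by_receiver)
    if remaining_load <= 0.0:
        return 0.0                       # receiver supports the full load: release
    # Keep a minimum grip until release, with a margin over the remaining load.
    return max(grip_min, safety_factor * remaining_load)

# Example: a 4 N object; the receiver progressively takes over the load.
for taken in (0.0, 1.0, 2.0, 3.0, 4.0):
    print(f"receiver load {taken:.1f} N -> giver grip {giver_grip_command(taken, 4.0):.1f} N")
```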
If the receiver can grasp the object in a way that its subsequent use does not require any further re-manipulation, then the receiver can start the task straight away after the physical exchange of the object. Conversely, the receiver might need to re-adjust their grasp of the object if the temporary grasp realised during the exchange is not an ideal grasp for correctly using the object in the specific task [93]. However, this post-handover grasp adjustment could decrease the quality of the handover in objective terms (longer task performance time, higher strain when the handover happens multiple times) and in subjective terms (the giver could be perceived as a lesser partner, and the task could be perceived as more cognitively difficult). These quality evaluations are pivotal in establishing the degree of success of a handover. However, very few experimental protocols include a post-handover task for the receiver. Even though it might be argued that a handover can be considered finished after the object transfer, we believe that such post-handover task performance is important to effectively assess the overall performance of the dyad and to gauge the experience of the human partner.
Fig. 4. Examples of real implementations of human-robot handovers. The first two images depict the R2H paradigm, while the rightmost image depicts the H2R paradigm. In all three instances, a real robot is interacting with the human participant. Images are taken respectively from [93], [48], [118].
2) Proposed Set of Metrics: Our survey has revealed a need for standardisation in the choice of metrics and objects for real-robot experiments. Most of the surveyed papers report results using task performance metrics (e.g., success rate and timings) and subjective metrics on the experience of the human partner (often in the form of Likert-scale post-experiment questionnaires). We believe that a minimal set of metrics should be defined in order to enable a fairer and more direct comparison among the different approaches. To this end, we propose the following combination of metrics, which assess the most common aspects of a handover:
1) Task performance (objective): success rate, total handover time, receiver's task completion time.
2) Experience of the human (subjective): fluency, trust in robot, working alliance.
This minimal set includes metrics which are clearly defined, and thus reproducible, and which are easy to measure. For these reasons, the set does not include psycho-physiological measurements, as they require sensors placed on the body of the human participant and are thus difficult to standardise and deploy in a variety of contexts. The experience of the human participant should be assessed by administering the following questionnaire (this set of questions includes a subset of the questions from [216]):
1) Human-Robot Fluency
• The human-robot team worked fluently together.
• The human-robot team's fluency improved over time.
• The robot contributed to the fluency of the interaction.
2) Trust in Robot
• I trusted the robot to do the right thing at the right time.
• The robot was trustworthy.
3) Working Alliance
• The robot accurately perceives what my goals are.
• I understand what the robot's goals are.
• The robot and I are working towards mutually agreed upon goals.
All questions should be evaluated on a Likert scale. We believe that this set of questions covers a broad set of important general aspects of the interaction, namely fluency, trust and working alliance.
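The proposed minimal set can be logged and summarised with very little tooling. The sketch below is purely illustrative: the per-trial fields, the 1-7 Likert range and the grouping of items into fluency, trust and working-alliance subscales are assumptions of this example, not a prescribed implementation.

```python
from statistics import mean

# Per-trial log for the proposed objective metrics (field names are illustrative).
trials = [
    {"success": True,  "handover_time_s": 3.2, "receiver_task_time_s": 6.1},
    {"success": True,  "handover_time_s": 2.8, "receiver_task_time_s": 5.4},
    {"success": False, "handover_time_s": 4.9, "receiver_task_time_s": None},
]

success_rate = sum(t["success"] for t in trials) / len(trials)
mean_handover_time = mean(t["handover_time_s"] for t in trials if t["success"])
mean_task_time = mean(t["receiver_task_time_s"] for t in trials if t["success"])

# Subjective metrics: average the Likert items (here 1-7) within each subscale.
responses = {
    "fluency":          [6, 5, 6],   # the three fluency items
    "trust":            [5, 6],      # the two trust items
    "working_alliance": [6, 6, 5],   # the three working-alliance items
}
subscale_scores = {scale: mean(items) for scale, items in responses.items()}

print(success_rate, mean_handover_time, mean_task_time, subscale_scores)
```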
Furthermore, additional questions can be added to this minimal set in order to investigate more specific aspects of a handover, such as the preference between different approaches, or learning and improvement over time.
3) Proposed Set of Objects: The vast majority of papers on human-robot handovers use only a single object class. This observation shows that generalisation of handovers to a variety of objects has not been the main focus of the majority of papers until very recently [118], [119]. The most commonly used test objects have been either cylindrical objects such as bottles, or rectangular objects such as boxes. This is likely because these object shapes are easier to grasp and many everyday objects belong to these categories. We argue that future experiments should include a broader set of objects, as different objects generate different behaviours and can be used to address different manipulation tasks. We propose the use of objects that elicit all three grasp macro-types in [75], i.e., power, intermediate and precision grasps. The three macro-types offer sufficient opportunities to explore different behaviours, investigating aspects such as different ways of offering and receiving the object; different post-handover tasks; and the handover of objects with different weights and shapes. Nevertheless, we acknowledge that the choice of objects might depend on the specific focus of each study. For example, studies on the reaching motion might place their focus on the motion and not on the objects, so three objects evoking the three grasp types would be enough. However, studies more focused on the objects, such as a study on object orientation in preparation for the handover, would require a wider set of experimental objects.
Our proposal of a minimal set of metrics and objects to use in an experimental protocol aims to increase the possibility of a fair comparison among the approaches. A handover is a sophisticated joint action that includes many different aspects (communication, planning, grip release, etc.). For this reason, there has been a general non-uniformity in protocols and metrics. We believe that our proposed metrics and objects cover the most common aspects of a handover, thus enabling a fair comparison among approaches, while allowing for additions when the research questions call for investigation of more specific aspects.
In terms of the paradigm, R2H handovers have been investigated more frequently. We speculate that the idea of having a robot assistant that can fetch objects and give them to humans when needed has driven the deeper investigation of the R2H paradigm. The R2H paradigm is particularly representative of the cases where the human receiver will then perform a cognitively challenging task with the object, a task that robots are not yet able to perform. However, it is our opinion that H2R handovers are worth exploring further and represent an open area of research. One of the biggest challenges in human-to-robot handovers is safety [118], as the robot should be careful not to contact the human giver. For this to happen, perception systems should be able to robustly discriminate the human giver (hand and arm) from the object [117]-[119]. Moreover, in the H2R paradigm grasp planning becomes another critical issue, as the robot will have to perform a task with the handed-over object; at the very least, it will need to put the object down in a pose preferable to humans [241].
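A simple way to operationalise the safety requirement just mentioned, i.e., not contacting the human giver, is to reject candidate grasps that come too close to the detected hand. The sketch below is illustrative only: the point-based hand representation, the clearance threshold and the example coordinates are assumptions of this example, and a real H2R system would additionally reason about the gripper volume, the approach direction and perception uncertainty.

```python
import numpy as np

def grasp_is_safe(grasp_position, hand_points, min_clearance=0.10):
    """Return True if a candidate grasp on the offered object keeps at least
    `min_clearance` metres from every detected point of the giver's hand/arm.
    A crude point-to-point check standing in for the safety reasoning
    discussed in the text."""
    dists = np.linalg.norm(hand_points - grasp_position, axis=1)
    return bool(dists.min() >= min_clearance)

def select_safe_grasp(candidate_grasps, hand_points, min_clearance=0.10):
    """Pick the first candidate grasp that clears the giver's hand, or None."""
    for g in candidate_grasps:
        if grasp_is_safe(g, hand_points, min_clearance):
            return g
    return None

# Toy example: hand keypoints near the object's left side; prefer a grasp on the right.
hand_points = np.array([[0.52, 0.10, 0.95], [0.50, 0.12, 0.93], [0.48, 0.11, 0.96]])
candidates = [np.array([0.51, 0.08, 0.95]),   # too close to the hand
              np.array([0.62, -0.05, 0.95])]  # clear of the hand
print(select_safe_grasp(candidates, hand_points))
```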
ACKNOWLEDGMENTS
The authors want to thank the Australian Research Council (Centre of Excellence for Robotic Vision, project number CE140100016). Elizabeth Croft acknowledges support from the Australian Research Council (project number DP200102858) and the Natural Sciences and Engineering Research Council of Canada (project number RGPIN-2017-04450). Tommaso Pardi is supported by a doctoral bursary of the UK Nuclear Decommissioning Authority. The authors also thank Peter Corke for his invaluable feedback; Alessandro De Luca for the advice about safety in physical human-robot interaction; Marco Controzzi for the discussion on the role of the task in experimental protocols; and Maya Cakmak for recommendations to further improve this manuscript.
REFERENCES
The role of cobots in industry 4.0
Trends and challenges in robot manipulation
Progress and prospects of the human-robot collaboration
Robotics, smart wearable technologies, and autonomous intelligent systems for healthcare during the COVID-19 pandemic: An analysis of the state of the art and future vision
Combating COVID-19 - the role of robotics in managing public health and infectious diseases
Strategic research agenda for robotics in europe
Australian Centre for Robotic Vision
Unesco science report
A roadmap for US robotics from internet to robotics
Towards human-level semantics understanding of human-centered object manipulation tasks for HRI: Reasoning about effect, ability, effort and perspective taking
Computational human-robot interaction
A survey of socially interactive robots
Measurement of trust in human-robot collaboration
Human-robot collaborative manipulation through imitation and reinforcement learning
A projected inverse dynamics approach for multi-arm cartesian impedance control
Human-robot co-manipulation of extended objects: Data-driven models and control from analysis of human-human dyads
A model of distributional handing interaction for a mobile robot
Towards autonomous robotic butlers: Lessons learned with the pr2
Experimental testing of the CogLaboration prototype system for fluent human-robot object handover interactions
Joint action: Bodies and minds moving together
Toward you: The social side of actions
A minimal architecture for joint action
Multiple frames of reference are used during the selection and planning of a sequential joint action
Temporal perception in joint action: This is my action
Prediction in joint action: What, when, and where
Action plans used in action observation
The mirror-neuron system
Evidence for a dyadic motor plan in joint action
Computational principles of movement neuroscience
Response selection during a joint action task
Psychological research on joint action: theory and data
Grip forces when passing an object to a partner
Higher-order action planning for individual and joint object manipulations
Manipulation control with dynamic tactile sensing
The grasping hand
Hand it over or set it down: A user study of object delivery with an assistive mobile manipulator
Human sensorimotor communication: A theory of signaling in online social interactions
Towards seamless human-robot handovers
Grasping language - a short story on embodiment
Neural evidence for the interplay between language, gesture, and action: A review
Using Language
Effects of speech on both complementary and synchronous strategies in joint action
Human-robot dialogue for joint construction tasks
Designing gaze behavior for humanlike robots
Eye-hand coordination in object manipulation (Journal of Experimental Psychology: Human Perception and Performance)
The utility of gaze in spoken human-robot interaction
Meet me where i'm gazing: how shared attention gaze affects human-robot handover timing
Toward a better understanding of the communication cues involved in a human-robot object transfer
I reach faster when i see you look: Gaze effects in human-human and human-robot face-to-face cooperation
Deliberate delays during robot-to-human handovers improve compliance with gaze communication
Turn taking for human-robot interaction
Did you mean this object?: Detecting ambiguity in pointing gesture targets
Using spatial and temporal contrast for fluent robot-human hand-overs
Automated detection of handovers using kinematic features
Learning the communication of intent prior to physical collaboration
Joint action understanding improves robot-to-human object handover
Fooling the kickers but not the goalkeepers: Behavioral and neurophysiological correlates of fake action detection in soccer
An analysis of deceptive robot motion
Exploration of geometry and forces occurring within human-to-robot handovers
Visualizing robot intent for object handovers with augmented reality
Extending end-state comfort effect: Do we consider the beginning state comfort of another?
Cognition, action, and object manipulation
The prehensile movements of the human hand
Analysis of human grasping behavior: correlating tasks, objects and grasps
Analysis of human grasping behavior: Object characteristics and grasp type
Sensory-motor coordination during grasping and manipulative actions
Control strategies in object manipulation tasks
Where grasps are made reveals how grasps are planned: Generation and recall of motor plans
Effects of end-goal on hand shaping
An object for an action, the same object for other actions: effects on hand shaping
Both your intention and mine are reflected in the kinematics of my reach-to-grasp movement
Choice of contact points during multidigit grasping: effect of predictability of object center of mass location
On the choice of grasp type and location when handing over an object
On grasp choice, grasp models, and the design of hands for manufacturing tasks
Analysis of hand contact areas and interaction capabilities during manipulation and exploration
The grasp taxonomy of human grasp types
Power grip and precision handling
Patterns of static prehension in normal hands
Human prehension and dexterous robot hands
A compact representation of human single-object grasping
Three handover methods in esteem etiquettes using dual arms and hands of home-service robot
Characterization of handover orientations used by humans for efficient robot to human handovers
Comfortable robot to human object hand-over
An affordance sensitive system for robot to human object handover
Part-based robot grasp planning from human demonstration
Implementation of a framework for learning handover grasp configurations through observation during human-robot object handovers
An affordance and distance minimization based method for computing object orientations for robot human handovers
Robotic grasping and contact: a review
Guest editorial open discussion of robot grasping benchmarks, protocols, and metrics
Data-driven grasp synthesis - a survey
Robotic manipulation and the role of the task in the metric of success
The grasp strategy of a robot passer influences performance and quality of the robot-human object handover
Grasp planning for interactive object manipulation
Implicitly assisting humans to choose good grasps in robot to human handovers
The ecological approach to visual perception
An outline of a theory of affordances
Object concepts and action: Extracting affordances from objects parts
Scaffolds for social meaning
Theories and computational models of affordance and mirror systems: An integrative review
Affordances in psychology, neuroscience, and robotics: A survey
What is an affordance? 40 years later
The Psychology of Everyday Things
How objects are grasped: The interplay between affordances and end-goals
Affordances can invite behavior: Reconsidering the relationship between affordances and agency
Interactive perception: Leveraging action in perception and perception in action
Learning object affordances: from sensory-motor coordination to imitation
Detecting object affordances with convolutional neural networks
Task-oriented grasping with semantic and geometric scene understanding
Affordance detection for task-specific grasping using deep learning
Towards affordance prediction with vision via task oriented grasp quality metrics
Learning task-oriented grasping from human activity datasets
Affordance-aware handovers with human arm mobility constraints
Learning joint reconstruction of hands and manipulated objects
Freihand: A dataset for markerless capture of hand pose and shape from single rgb images
Ho-3d: A multi-user, multi-object dataset for joint 3d hand-object pose estimation
Human grasp classification for reactive human-to-robot handovers
Object-independent human-to-robot handovers using real time robotic vision
Reactive human-to-robot handovers of arbitrary objects
Movement imitation with nonlinear dynamical systems in humanoid robots
Learning movement primitives
Human motion prediction in human-robot handovers based on dynamic movement primitives
Human-robot cooperative manipulation with motion estimation
Predicting object transfer position and timing in human-robot handover tasks
Prediction of human arm target for robot reaching movements
Object handover prediction using gaussian processes clustered with trajectory classification
Unsupervised early prediction of human reaching for human-robot collaboration in shared workspaces
Human-human handover tasks and how distance and object mass matter
Object transfer point estimation for fluent human-robot handovers
Workspace analysis for planning human-robot interaction tasks
A position generation algorithm utilizing a biomechanical model for robot-human object handover
Postural optimization for an ergonomic human-robot interaction
Work-related musculoskeletal disorders: the epidemiologic evidence and the debate
Sharing effort in planning human-robot handover tasks
Planning handovers involving humans and robots in constrained environment
Modeling human reaching phase in human-human object handover with application in robot-human handover
Towards ergonomic control of human-robot co-manipulation and handover
A selective muscle fatigue management approach to ergonomic human-robot co-manipulation
Investigating human-human approach and hand-over
Synthesizing object receiving motions of humanoid robots with human motion database
A human-inspired controller for fluid human-robot handovers
Legibility and predictability of robot motion
Human preferences for robot-human hand-over configurations
Fabrik: A fast, iterative solver for the inverse kinematics problem
Singularity-robust inverse kinematics solver for tele-manipulation
Towards understanding user preferences in robot-human handovers: How do we decide?
Learning user preferences for robot-human handovers
Design principles for safety in human-robot interaction
Handover planning for every occasion
Dynamic grasp and trajectory planning for moving objects
Relative importance of spatial and temporal precision for user satisfaction in human-robot object handover interactions
Perception and control challenges for effective human-robot handoffs
Adaptive coordination strategies for human-robot handovers
Anticipating human actions for collaboration in the presence of task and sensor uncertainty
Assessment of operators' mental strain induced by hand-over motion of industrial robot manipulator
Combining cartesian trajectories with joint constraints for human-like robot-human handover
A hybrid system framework for unified impedance and admittance control
Hybrid motion/force control: a review
A variable admittance control strategy for stable physical human-robot interaction
An overview of null space projections for redundant, torque-controlled robots
Redundancy resolution in human-robot co-manipulation with cartesian impedance control
Control of generalized contact motion and force in physical human-robot interaction
Variable admittance control preventing undesired oscillating behaviors in physical human-robot interaction
Human interaction with a service robot: mobile-manipulator handing over an object to a human
Specifying and synthesizing human-robot handovers
Implementation and experimental validation of dynamic movement primitives for object handover
Interaction primitives for human-robot cooperation tasks
Adaptation and robust learning of probabilistic movement primitives
Probabilistic movement primitives for coordination of multiple human-robot collaborative tasks
Learning dynamic robot-to-human object handover from human feedback
A human inspired handover policy using gaussian mixture models and haptic cues
Grip forces and load forces in handovers: Implications for designing human-robot handover controllers
Humans adjust their grip force when passing an object according to the observed speed of the partner's reaching out movement
A human-inspired object handover controller
Implementation of a robot-human object handover controller on a compliant underactuated hand using joint position error measurements for grip force and load force estimations
Autonomous object handover using wrist tactile information
The effects of proactive release behaviors during human-robot handovers
Reliable object handover through tactile force sensing and effort control in the shadow robot hand
Identifying multiple interaction events from tactile data during robot-human object transfer
Failure recovery in robot-human object handover
A fail-safe object handover controller
Robot-to-human object handover using a behavioural control strategy
Human-to-robot object handover using a behavioural position-based force control approach
Requirements for safe robots: Measurements, analysis and new insights
Safe and dependable physical human-robot interaction in anthropic domains: State of the art and challenges
An atlas of physical human-robot interaction
Safe planning for human-robot interaction
Safety in human-robot collaborative manufacturing environments: Metrics and control
Safety barrier functions for human-robot interaction with industrial manipulators
On-line motion prediction and adaptive control in human-robot handover tasks
Synthesizing robot motions adapted to human presence
Human-aware robot navigation: A survey
Exploratory study of a robot approaching a person in the context of handing over an object
Robotic etiquette: results from user studies involving a fetch and carry task
Geometric tools for perspective taking for human-robot interaction
Spatial reasoning for human robot interaction
Soft motion trajectory planner for service manipulator robot
The coordination of arm movements: an experimentally confirmed mathematical model
Human-robot interaction in handing-over tasks
Evaluation of a novel biologically inspired trajectory generator in human-robot interaction
Study on soft-tissue injury in robotics
A human-aware manipulation planner
Common metrics for human-robot interaction
Metrics and benchmarks in human-robot interaction: Recent advances in cognitive robotics
Survey of metrics for human-robot interaction
Steps to creating metrics for humanlike movements and communication skills (of robots)
Identifying generalizable metric classes to evaluate human-robot teams
Toward developing hri metrics for teams: Pilot testing in the field
Framing and evaluating human-robot interactions
Measurement instruments for the anthropomorphism, animacy, likeability, perceived intelligence, and perceived safety of robots
Social resonance: a theoretical framework and benchmarks to evaluate the social competence of humanoid robots
A visual method for robot proxemics measurements
Effects of anticipatory action on human-robot teamwork: Efficiency, fluency, and perception of team
Cost-Based anticipatory action selection for human-robot fluency
Comparative performance of human and mobile robotic assistants in collaborative fetch-and-deliver tasks
Evaluating Fluency in Human-Robot Collaboration
Metrics for evaluating human-robot interactions
Survey of psychophysiology measurements applied to human-robot interaction
Physiological and subjective responses to articulated robot motion
A framework for affect-based natural human-robot interaction
Measurement instruments for the anthropomorphism, animacy, likeability, perceived intelligence, and perceived safety of robots
The robotic social attributes scale (rosas): Development and validation
Physiological and subjective evaluation of a human-robot object hand-over task
Benchmarking protocol for grasp planning algorithms
Benchmarking hand and grasp resilience to dynamic loads
Yale-CMU-Berkeley dataset for robotic manipulation research
Dex-net 2.0: Deep learning to plan robust grasps with synthetic point clouds and analytic grasp metrics
Human-robot interaction: Tackling the ai challenges
Working collaboratively with humanoid robots
Human-robot interaction for cooperative manipulation: Handing objects to one another
Evaluation of a novel biologically inspired trajectory generator in human-robot interaction
Determining proper grasp configurations for handovers through observation of object movement patterns and inter-object interactions during usage
Hand in hand with robots: differences between experienced and naive users in human-robot handover scenarios
Touch-based grasp primitives for soft hands: Applications to human-to-robot handover tasks and beyond
Benchmark for human-to-robot handovers of unseen containers with unknown filling
The postural comfort zone for reaching gestures
Optimization principle determines human arm postures and comfort
Learning robust, real-time, reactive robotic grasping
Model-free and learning-free grasping by local contact moment matching
Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection
Learning to place objects onto flat surfaces in upright orientations
Tommaso Pardi
Tommaso Pardi received a BSc in Computer Engineering (2012) and an MSc in Autonomous and Control Engineering (2015) from the University of Pisa. After graduating, he worked for nine months as an Analyst Programmer for a spin-off company of the University of Pisa. He was then employed for one and a half years as a research fellow at the BioRobotics Institute of the School of Advanced Studies Sant'Anna in Pisa. In these roles he worked on soft robotics, grasping and manipulation, robotic teleoperation, and UAV control. In 2018 he started his PhD at the Extreme Robotics Lab at the University of Birmingham, developing advanced controllers for the robotic cutting and resizing of nuclear waste. During his PhD he has worked on grasp selection for post-grasp motions, motion planners for performing full-force tasks, and robot-human handovers.