Programming Robosoccer agents by modeling human behavior

Ricardo Aler a, José M. Valls a, David Camacho b, Alberto López a
a Computer Science Department, Universidad Carlos III de Madrid, Avenida Universidad, No. 30, 28911 Leganés, Madrid, Spain
b Computer Science Department, Universidad Autónoma de Madrid, C/ Francisco Tomás y Valiente, No. 11, 28049 Madrid, Spain

Published in: Expert Systems with Applications, March 2009, vol. 36, no. 2, pp. 1850-1859.

Abstract

The Robosoccer simulator is a challenging environment for artificial intelligence, where a human has to program a team of agents and introduce it into a virtual soccer environment. Usually, Robosoccer agents are programmed by hand. In some cases, agents make use of machine learning (ML) to adapt and predict the behavior of the opposite team, but the bulk of the agent is preprogrammed. The main aim of this paper is to transform Robosoccer into an interactive game and let a human control a Robosoccer agent. ML techniques can then be used to model his/her behavior from training instances generated during play. This model is later used to control a Robosoccer agent, thus imitating the human behavior. We have focused our research on low-level behavior, like looking for the ball, conducting the ball towards the goal, or scoring in the presence of opponent players. Results have shown that, indeed, Robosoccer agents can be controlled by programs that model human play.

Keywords: Learning to play; Imitation; Human modeling; Behavioral cloning; Machine learning; Robosoccer

1. Introduction

The Robosoccer simulator is a challenging environment for artificial intelligence, where a human has to program an agent and introduce it into a virtual soccer environment. Programming complex behaviors in software agents is usually a time-consuming and difficult task for human programmers. Machine learning (ML) is becoming a promising way of automatically endowing agents with complex skills. The main approach followed so far is to let the agents learn the behaviors by themselves, totally or partially. For instance, in the case of the Robosoccer simulator, Luke, Hohn, Farris, and Hendler (1997) uses genetic programming to evolve a complete team of agents, whereas Stone and Veloso (1998) proposes an agent architecture where learning can be used at many levels, from acquiring skills to adapting to the opponent.

But ML can be used in another interesting way. Human players can learn to play many games well and quickly, so it makes sense to attempt to imitate them and transfer their experience to computer agents. Learning by imitation and modeling humans is a field common to many research areas, like robotics (Kuniyoshi, Inaba, & Inoue, 1994), cognitive science (Brown & Burton, 1978), or user modeling (Webb, Pazzani, & Billsus, 2001). However, only recently have imitation techniques been applied to programming computer agents, especially in games. Sklar, Blair, Funes, and Pollack (1999, 2001) is the first reported work (to our knowledge) where data collected from players is used to train an agent that plays TRON. More recent research has produced quite remarkable human-like behavior in the Quake game (Thurau, Bauckhage, & Sagerer, 2003, 2004a, 2004c). The work reported in this paper follows this line of research and attempts to imitate a human player with the purpose of creating a Robosoccer player that performs well in the field.
Domains of the Robosoccer type are interesting for modeling humans, as skills range from low-level reactive behavior to high-level strategic actions and team play. Our approach works as follows. First, we created an interface that allows a person to play Robosoccer just like any other video-game. Then, many input/output pairs were generated and recorded while the human player played several matches. The input is what the person can see in the field and the output is the action the person performed in that situation. Next, a ML technique is used to learn the mapping from inputs to outputs. Finally, the ML model is used to control a computer agent.

Imitation can be performed at different levels: reactive, tactical, or strategic. Different levels raise different issues. As this is the first attempt at using human experience to control an agent in the Robosoccer domain, we have chosen to imitate the human player in low-level actions like running, turning, or kicking the ball. However, results show that learning when to perform such low-level actions allows the agent to learn slightly more complex sequences of actions, like conducting the ball to the goal, dribbling opponents, or stealing the ball from opponents.

Results have shown that, indeed, agents can be programmed by modeling the experience of humans playing Robosoccer, performing behaviors like looking for the ball, conducting the ball towards the goal, or scoring in the presence of opponent players.

This paper is divided into the following sections. Section 2 deals with work related to modeling other agents, human modeling, imitation, and modeling in games. Section 3 describes our modeling approach in the Robosoccer domain. Section 4 describes how the agent was trained from the instances generated by the human in different types of behavior and discusses the results. Finally, Section 5 draws the most important conclusions of this work and posits some future lines of research.

2. Related work

Currently, there is a lot of interest in the automatic modeling of other agents and also of human users/players. In this section, we will overview related work on agent modeling, considering whether models were reactive or had some form of internal memory, the power of the modeling language (propositional, first order, etc.), the task involved (classification, prediction, imitation, ...), and the domain type (continuous, noisy, ...). We will also focus on aspects related to human modeling.

One of the first attempts at modeling opponents was that of Carmel and Markovitch (1996). Here, the goal was to learn finite automata (DFA) consistent with the behavior of the opponent. The model was used to improve a minimax search algorithm. Interestingly, models included an internal state. This approach was valid only for discrete, round-based games, and required complete, non-noisy information. Prediction was the goal, but such models could also be used to imitate the opponent. Machine learning has been used extensively since then in classical and strategic games. Fürnkranz and Kubat (2001) contains a good survey with some discussion of opponent modeling.

Behavioral cloning is an attempt to imitate the behavior of other agents (Urbancic & Bratko, 1994; Bain & Sammut, 1999). Sammut, Hurst, Kedzier, and Michie (1992) uses this technique to learn from a human piloting a flight simulator, from whom input/output traces are obtained. It learns a decision tree for each of the four plane controls. It is a non-deterministic, quickly changing, noisy domain.
The complete flight is divided into seven stages, each one with a different goal. Otherwise, similar situations (inputs) in different stages would have very different outputs. Goals are taken into consideration, and the work distinguishes between two kinds of goals: homeostatic (non-persistent) and standard (persistent). The authors identify a problem when learning from different persons, because they use very different piloting styles. It differs from our work in that the testing domain is different and no information about the stage or the current goal is supplied to the learning system.

KNOMIC (van Lent & Laird, 1999) is an approach that learns to imitate an expert from input/output traces in dynamic, non-deterministic, and noisy domains. It uses a rich knowledge representation based on SOAR rules. For instance, actions (operators) are decomposed into three kinds of rules: selection, application, and goal-detection. As in behavioural cloning, two kinds of goals are considered: non-persistent and persistent. The expert is required to decompose a task into operators and sub-operators (a hierarchy of them, actually). This eliminates ambiguity between traces belonging to different parts of the task, which usually aim to achieve different goals (some research that tries to work around this problem, by inducing the agent's subgoals and using them to build the model, is reported in Suc & Bratko (1997)). The expert is also required to annotate the traces by telling the system which operators are selected and unselected. The authors claim that KNOMIC can learn from observing a human in difficult domains such as air combat and Quake II, although successful results are only given for observing a hand-programmed agent. The authors identify a source of ambiguity when learning from humans: they are less systematic, more variable, and make errors. There are many similarities with our approach, but no annotations are obtained from the user in our case.

Bakker and Kuniyoshi (1996) present an overview of imitation in the field of robotics. They define imitation as follows: "imitation takes place when an agent learns a behaviour from observing the execution of that behaviour by a teacher", and summarize it as solving the problems of "seeing, understanding, and doing". However, most work is centered around seeing and replicating sequences of actions (which in robotics is not a trivial task), and less around the understanding part. Based on Piaget's work (1945), they bring up the important issue that "it is not possible to learn a new behavior unless one almost knows it already". Kuniyoshi et al. (1994) fits into this framework: a robot observes a human performing a simple assembly task and then reproduces it. Here, the problem addressed was to recognize the human actions (in terms of the ones the robot already knew). The robot could also adapt to small changes in the position of items on the table. In Hayes and Demiris (1994), a robot follows a teacher through a maze and learns to associate the environment, in terms of local wall positions, with the teacher's actions. In short, "if situation then action" rules are learned. This second study is similar to our input/output rule learning, but the domain used is simpler.

An area where ML modeling of human behavior has been the focus is user modeling (Webb et al., 2001). Early work in this field was about student modeling, which seeks to model the internal cognition of a student's cognitive system (as in Brown & Burton (1978)).
This can be used to make online learning adaptive to the student's skills and background knowledge, as well as to predict the student's actions. However, Self (1988) cast some doubt on the tractability of the cognitive approach. Since then, many researchers have followed a different approach that models an agent in terms of the relationships between its inputs and outputs. This approach, called input-output agent modeling (IOAM), treats the operation of the cognitive system as a black box. Some of these systems include feature-based modeling (Webb & Kuzmycz, 1996), relational-based modeling (Kuzmycz, 1994), C4.5-IOAM (Webb, Chiu, & Kuzmycz, 1997), and FFOIL-IOAM (Chiu, Webb, & Kuzmycz, 1997), among others. Both propositional and relational learning systems have been used. A usual testing ground is the problem of subtraction, where models are learned to predict the student's response, including mistakes, when doing subtractions. In contrast with the Robosoccer, this is a discrete and static domain.

Currently, the demands of electronic commerce and the Web have led to a fast growth in research in information retrieval, where ML can be used for acquiring models of individual users interacting with information systems (Bloedorn, Mani, & MacMillan, 1996) or for grouping them into communities with common interests (Paliouras, Karkaletsis, & Papatheodorou, 1999). These models can help users in selecting useful information from the Web. Although user models are obtained, their domain and purpose are very different from ours. Their aim is to learn user preferences to filter Web information or to build adaptive interfaces. Also, users can be classified into stereotypes, so that user likes or dislikes can be predicted.

ML techniques have also been applied to the prediction of users' actions, also called plan recognition. Kautz and Allen (1986) defined it as the problem of identifying a minimal set of top-level actions that are sufficient to explain the observed actions. Most work learns hierarchical plans from user logs and associates them with the goal the user was trying to achieve (Bauer, 1996, 1999). Later, plan libraries can be used to match actual user actions with a plan in the library and determine the user's intentions. But their purpose is not to imitate the user, as in our research.

Some research deals with agents modeling opponents and using this knowledge to beat them. In some cases, models are not learned but predefined, and are used to classify opponent teams (not individual agents) by means of a similarity metric (Riley & Veloso, 2000a). Models can be used, for instance, to select the best plan to beat the opponent in set plays (Riley & Veloso, 2000a, 2001b). However, although Riley's RectGrid models can classify adversaries, they cannot be used to imitate opponent behavior. Riley's work has focused mostly on the Robosoccer domain, as in our case. Riley, Veloso, and Kaminka (2002) uses ML for a coach agent in the Robosoccer domain; that is, learning takes place at a higher, more strategic level. The coach can learn from previous games three kinds of models about teams, including team formations and passing behaviour. These models can be used to predict and beat the opposite team or, more closely related to our research, to make the team imitate the modeled team. The differences with our research are that whole teams are modeled and that the coach agent is used to observe the playing field.

In Stone (2000), a layered learning architecture is proposed.
It is designed to use machine learning in complex domains where learning a direct mapping from sensors to actuators is not tractable and a hierarchical task decomposition is given. Different learning mechanisms are used to learn behaviors from the bottom level (simple behaviors) to the highest level (more complex, strategic behaviors). The architecture is instantiated in the Robosoccer to learn an individual skill (ball interception), a multi-agent behavior (pass evaluation and selection), and the adaptation of team behavior (here, a new reinforcement learning algorithm called TPOT-RL is used (Stone & Veloso, 1999)). Although this work does not focus particularly on modeling other agents, it is relevant because, if its working hypothesis is correct, it should be expected that learning the detailed models required to imitate other agents would also require a layered architecture (i.e. a direct mapping from inputs to outputs will be almost impossible to learn). This is very likely to be true in general, and should be taken into account in future research, but our work shows that learning input-output mappings yields some positive results. In the opponent-modeling context, Bowling (2003) uses reinforcement learning for multi-agent systems where the other agents are also learning. There, the CMUDragons Robocup team is described, which is able to adapt online to an unknown opponent team. Finally, Ledezma, Aler, Sanchis, and Borrajo (2004) proposed a ML scheme to take advantage of predictions of opponents based on visual observations in the Robosoccer simulator. In all these cases, the goal is learning to play and to predict/adapt to opponents, but not imitation.

Sklar et al. (1999, 2001) is the first reported work (to our knowledge) where data collected from players playing a dynamic video-game is used to train an agent. In this case, data was collected from humans playing Tron over the internet and a neural network was trained. Although they managed to create effective controllers for this game, it is less clear that the resulting behavior imitated the humans. They bring up the issue that a person, under the same situation, can produce different responses, which can confuse the learning process.

Spronck, Sprinkhuizen-Kuyper, and Postma (2004) use a new technique called dynamic scripting in role playing games (RPG) to assign weights to rules in a reinforcement learning way. In Spronck, Sprinkhuizen-Kuyper, and Postma (2002), neural networks and genetic algorithms are used for online learning in RPG. In Ponsen and Spronck (2004), the same techniques are applied to real time strategy games.

Recent research has shown a lot of interest in first person shooter games (FPS), like Quake. In this kind of games, human players must react to situations very quickly, so it is a very suitable environment for learning reactive behaviors. They are played in networks by hundreds of people and records of the best games are kept, so there are good opportunities for machine learning. Thurau, Bauckhage, and Sagerer (2004a) provide a good summary of how imitation can be used at all levels (reactive, tactical, strategic, and motion modeling) in FPS games. In Thurau et al. (2003), the authors report some initial research on learning running and aiming behaviors from recorded human games. They built a MATLAB interface to allow a human to play Quake II and record pairs of state-vectors and actions.
Then, they used a self-organizing map to reduce the dimensionality of state-vectors and multi-layer neural networks to map state-vectors to actions. Initial results in learning this kind of reactive behavior seem positive. In Bauckhage, Thurau, and Sagerer (2003), neural networks are used to learn trajectories, aiming behavior, and their combination. They bring up the important issue of taking into account the context and the past, in addition to the state-vector, to reduce ambiguity when deciding which action to perform. Also in the Quake game, recent research tries to improve imitation models by means of genetic algorithms (Priesterjahn, Kramer, Weimer, & Goebels, 2005). In Bauckhage and Thurau (2004), tactical knowledge about which weapon to use is learned from human play by means of a mixture of experts. In Thurau, Bauckhage, and Sagerer (2004b), Neural Gas algorithms are used for learning the topology of the environment, and potential fields for learning the human trajectories followed to pick up objects situated at different locations. This is strategic knowledge, because it tells which is the most important place to pick up the next object. The bot got stuck at some locations, and temporal changes to the potential field (pheromone trails) had to be added. The authors claim that the bot imitated human behavior in simple setups and showed a mixture of intelligent behaviors in more complex missions (i.e. less imitation, but still clever movements). In Thurau, Bauckhage, and Sagerer (2004c), principal component analysis is used to extract primitive movements (building blocks for more complex sequences of movements), and conditional probabilities on the state-vector and the last action are learned. The artificial movements learned seem realistic and even certain human habits are preserved.

3. Our modeling approach

3.1. Modelling process

As Fig. 1 shows, a human player interacts with an interface (the GUI soccer client) that allows him/her to play Robosoccer as a video-game. This GUI sends the human commands to the Soccerserver and displays the state of the field to the human. The interface has been carefully designed so that the only information displayed to the user is the information available to the actual agent in the simulated field. The trainer is used to set the playing field in a particular state, because many different states are required to learn general models. From the GUI, a trace is obtained by observing the human play. Records are obtained for every server cycle. This trace is made of many such (s, a) records, where s is the observation made by the agent's sensors (distance to the ball, angle to the ball, etc.) and a is the action carried out by the human player in that situation (for instance, kicking the ball, turning, etc.). Then, machine learning techniques can be used to obtain a classifier that determines which action has to be carried out in a particular situation. Finally, the classifier is translated into C code, which is used to control a soccer agent. If the modeling process is correct, the soccer agent will play Robosoccer similarly to the human.

3.2. The Robosoccer interface

In order to build a good model for the soccer agent, the information available to the human through the interface must be as close as possible to that available to the agent. The XClient programmed by Itsuki Noda satisfies this restriction.
It is a first-person interface, so the human player observes the objects in the field in 3D perspective. However, the version of the Soccerserver we have used is 2D, and it is a bit confusing to observe a 2D world in a first-person view. Therefore, we decided to program our own interface, which is displayed in Fig. 2. This interface displays a complete 2D real-time view of the field, just like the soccer monitor. Absolute positions are computed by means of the ABC2 library (Matellan, Borrajo, & Fernandez, 1998): trigonometry computations are used to obtain the absolute positions of objects from the known positions of the banners distributed along the field border.

Fig. 1. Process to obtain a model from a person playing Robosoccer by using machine learning.

Fig. 2. 2D interface. Objects are represented by probability circles.

Although the whole field is visible, only those objects within the vision cone of the agent are displayed. Also, in the Robosoccer, perception of far-away objects is noisy. Thus, objects are displayed as probability circles: the radius of the circle depends on the radius of the object and on the distance to the object. In Fig. 2, the ball is represented as a probability circle. Different colors are used to differentiate the ball, the opponents, and the same-team players (for interpretation of color in Fig. 2, the reader is referred to the web version of this article). Those objects which are no longer visible are represented at the last position they were seen.

In order to improve playability, not all Soccerserver commands are available to the player. For instance, it is possible to kick the ball with any strength, but the interface only allows a standard kick. The commands allowed by the interface are:

• turn left: the player can turn the agent's body 10° to the left. The player is only allowed to turn left (but not right) because in some preliminary experiments we found out that it was very difficult for the machine learning algorithm to discriminate between turning left and right.
• run slow/fast: only these two kinds of dash are allowed. Their power is 60 and 99, respectively. These values were obtained experimentally.
• kick ball: kicks the ball in the direction of the agent's view line with a strength of 60.
• kick to goal: kicks the ball towards the goal, with a strength of 99.

3.3. The trainer (coach) agent

In order to have a diverse set of instances for learning behaviors, a diverse set of situations has to be presented to the human player. The trainer agent was used for this purpose. The trainer agent allows us to position the agent, the ball, and same-team/opposite-team agents at arbitrary positions in the field. Our trainer agent puts objects at random positions in the field according to Fig. 3. In this way, an initially defensive positioning of the opposite team is achieved.

3.4. Opposite agents

In the most complex situations, the human player will play within a team against a complete opposite team. In this paper we want to determine whether input/output modeling of persons works in principle in the Robosoccer domain. To achieve this, we have used a simpler situation where a single human-controlled agent plays against a defensive opposite team (Camacho, Fernández, & Rodelgo, 2006; Fernandez, Gutiérrez, & Molina, 2000). This team is based on zones, where each of the team members is located. Among the agents, the player closest to the ball takes the role of leader.
The rest of the agents maintain a distance from the leader so as to keep the formation. Agents determine who is the leader and pass this information to the others. When an agent moves away from its zone, it tries to pass the ball to another agent, if available. Otherwise, it continues towards the goal, in order to score. The goalie follows a similar behavior: it stays within its zone and looks for the ball. If the ball is close (15 units), it goes for it and kicks it towards the opposite goal.

3.5. Model representation

Human behavior can be modeled in many ways. Some of them have been used traditionally in AI: rules, trees, regression models, logic programs, etc. In this paper we intend to study how far first order representations can get. For this paper, we have chosen rules as a way of representing human behavior. They have a long tradition for representing knowledge and their conversion to C if-then-else structures is straightforward. Also, we have used the C4.5 algorithms for generating the rules, although any rule-based ML algorithm would work just as well. More specifically, the PART algorithm (Revision 8 of C4.5) included in the Weka ML tool has been used (Witten & Frank, 2000). The rules follow an if (situation) then action structure, where the situation checks the values of the agent's sensors and the action tells what the agent should do next. Table 1 displays two actual rules obtained in the course of our research.

With respect to the if part of the rules, it is very important to select informative attributes so that they capture all the information used by the human player to make decisions. Table 2 displays the attributes we have chosen.

Fig. 3. Rectangles where opposite agents will be randomly positioned by the trainer.

Table 2
Attributes used in the left hand side of rules
X, Y (Real): absolute agent location.
Angle (Real): angle of the agent's view line.
Angle_Ball (Real): angle between the ball and the agent's view line.
Distance_Ball (Real): distance from the ball to the agent.
Distance_Opposite1 (Real): distance from the closest opposite player to the agent.
Angle_Opposite1 (Real): angle between the agent and the closest opponent.
Valid_Opposite1 (Boolean 0/1): whether the closest opponent could be seen in the last server cycle.
Distance_Opposite2 (Real): distance from the second closest opposite player to the agent.
Angle_Opposite2 (Real): angle between the agent and the second closest opponent.
Valid_Opposite2 (Boolean 0/1): whether the second closest opponent could be seen in the last server cycle.
Angle_Opponent_Goal (Real): angle between the opponent's goal and the agent.
Distance_Opponent_Goal (Real): distance to the opponent's goal.
Valid_Opponent_Goal (Boolean 0/1): whether the opponent's goal could be seen during the last server cycle.

All positions of objects (ball, opponents, and goal) are relative to the view line of the agent and expressed in polar coordinates: distance and angle. If an object is too far away, it cannot be seen and the values of these attributes are meaningless. This is indicated by other attributes, prefixed by Valid, which tell whether the associated object was visible or not. Absolute attributes are only used for the X and Y coordinates of the agent. In this paper, only the two closest opponents are considered by the rules, although it would be easy to create new attributes so that more opponents can be taken into account.
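To make the translation from learned rules to agent code concrete, the following minimal sketch in C (the language into which the classifier is translated) shows how a state record built from the attributes of Table 2 might be mapped to one of the discretized interface actions of Section 3.2 by an if-then-else cascade. The struct and enum names, the thresholds, and the particular rules are illustrative assumptions made only for this sketch; they are not the actual rules learned by PART (two of those are shown in Table 1).

#include <stdio.h>

/* State record built from the attributes of Table 2 (one per server cycle).
   Only some of the attributes are shown here. */
typedef struct {
    double x, y;                    /* absolute agent location            */
    double angle;                   /* angle of the agent's view line     */
    double angle_ball;              /* angle between ball and view line   */
    double distance_ball;           /* distance from the ball             */
    double distance_opposite1;      /* distance to the closest opponent   */
    double angle_opposite1;         /* angle to the closest opponent      */
    int    valid_opposite1;         /* closest opponent seen last cycle?  */
    double angle_opponent_goal;     /* angle to the opponent's goal       */
    double distance_opponent_goal;  /* distance to the opponent's goal    */
    int    valid_opponent_goal;     /* goal seen during last cycle?       */
} State;

/* Discretized actions produced by the rules (Section 3.5). */
typedef enum { KICK60, KICK99, DASH60, DASH99, TURN10, TURNMINUS10 } Action;

/* Illustrative if-then-else cascade in the style of the rules of Table 1.
   The thresholds are hypothetical, not the ones actually learned by PART. */
Action select_action(const State *s)
{
    if (s->distance_ball < 1.0 && s->valid_opponent_goal &&
        s->distance_opponent_goal < 20.0)
        return KICK99;          /* kick towards the goal           */
    if (s->distance_ball < 1.0)
        return KICK60;          /* standard kick                   */
    if (s->angle_ball > 10.0 || s->angle_ball < -10.0)
        return TURN10;          /* turn until the ball is in front */
    if (s->distance_ball > 15.0)
        return DASH99;          /* run fast towards the ball       */
    return DASH60;              /* otherwise approach slowly       */
}

int main(void)
{
    State s = { .distance_ball = 0.7, .valid_opponent_goal = 1,
                .distance_opponent_goal = 12.0 };
    printf("selected action = %d\n", (int)select_action(&s));
    return 0;
}

In the actual system such a cascade is not written by hand: it is generated automatically from the PART output, and the resulting function is evaluated once per server cycle on the current state record.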
The values that the right hand side of the rules (the action part) can take are: kick60, kick99, dash60, dash99, turn10, and turnminus10. They have already been explained. Currently, the interface can only use these discretized actions, but in the future it could be modified so that the human player can select a continuous value to turn, to kick, or to dash.

4. Training the agent

Some preliminary experiments were carried out to test simple behaviors like looking for the ball and advancing with the ball in an empty field. As these behaviors were easily learned and properly performed by the agent, we proceeded to more complex behaviors, which involve playing against opponents.

4.1. Dribbling static opponents

This skill involves a striker advancing with the ball and scoring, after dribbling static opponent agents located near the goal. These opponents can only kick the ball when it comes close to them. Eleven opponent players are used. The training cases are generated in such a way that the striker is forced to dribble opponents to find the ball, and then to continue dribbling them until it scores.

A first attempt was made with 5370 training instances, obtaining a 95.88% 10-fold cross-validation accuracy. The agent is able to find the ball when there are no opponents and conduct it to the goal. The agent also does well when confronting the 11 opponents. However, in some cases it displays the following flawed behaviors:

• The agent tries to kick the ball when it is too far away, or when the angle is not appropriate.
• When the agent is close to the ball, it turns left again and again.
• The agent collides with an opponent and stops there.

Once the agent performs these flawed behaviors, it never gets out of these states. So, for instance, the agent will try to kick the ball forever, or it will get into an eternal turning loop. In general, and this is a property of our reactive approach, if for some reason the agent performs an action that does not change its environment, or that changes it in a way not perceived by the left hand side of the rules, the agent will get stuck in that behavior forever. For instance, when the agent tries to kick the ball but it is not close enough, nothing has changed in the world, so the agent will repeat its kicking behavior again and again. Similarly, if a chain of rule actions makes the agent do a complete 360° loop, this will be repeated forever. Perhaps a mechanism should be added on top of the rules that detects when the agent has got into such states and performs some random actions until it gets out. However, in this paper we only want to study the pure learning approach, so we leave that for the future.

In order to improve this behavior, the number of training instances was increased to 14,915, obtaining 172 rules and a 95.11% accuracy. The behavior improved, but the agent still performs flawed behaviors. These flaws restrain the agent from fulfilling its objective. Our final agent displayed flawed behaviors in 12% of the trials (a trial involves letting the agent loose in the field, finding the ball, and scoring). We found it very hard to improve these results by adding more training instances. In the conclusions section, we will discuss why this is so and propose new lines of research to overcome these limitations. Thus, results are not perfect, but we considered them to be acceptable. Also, these behaviors happen in a world where the only agent that can initiate actions (and change the world) is the striker.
When there are more active opponents in the field, the world will change independently of the agent, and the agent will get out of its static states more easily.

4.2. Match with opponents

This is the most complex behavior learned: a striker must get to the ball and score against three defences and one goalie. For this task, it would seem desirable to choose very difficult opponents, like CMUnited or FC Portugal, which were previous Robocup champions. However, the human player found it impossible to beat them. This was due to these teams playing extremely well, but also to the interface not being responsive enough. This latter problem could not be solved, because the Robosoccer server was not designed with interactive play in mind. As our aim is to show that human experience can be transferred to soccer agents, we have chosen a challenging but beatable Robosoccer team (Camacho et al., 2006; Fernandez et al., 2000), which has the advantage that, although its players have a team behavior, we can use as many players as desired. In this case, only four of them were used. In any case, it must be remarked that, although the (Camacho et al., 2006; Fernandez et al., 2000) team is not a Robocup champion, it still poses a very challenging situation, because:

• The opponents outnumber our agent and play cooperatively.
• The opponents can use more actions (turning the neck, turning left and right, kicking and dashing with any power and angle, ...).
• The opposite team has a goalie, whereas our agent must both defend and attack.

By confronting a human player with this team, we were able to learn rules that could be transferred to an agent that performed very well, as will be shown next.

16,124 instances were obtained and 234 rules were created, with a 93.44% cross-validation accuracy. Then, the agent had to play six new testing matches. Although the agent did not win any of the six testing matches, it scored some goals. Results were: 2-5, 1-6, 0-5, 1-4, 1-5, 0-4 (where the first number indicates goals scored by the agent, and the second one, goals scored by the opponents). The agent falls into the previously described flawed behaviors, like trying to repeatedly kick the ball when it is not there. However, in this case the world is more dynamic and, when an opponent or the ball comes close, the agent gets out of the loop and reacts.

In order to improve these results, we increased the number of instances to 24,594. 332 rules were generated (93.65% accuracy). In this case, we also pruned the rules using WEKA's standard parameters. The number of rules was reduced to 164 (92.65%). Yet, the behavior was greatly enhanced: the agent was able to find the ball on the field, to conduct it towards the goal, to score, to dribble opponents, and to steal the ball from them. The scores in six matches display this improvement, as the agent won one of the games: 5-4, 2-4, 3-4, 4-5, 2-4, 3-4. The agent scored 19 goals versus 25 goals scored by the opponents. The learned classifier was further pruned to 69 rules (91.90%). Similar results were observed in six new matches: 3-4, 2-4, 3-3, 2-5, 3-1, 4-3. The agent scored 17 goals versus 20 goals scored by the opponents. Table 3 summarizes these results.

5. Conclusions and future work

In this paper we have applied an input-output modeling approach to model a human playing Robosoccer. First, an interface was built that displayed to the user the objects in the playing field that could be seen according to Robosoccer rules.
This interface allowed the user to send low-level commands (dash, turn, and kick) to the Soccerserver. Input/output instances generated by the human player were used by a machine learning algorithm (PART) to learn a model. This model was then introduced into a computer agent. Results show that our approach works well for different low-level behaviors, like looking for the ball, conducting the ball to the goal, dribbling opponents, and scoring in the presence of other players. The final agent was able to score many goals against a computer team that the human found challenging. As far as we know, this is the first time that behavioral cloning techniques have been applied in the Robosoccer domain, with positive results. This shows that this is a very promising line of research whose results could be improved further, as discussed below.

Table 3
Summary of results in six testing matches for the "playing with opponents" behavior
Instances   Pruning   Rules   Accuracy   Goals scored   Goals against
16,124      No        234     93.44%     +5             -29
24,594      Yes       332     92.65%     +19            -25
24,594      Yes       69      91.90%     +17            -20

Building a user-friendly and responsive interface is of great importance for the human play. Unfortunately, the Soccerserver was not conceived as a video-game and it is difficult to construct a responsive enough interface for it. Our current interface is still not as good as commercial video-games. It would be possible to overcome this issue by replicating the functionality of the Soccerserver, but bearing interactive play in mind. Rules could be learned from the modified Soccerserver and transferred to agents playing the actual Soccerserver after, perhaps, some small adaptation. Having a responsive interface is very important for learning low-level behaviors.

We have found out that the agent displayed some flawed behaviors, although not very frequently. This problem could be reduced by increasing the size of the dataset and pruning the model. However, the problem did not disappear completely. We believe that the underlying assumptions of a purely input-output behavioral approach may be the culprit. Our approach works well when the behavior to be cloned is reactive (i.e. the behavior is an input-output map). But if the action of the human depends on hidden variables, in addition to what the human can see on the field, the model of the human will degrade. For instance, the human can make use of memories and predictions about the opponents, even when he is not watching them. But these variables are hidden to the modeling algorithm (i.e. it is not possible to see what the human is thinking). Therefore, our approach worked well because the behaviors to be learned are mostly reactive, but even in this case there are probably some hidden variables that could help to improve the results.

In the future, we plan to add estimations of some hidden variables to the agent, via new attributes computed by special-purpose algorithms. If human models are to be used, we should delve more into the cognitive functions applied by a person when playing Robosoccer (like planning, opponent prediction, trajectory computation, ...). These cognitive abilities could be supplied to the agent and used in the left hand side of the rules via new attributes. For instance, humans use memory to keep track of opponents and the ball in their minds, even when these objects are out of view. In the same way, tracking algorithms could be used to generate attributes that estimate where the ball might be at some particular time.
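As an illustration of the kind of special-purpose attribute generator we have in mind, the following sketch in C keeps the last observed position and velocity of the ball and extrapolates them while the ball is out of the vision cone. The structure, the function names, and the decay constants are hypothetical choices made only for this sketch, not part of the system described above; the estimated position and the confidence value would simply be appended to the attributes of Table 2 and offered to the rule learner like any other attribute.

#include <stdio.h>

/* Hypothetical tracker for an out-of-view ball: it remembers the last
   observation and extrapolates it linearly, with a confidence value that
   decays on every server cycle without a new sighting.                   */
typedef struct {
    double x, y;        /* last estimated absolute ball position          */
    double vx, vy;      /* estimated velocity (units per server cycle)    */
    double confidence;  /* 1.0 right after a sighting, decays towards 0   */
} BallTrack;

/* Called when the ball is inside the vision cone. */
void ball_seen(BallTrack *t, double x, double y)
{
    t->vx = x - t->x;
    t->vy = y - t->y;
    t->x = x;
    t->y = y;
    t->confidence = 1.0;
}

/* Called once per server cycle while the ball is not visible.  The
   extrapolated position and the confidence could be exposed to the rule
   learner as extra attributes, e.g. Est_Ball_X, Est_Ball_Y, Ball_Confidence. */
void ball_unseen(BallTrack *t)
{
    t->x += t->vx;
    t->y += t->vy;
    t->vx *= 0.94;          /* hypothetical ball speed decay per cycle */
    t->vy *= 0.94;
    t->confidence *= 0.9;   /* hypothetical confidence decay           */
}

int main(void)
{
    BallTrack t = { 0 };
    ball_seen(&t, 10.0, 5.0);   /* ball observed at (10, 5)        */
    ball_seen(&t, 11.0, 5.5);   /* next cycle: observed again       */
    ball_unseen(&t);            /* ball leaves the vision cone      */
    ball_unseen(&t);            /* keep extrapolating its position  */
    printf("estimated ball at (%.2f, %.2f), confidence %.2f\n",
           t.x, t.y, t.confidence);
    return 0;
}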
Imitating humans in the Robosoccer can be done at many levels. Inspired by computer games like FIFA 2006, we intend to let the human use higher-level actions like passing the ball, shooting at the goal, pushing the ball, dribbling, etc. In this case, the human player will only have to press a key, and the computer will carry out a pre-programmed behavior (for passing the ball, etc.). Thus, the human can focus on a more strategic level and leave the low-level details to the computer. Modeling can be done at even higher levels, such as the team level or the coach agent. We would also like to test the approach in more complex situations like real matches and real team play.

Acknowledgements

We would like to thank Vicente Matellan for letting us use his ABC2 routines, and Fernando Fernandez for providing us with his useful Robosoccer team. Agapito Ledezma has been very helpful with his knowledge of Robosoccer.

References

Bain, M., & Sammut, C. (1999). A framework for behavioural cloning. In Machine intelligence agents (pp. 103-129). Oxford University Press.
Bakker, P., & Kuniyoshi, Y. (1996). Robot see, robot do: An overview of robot imitation. In AISB'96 workshop in robots and animals (pp. 3-11).
Bauckhage, C., & Thurau, C. (2004). Towards a fair 'n square aimbot - Using mixtures of experts to learn context aware weapon handling. In Proceedings of GAME-ON (pp. 20-24).
Bauckhage, C., Thurau, C., & Sagerer, G. (2003). Learning human-like opponent behavior for interactive computer games. In B. Michaelis & G. Krell (Eds.), Pattern recognition. Lecture notes in computer science (Vol. 2781, pp. 148-155). Springer-Verlag.
Bauer, M. (1999). From interaction data to plan libraries: A clustering approach. In International joint conference on artificial intelligence (pp. 962-967).
Bauer, M. (1996). Machine learning for plan recognition. In Machine learning meets human computer interaction. Workshop of the international conference on machine learning (pp. 5-16).
Bloedorn, E., Mani, I., & MacMillan, T. R. (1996). Machine learning of user profiles: Representational issues. In Thirteenth national conference on artificial intelligence (pp. 433-438).
Bowling, M. (2003). Multiagent learning in the presence of agents with limitations. PhD thesis, Computer Science Department, Carnegie Mellon University, Pittsburgh, PA, May 2003. Available as technical report CMU-CS-03-118.
Brown, J. S., & Burton, R. R. (1978). Diagnostic models for procedural bugs in basic mathematical skills. Cognitive Science, 2, 155-192.
Camacho, D., Fernández, F., & Rodelgo, M. A. (2006). RoboSkeleton: An architecture for coordinating robot soccer agents. Engineering Applications of Artificial Intelligence, 19(2), 179-188.
Carmel, D., & Markovitch, S. (1996). Opponent modeling in multi-agent systems. In Adaptation and learning in multiagent systems. IJCAI'95 workshop. Lecture notes in computer science (Vol. 1042, pp. 40-52). Springer.
Chiu, B. C., Webb, G. I., & Kuzmycz, M. (1997). A comparison of first-order and zeroth-order induction for input-output agent modelling. In Proceedings of the sixth international conference. Springer.
Fernandez, F., Gutiérrez, G., & Molina, J. M. (2000). Coordinación global basada en controladores locales reactivos en la RoboCup [Global coordination based on reactive local controllers in the RoboCup]. In Workshop Hispano-Luso de Agentes Físicos (pp. 73-85). Tarragona, Spain.
Fürnkranz, J., & Kubat, M. (Eds.). (2001). Machines that learn to play games. Nova Science Publishers.
Hayes, G., & Demiris, J. (1994). A robot controller using learning by imitation. In Proceedings of the second international symposium on intelligent robotic systems (pp. 198-204).
Kautz, H., & Allen, J. F. (1986). Generalized plan recognition. In Proceedings of the AAAI national conference on artificial intelligence (pp. 32-37).
Kuniyoshi, Y., Inaba, M., & Inoue, H. (1994). Learning by watching: Extracting reusable task knowledge from visual observation of human performance. IEEE Transactions on Robotics and Automation, 10(6), 799-822.
Kuzmycz, M. (1994). A dynamic vocabulary for student modeling. In Proceedings of the fourth international conference on user modeling (pp. 185-190).
Ledezma, A., Aler, R., Sanchis, A., & Borrajo, D. (2004). Predicting opponent actions by observation. In RoboCup international symposium 2004 (RoboCup 2004), Lisbon, Portugal.
Luke, S., Hohn, C., Farris, J., Jackson, G., & Hendler, J. (1997). Co-evolving soccer softbot team coordination with genetic programming. In Proceedings of the first international workshop on RoboCup, at the international joint conference on artificial intelligence, Nagoya, Japan.
Matellan, V., Borrajo, D., & Fernandez, C. (1998). Using ABC2 in the RoboCup domain. In RoboCup-97: Robot soccer world cup I. Lecture notes in artificial intelligence (pp. 475-483). Springer-Verlag.
Paliouras, G., Karkaletsis, V., Papatheodorou, C., & Spyropoulos, C. D. (1999). Exploiting learning techniques for the acquisition of user stereotypes and communities. In Seventh international conference on user modelling (pp. 169-178).
Piaget, J. (1945). Play, dreams and imitation in childhood. Heinemann.
Ponsen, M., & Spronck, P. (2004). Improving adaptive game AI with evolutionary learning. In Computer games: Artificial intelligence, design and education (pp. 389-396).
Priesterjahn, S., Kramer, O., Weimer, A., & Goebels, A. (2005). Evolution of reactive rules in multi player computer games based on imitation. In International conference on natural computation (ICNC'05), Changsha, China.
Riley, P., & Veloso, M. (2000a). On behavior classification in adversarial environments. In Distributed autonomous robotic systems 4. Springer-Verlag.
Riley, P., & Veloso, M. (2001b). Coaching a simulated soccer team by opponent model recognition. In Proceedings of the agents international conference (pp. 155-156). ACM Press.
Riley, P., Veloso, M., & Kaminka, G. (2002). An empirical study of coaching. In Proceedings of DARS-2002, the seventh international symposium on distributed autonomous robotic systems.
Sammut, C., Hurst, S., Kedzier, D., & Michie, D. (1992). Learning to fly. In D. Sleeman (Ed.), Proceedings of the ninth international conference on machine learning (pp. 385-393). Morgan Kaufmann.
Self, J. A. (1988). Bypassing the intractable problem of student modelling. In Proceedings of the intelligent tutoring systems conference (pp. 107-123).
Sklar, E., Blair, A. D., Funes, P., & Pollack, J. (1999). Training intelligent agents using human internet data. In Proceedings of the first Asia-Pacific conference on intelligent agent technology (IAT-99) (pp. 354-363).
Sklar, E., Blair, A. D., & Pollack, J. B. (2001). Training intelligent agents using human data collected on the internet. In Agent engineering (pp. 201-226). World Scientific.
Spronck, P., Sprinkhuizen-Kuyper, I., & Postma, E. (2002). Improving opponent intelligence through machine learning. In Proceedings of the fourteenth Belgium-Netherlands conference on artificial intelligence (pp. 299-306).
Spronck, P., Sprinkhuizen-Kuyper, I., & Postma, E. (2004). Online adaptation of computer game opponent AI. International Journal of Intelligent Games & Simulation (IJIGS), 5(1), 45-53.
Stone, P. (2000). Layered learning in multiagent systems: A winning approach to robotic soccer. MIT Press.
Stone, P., & Veloso, M. (1998). A layered approach to learning client behaviors in the RoboCup soccer server. Applied Artificial Intelligence, 12, 165-188.
Stone, P., & Veloso, M. (1999). Team-partitioned, opaque-transition reinforcement learning. In M. Asada & H. Kitano (Eds.), RoboCup-98: Robot soccer world cup II. Berlin: Springer-Verlag; also in Proceedings of the third international conference on autonomous agents.
Suc, D., & Bratko, I. (1997). Skill reconstruction as induction of LQ controllers with subgoals. In Proceedings of the 15th international joint conference on artificial intelligence (Vol. 2, pp. 914-920).
Thurau, C., Bauckhage, C., & Sagerer, G. (2003). Combining self-organizing maps and multilayer perceptrons to learn bot-behavior for a commercial computer game. In Proceedings of GAME-ON (pp. 119-123).
Thurau, C., Bauckhage, C., & Sagerer, G. (2004a). Imitation learning at all levels of game-AI. In Proceedings of the international conference on computer games, artificial intelligence, design and education (pp. 402-408).
Thurau, C., Bauckhage, C., & Sagerer, G. (2004b). Learning human-like movement behavior for computer games. In Proceedings of the eighth international conference on the simulation of adaptive behavior (SAB'04).
Thurau, C., Bauckhage, C., & Sagerer, G. (2004c). Synthesizing movements for computer game characters. Lecture notes in computer science (Vol. 3175). Heidelberg, Germany: Springer-Verlag.
Urbancic, T., & Bratko, I. (1994). Reconstructing human skill with machine learning. In European conference on artificial intelligence (ECAI 1994) (pp. 498-502).
van Lent, M., & Laird, J. (1999). Learning hierarchical performance knowledge by observation. In Proceedings of the sixteenth international conference on machine learning (pp. 229-238). Morgan Kaufmann Publishers Inc.
Webb, G., Pazzani, M., & Billsus, D. (2001). Machine learning for user modeling. User Modeling and User-Adapted Interaction, 11, 19-29.
Webb, G. I., Chiu, B. C., & Kuzmycz, M. (1997). Comparative evaluation of alternative induction engines for feature based modelling. International Journal of Artificial Intelligence in Education, 8, 97-115.
Webb, G. I., & Kuzmycz, M. (1996). Feature based modelling: A methodology for producing coherent, dynamically changing models of agents' competencies. User Modeling and User-Adapted Interaction, 5(2), 117-150.
Witten, I. H., & Frank, E. (2000). Data mining: Practical machine learning tools and techniques with Java implementations. Morgan Kaufmann.