Title: Measuring an adaptive change in human decision-making from AI: Application to evaluate changes after AlphaGo
Authors: Shin, Minkyu; Kim, Minkyung; Kim, Jin
Date: 2020-12-30

Across a growing number of domains, human experts are expected to learn from and adapt to AI with superior decision-making abilities. But how can we quantify such human adaptation to AI? We develop a simple measure of human adaptation to AI and test its usefulness in two case studies. In Case Study 1, we analyze 1.3 million move decisions made by professional Go players and find that a positive form of adaptation to AI (learning) occurred after the players could observe AI's reasoning processes rather than mere actions of AI. In Case Study 2, we test whether our measure is sensitive enough to capture a negative form of adaptation to AI (cheating aided by AI), which occurred in a match between professional Go players. We discuss our measure's applications in domains other than Go, especially in domains in which AI's decision-making ability will likely surpass that of human experts.

As Artificial Intelligence (AI) technology advances, AI will have an ever greater impact on human decision making. But how big will this impact be? Will humans adapt to AI by learning from it and making better decisions themselves? How quickly or slowly will humans adapt to AI? Moreover, in which domains will AI have more or less influence on human decision making? Answering these questions will become easier with better methods of measuring AI's impact on human decision making. To this end, we propose an intuitive and objective measure of human adaptation to AI in the game of Go and consider the measure's applicability in other domains. In proposing our measure of human adaptation to AI, we examine the game of Go for two reasons. First, it is one of the first domains in which AI achieved superhuman performance on a complex decision-making problem (Silver et al. 2016). This superhuman performance is a necessary condition for measuring AI's impact on human decision making, not only because it encourages humans to learn from AI, but also because it provides an objective standard against which the quality of human decision making can be compared. On this latter point, suppose the quality of human decision making is at some level: say, one foot tall. Then a superhuman AI can act as a taller yardstick that enables measuring changes in the one-foot-tall human decision-making quality after the introduction of AI. In contrast, an AI with merely par-human performance can act only as an equivalent-height, one-foot ruler that would not enable measurements of growth in human decision-making quality. Now that AI programs are expected to show superior performance in many other decision-making problems, our measure, which assumes a superhuman AI, can be used in many other settings. Another reason we examine the game of Go is that a game is an effective setting to test how humans interact with AI and adapt their decision-making. The goal of a game is usually well-defined, and any action taken by human players, as well as the resulting changes in the environment, is recorded in a database.
Using these unique features of a game, researchers have studied various aspects of human decision making, from error correction to skill acquisition (Biswas 2015; Regan, Biswas, and Zhou 2014; Stafford and Dewar 2014; Strittmatter, Sunde, and Zegners 2020; Tsividis et al. 2017). Among many games, the game of Go presents arguably the most complex task, which explains why AlphaGo received so much attention when it defeated a top human expert. We speculate that the more complex the domain in which a measure of human adaptation to AI has been shown to work, the more domains of human decision making the measure could be applied to. In other words, if our measure of human adaptation is useful in a domain as complex as the game of Go, it should be useful in less complex domains as well. We present two case studies testing the usefulness of our measure. First, we examine the impact of AI on human experts' decision quality. Using historical data of human decisions before and after the emergence of AI such as AlphaGo, we investigate whether and when humans learn to make better decisions like AI. Although additional information provided by AI could be beneficial to human players, the black-box nature of AI's decisions may have hindered human adaptation or generated misinterpretation. Indeed, our results suggest that merely observing AI's actions may not bring a meaningful improvement in human decision making. Observing AI's reasoning processes, however, does seem to improve human decision making. Second, we study whether our measure can detect cheating behavior. Because AI outperforms human experts in the game of Go, it would be tempting for human experts to consult AI's decisions even in a match between humans. Not surprisingly, there have been reports of cheating via AI not only in professional Go matches but also in chess matches, particularly as more matches are held online during the pandemic. We test whether our measure has enough statistical power to detect a cheating incident that has already been officially confirmed. Our measure has broad applications beyond the game of Go. A natural extension is to measure users' performance in games, including chess and video games. We can also compare how stronger and weaker players are affected differently by AI technologies. The measure can be used in many other domains in which AI makes better decisions than humans but humans remain the final decision makers due to the high-stakes nature of the decisions. Key components of our measure, such as the value network, are available thanks to recent developments in reinforcement learning (henceforth RL). Applying this framework to explain human behavior is not far-fetched, given that RL has been studied not just for developing effective AI, but also for explaining human skill learning. For example, many computer scientists have built RL-based AI programs that solve complex decision problems, from playing complex board games to scheduling educational activities (Bassen et al. 2020; Mnih et al. 2013; Nazari et al. 2018; Silver et al. 2017). Cognitive scientists have also used RL to study how humans learn new skills (Fiorillo, Tobler, and Schultz 2003; Holroyd and Coles 2002; Niv 2009; Waelti, Dickinson, and Schultz 2001). Recently, these two areas of research have interacted with each other under the framework of RL (Botvinick et al. 2019; Dabney et al. 2020; Hassabis et al. 2017).
This interaction between the two areas suggests that RL is a useful approach for studying how humans and AI similarly make choices in dynamic settings. To this end, we simplify the decision-making process of both a human and AI in the framework of RL. Then we devise a simple measure that can be used to quantify the level of human adaptation to AI. Our measure, which we call the Human-AI Gap, compares the quality of actions by humans to the quality of actions by an AI program. Defining the quality of actions can be a challenge, however, because the consequences of actions are hard to pin down in a high-dimensional state space. Fortunately, modern AI programs based on deep RL provide a useful byproduct while choosing superhuman actions. AI programs not only generate superior decisions, but also evaluate any action. Specifically, the value network of AI evaluates many situations or states in a game, and the action-value network, also known as the Q value, evaluates any action under any situation. Using this feature, we can tell how advantageous each state of a game is. In this regard, we can evaluate each human action and measure how good it is. Finally, we measure the Human-AI Gap for each human decision by calculating the difference between the quality of a human action and the quality of an action of AI. Importantly, our measure maps both human actions and AI actions into a continuous quality space and then compares the two. Given that we focus on settings where AI is far superior to humans in decision making, a head-to-head comparison between human actions and AI actions would not capture the improvement of human decision making. Suppose a human player makes two actions, neither of which matches the optimal action of AI. One is made before AI and is much worse than AI's best action in terms of quality. The other comes from a human who has learned from AI and is only slightly worse than the action of AI. If we do not map these two actions onto a quality dimension, the two human actions cannot be differentiated in a head-to-head comparison, and we cannot capture the change in human decision quality. Now that AI provides a tool to measure the quality of actions, we leverage the tool to make a distinction between those two actions. We use notation from the previous literature (Igami 2020; Silver et al. 2017) to explain our measure more formally. It is defined mostly for the environment of the board game Go, but it can easily be modified for other well-defined dynamic decision-making problems. The state space, $\mathcal{S}$, represents the set of possible situations of a game. In a match between human players, a player faces many situations in $\mathcal{S}$. Given $S \in \mathcal{S}$, a human player decides the next move. The player making the $k$-th move observes $S_k$ (state) and decides $a_k$ (action), i.e., a position at which to place the stone. We simplify the decision rule of human players as follows: human players use their own evaluation parameter ($\theta_{Human}$) to diagnose how advantageous the current state is, $V(S_k; \theta_{Human})$. Based on this evaluation, they apply their own strategy or decision rule ($\sigma_{Human}$), ending up with an action, $a_k^{Human}$. In this decision rule, we abstract away from complex interactions between human players. Instead, it is more like a single-agent problem in which each human player has to find an optimal choice at the given state to maximize the total reward.
Any strategic responses from the opponent human player are subsumed under the transition of the state in our decision rule. AI programs would also map a given state to an action by their policy network. Although the actual process of AI decision making is as complex as human decision making, it can be simplified in the following way: AI uses its own evaluation parameter ($\theta_{AI}$) to diagnose how advantageous the current state is, $V(S_k; \theta_{AI})$, and applies its own strategy ($\sigma_{AI}$), ending up with an action, $a_k^{AI}$. It is notable that a human and AI facing the same state, $S_k$, could reach different actions (i.e., $a_k^{Human} \neq a_k^{AI}$). This may result from the fact that the way a human evaluates the state, $\theta_{Human}$, differs from the way AI does, $\theta_{AI}$. Namely, a human may be too optimistic or pessimistic from the perspective of AI. In addition, AI may build a strategy, $\sigma_{AI}$, that does not belong to any traditional human strategy, $\sigma_{Human}$. AI trained by itself is free from any conventional human strategies, so it may produce novel actions. The level of human adaptation could in principle be quantified by comparing the parameters of human decision making before and after humans have access to AI. However, this is often not feasible because researchers usually observe only the actions of human players, neither their policy nor their evaluation. It is hard to know how they reached a decision or how they evaluate a certain state, which makes it impractical to back out the underlying decision rule of human players. Instead, our proposed measure requires only the actions of human players as data. We then simulate a counterfactual action by AI and evaluate the two actions, one from the human and the other from AI, using the value network of AI. (The quality of a counterfactual AI choice reflects the current state of the game and acts as the maximum attainable value if a human player chooses the same action as AI.) We define the Human-AI Gap in the following way. First, in each realized state $S_k$ that a human player faces, we let AI simulate a counterfactual action, $a_k^{AI}$. This action is what AI would choose if it had to decide instead of the human player. Second, we let AI quantify the quality of each action, one from the human player and the other from AI, by evaluating the two resulting states. Finally, we calculate the difference between those two evaluations: $\Delta_k = V(S_{k+1}(a_k^{AI}); \theta_{AI}) - V(S_{k+1}(a_k^{Human}); \theta_{AI})$. In words, our measure quantifies the difference between the advantage induced by the counterfactual AI choice, $V(S_{k+1}(a_k^{AI}); \theta_{AI})$, and the advantage induced by the actual human choice, $V(S_{k+1}(a_k^{Human}); \theta_{AI})$. If AI comes up with a better action than the human, this gap measure is positive. If the human chooses the same action as AI, it is zero. Because we use a far superior AI to generate counterfactual actions, this value is usually positive. One caveat is that we use not just the policy network of AI but also the value network of AI. Specifically, we evaluate the quality of actions, whether they come from AI or a human, from the perspective of AI ($V(\cdot\,; \theta_{AI})$). In this regard, our measure is valid for quantifying the level of human adaptation to AI. It requires caution when human decision makers have improved their decisions in a way that AI does not appreciate; the value network of AI would not capture such a change in behavior. So our measure is most appropriate for measuring changes in human behavior toward AI. How do we use it? Suppose human players, observing the superiority of AI, update their decision rule. This human adaptation can happen in evaluating the state ($\theta_{Human}$), in building a strategy ($\sigma_{Human}$), or both. Humans with updated parameters would produce actions that AI evaluates more favorably. Researchers would then observe our gap measure decrease toward zero.
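To make the definition concrete, here is a minimal Python sketch of how the Human-AI Gap could be computed for a single move. The engine interface (best_move, evaluate, and the state.play helper) is hypothetical and stands in for whatever Go engine and value network a researcher has access to; it is not the API of any particular program.

```python
from dataclasses import dataclass

@dataclass
class MoveRecord:
    state: object        # board position S_k before the k-th move (hypothetical type)
    human_move: tuple    # the move a_k^Human the player actually made

def human_ai_gap(engine, record):
    """Human-AI Gap for a single decision, evaluated by the AI's value network.

    Both the counterfactual AI move and the actual human move are scored with
    the same value network V(. ; theta_AI), so the gap is
    V(S_{k+1}(a_k^AI)) - V(S_{k+1}(a_k^Human)).
    """
    ai_move = engine.best_move(record.state)                   # counterfactual a_k^AI
    value_after_ai = engine.evaluate(record.state.play(ai_move))            # V after AI's move
    value_after_human = engine.evaluate(record.state.play(record.human_move))  # V after human's move
    return value_after_ai - value_after_human                  # usually >= 0, 0 if moves coincide
```

In this sketch, state.play(move) is assumed to return the successor position; any concrete engine wrapper that exposes a policy query and a value query could be substituted.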
To test this hypothesis on whether human players make more AI-like choices, we construct panel data at the player-month level, where $\Delta_{it}$ denotes player $i$'s average gap in month $t$, $n_{it}$ stands for the number of matches player $i$ plays in month $t$, and $k$ indexes the order of a choice within a match. Now that we have well-constructed panel data, we can test many research hypotheses regarding human adaptation.

Adaptation I: Human learning from AI

Background
For the first study, we use the measure to compare human players' strategies before and after the launch of AI programs. The measure allows us to evaluate how close each move decision of human players is to the level of AI programs, and thus to tell whether human decisions improved after AI programs became available. When AI programs did not exist, human players gained new information on Go strategies by going over top players' move decisions in tournaments and discussing the strategies with other human players (illustrated in Panel (a) of Figure 1). What made AlphaGo so sensational in the human Go community is that the AI program used many unorthodox tactics. AlphaGo's demonstration of unfamiliar but effective strategies motivated human players to discuss and learn the strategies of AI. Consequently, the way human players study winning strategies changed completely into getting tutored by AI (illustrated in Panel (b) of Figure 1).

Figure 1: Illustration of the change in the way human Go players learn winning strategies since AI. (a) Human learning before AI. (b) Human learning after AI.

Interestingly, two different kinds of AI programs were released at different times. One is AlphaGo, released in March 2016, and the other is open-sourced AI programs (e.g., Leela Zero), released in October 2017. The main difference between the programs is whether players can observe the programs' intermediate reasoning process behind each move decision. AlphaGo and its subsequent versions show their actions only (i.e., the sequence of positions where the AI placed its stones). The open-sourced programs and education tools, however, also provide information on the detailed thought process behind each action of AI. They show the contingency plan of AI, such as how AI would respond to a hypothetical situation following each counterfactual choice at any state. This allows human players to review the strategies of AI under various situations. Also, humans can observe how AI evaluates each state of a game (e.g., the current win probability of each player, and the change in win probability if the human player chooses to deviate from AI's best move; although we use the term win probability, its exact interpretation is more subtle, but it is the term used in the Go community, so we follow it). More details on what information is given to human players are provided in the Appendix. We leverage the fact that these two AI programs, which gave human players two different kinds of learning material, were released sequentially. This gives us a chance to examine the conditions under which human players adapt to AI.

Data
Our data spans from January 2014 to March 2020 and comes from official matches between human professional players. It is composed of three datasets: (i) 30,995 matches between 357 Korean professional players, where we observe the match date, the identity of players, and match outcomes; (ii) 1.3 million move decisions of the Korean professional players from matches between human players; and (iii) simulated AI move decisions and evaluations for each of these states, described below. We web-scraped the datasets from the Korean Go Association and other websites. (Not every match has been saved with a detailed record of move decisions, but we collected historical data from multiple sources to obtain a more complete history. The data containing match results spans 2012 to 2020; the data containing move decisions spans 2014 to 2020.)
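As a minimal illustration (not our actual data pipeline), the player-month panel described above could be assembled from move-level gaps with pandas as follows; the file name and the column names player_id, match_date, move_number, and gap are hypothetical placeholders.

```python
import pandas as pd

# One row per move decision, with the Human-AI Gap already computed.
# Hypothetical schema: player_id, match_id, match_date, move_number, gap
moves = pd.read_csv("move_level_gaps.csv", parse_dates=["match_date"])

# Restrict to the first K moves in each match (the paper later uses K = 50).
K = 50
early_moves = moves[moves["move_number"] <= K]

# Player-month panel: Delta_it = average gap of player i over all moves in month t.
panel = (
    early_moves
    .assign(month=early_moves["match_date"].dt.to_period("M"))
    .groupby(["player_id", "month"], as_index=False)["gap"]
    .mean()
    .rename(columns={"gap": "delta_it"})
)
```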
For every single move choice by a human player, we simulate the optimal move decision of AI under the same state of the game (we use an AI program called Leela Zero to analyze our data, with a GPU provided by Google Colab Pro (P100) for the simulation) and compute our measure by comparing the value-network evaluation of the human player's actual choice with that of AI's choice. Thus, we have 1.3 million move decisions of human players, 1.3 million move decisions of AI, and the evaluation by AI of each of these decisions. The gap between the value-network evaluation of human decisions and that of AI decisions is our measure of the effectiveness of human decisions.

Table 1: Summary Statistics

Table 1 shows the summary statistics of the data. Our data includes players who are heterogeneous in performance, with a winning rate of 43% on average. For each match, two professional players make an average of 216 move decisions. AI evaluation of each move decision indicates that human players make sub-optimal choices, which results in a 3.5% loss in win probability on average.

Model-free descriptive pattern
We calculate the Human-AI Gap ($\Delta$) for every decision made by every human player in our dataset. A value of zero means that a human player exactly replicated the AI's decision ($\Delta_k = 0 \iff a_k^{Human} = a_k^{AI}$) and made the optimal move, while a positive value indicates the extent to which AI's decision was superior to that of the human player, or put differently, the extent to which the human player's decision quality trailed that of the AI. We present the pattern of $\Delta_k$ in Figure 2.

Figure 2: The average Human-AI Gap for each set of 10 moves (e.g., Moves 1-10, Moves 11-20) over the course of a match, plotted separately for the periods before AI, after AlphaGo, and after open-sourced AI.

Speculation on the inverted U pattern of Human-AI Gap across moves
Perhaps the first thing that jumps out in Figure 2 is the inverted U pattern of Human-AI Gap across moves. We speculate that the pattern emerges from two opposing forces at work. First, as the match progresses (and the move number increases), finding the optimal moves becomes harder because stones interact with one another more. For example, in the beginning, when there are only one, two, or several stones, the black and white stones tend not to interact much with each other, as both players seek to control open territory in the corners and sides that are free from either side's influence. In this early stage, human decisions are not too inferior to AI's decisions, leading to a smaller Human-AI Gap for earlier moves. As the game progresses past the opening stage, however, black and white stones clash against each other to fight for territory. In this early to middle stage of the match, optimal moves require thinking not only about how to control more territory, as in the beginning, but also about how to capture the opponent's stones or how to survive under the opponent's attack. Thus, this clash in the early to middle stage of the match presents a greater challenge for the human decision maker, and as a result, human decisions are inferior to AI's decisions by a greater margin, leading to a greater Human-AI Gap for moves in the middle stage.
Human-AI Gap peaks at a point and starts decreasing, however, because of the second, opposing force: the more stones there are on the board, the fewer possible moves become available. For example, in the middle stage, there may be 30 possible moves to consider, whereas by the end of the match, there may be only 5 possible moves to consider. With fewer possible moves to consider, the human player can find the optimal moves more frequently, and as a result, human decision quality can trail that of AI more closely, as in the beginning. We speculate that these two opposing forces give rise to the inverted U shape of Human-AI Gap.

Insights from the pattern of Human-AI Gap
Using our measure of Human-AI Gap, we are able to observe the pattern of human decision quality over the course of a match. One possible insight from this observation may be that room for improvement is greatest in the middle stage of the match, because the middle is where Human-AI Gap is the greatest. Human players may find that studying moves in the middle stage may improve their game more than studying moves in the early or late stage of the match. Thus, our measure of Human-AI Gap can be used to coach human players on what to study. Another possible insight may be that trying to improve human decision making in the middle stage of the match may be futile. Although this contradicts the first insight, it could just as well be true. Human experts may have concluded that improving their middle-stage game is very difficult as compared with improving the early-stage game. That is, although human experts have managed to improve their early-stage game (the red, solid line shifting down to the blue, dashed line for Moves 1-50 in Figure 2), perhaps they could not improve their middle-stage game despite much effort (the blue, dashed line overlaps the red, solid line for Moves 51-140 in Figure 2). Improving the middle-stage game may be very hard, perhaps due to intractable complexity and a lack of similarity from one match to another, both of which may prevent discovery or learning of any new principles. If so, human experts may instead double down and focus their effort even more on improving their early- or late-stage game. Whichever insight reflects reality, our measure could be useful in suggesting how human experts' effort should be allocated.

Human learning in the early stage of the game
More important than the inverted U pattern are the downward shifts in Human-AI Gap (for Moves 1-50). Interestingly, Human-AI Gap decreased only a little after human players could observe AlphaGo's actions, as evidenced by a barely noticeable downward shift from the "Before AI" curve to the "After AlphaGo" curve in Figure 2. In contrast, Human-AI Gap dropped markedly after open-sourced AI programs became available, as evidenced by a larger downward shift from the "After AlphaGo" curve to the "After Open-sourced AI" curve in Figure 2. Will we observe this pattern when we take into account the differences between human players and plot Human-AI Gap across time? In the next section, we investigate whether human decision quality indeed increased more after open-sourced AI became available than after AlphaGo's debut. Because the decrease in Human-AI Gap was concentrated in the early and middle stages of the game (Moves 1-50), and because no such decrease was readily observable for Moves 51-140, we focus our attention on the first 50 moves in each match when we investigate human decision quality over time next (so $K = 50$).
We examine Human-AI Gap over time after controlling for individual differences. As defined earlier, we construct a variable, $\Delta_{it}$, representing player $i$'s average gap in month $t$. Then we run the following regression: $\Delta_{it} = \alpha_i + \tau_t + \epsilon_{it}$. In Figure 3, we report the estimated time trend, $\hat{\tau}_t$. The red vertical line on the left marks March 15, 2016, the date of the match between Lee Sedol and AlphaGo, and the red vertical line on the right marks October 25, 2017, the date when the open-sourced AI program Leela Zero was publicly released, which was followed by releases of similar AI programs and education tools. Consistent with the tiny downward shift in Figure 2, we see little change in Human-AI Gap in the period between the two red vertical lines. Human-AI Gap drops significantly after the second vertical line, which indicates the time when open-sourced AI and its analysis tools became available.

Figure 3: Human-AI Gap over Time. AlphaGo, despite its superior performance against a human, did not help human players change their behavior in a meaningful way. Human players started to catch up with AI performance only after the release of the open-sourced AI. This finding highlights the importance of access to the reasoning process of AI.

The decline in the gap over time shows that human players adapted to AI programs gradually, more so after the education tools became available. The finding provides suggestive evidence that human players modified their choices and did better in Go matches after they gained access to AI.

Background
In this second case study, we examine how our measure can be useful in detecting one negative form of adaptation to AI: receiving help from AI to gain an unfair competitive advantage, i.e., cheating. In September 2020, a 13-year-old professional Go player cheated in a match against a 33-year-old top-notch Go player by using an AI program that suggests optimal moves for given states. The high-stakes match (one of the Round of 32 matches in a tournament that awards $175,000 in total prize money) was held online due to a COVID-19 lockdown, opening the door to the possibility of cheating, which would have been difficult to pull off had the match been held offline with many eyes around. Not surprisingly, the cheating player showed an extraordinary performance in the match, and as a result, a debate ensued on online forums regarding whether she had received help from AI. The controversy culminated in the cheating player finally confessing her transgression in November 2020. Would our measure be sensitive enough to detect such cheating behavior?

Data
We first obtain move data for all 52 matches of the cheating player's professional career, including the match in which she cheated. We then use an AI program to calculate the winning probability associated with each move in these 52 matches and the winning probability associated with the AI's optimal move. As in the first study, we subtract the former from the latter to calculate the Human-AI Gap. To be consistent with the previous study, we focus our analysis on the first 50 moves by each player in each match. This results in 50 values of Human-AI Gap for the cheating player in the cheating match and 2,375 values in the 51 non-cheating matches (after missing values were removed), for a total of 2,425 values in all 52 matches.

Comparison between the cheating match and all other career matches
We hypothesize that our measure of Human-AI Gap could detect an increase in the cheating player's decision quality when she received assistance from AI as compared with when she did not.
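A minimal sketch of the two-way fixed-effects regression above, using the statsmodels formula interface on a hypothetical player-month data frame (the columns player_id, month, and delta_it are assumed names); the player and month dummies absorb $\alpha_i$ and $\tau_t$.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical player-month panel: columns player_id, month, delta_it.
panel = pd.read_csv("player_month_panel.csv")

# Delta_it = alpha_i + tau_t + eps_it, with player and month fixed effects
# entered as categorical dummies (one tau_t per calendar month).
fe_model = smf.ols("delta_it ~ C(player_id) + C(month)", data=panel).fit()

# The estimated month coefficients trace out the time trend plotted in Figure 3.
tau_hat = fe_model.params.filter(like="C(month)")
print(tau_hat.head())
```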
Indeed, that is what we find. In the cheating match, the cheating player's move decisions show smaller Human-AI Gaps (M = 1.02, SD = 2.13), indicating better decision quality, than move decisions in the non-cheating matches (M = 2.35, SD = 4.30), t(57.78) = 4.23, one-tailed p < .001, Cohen's d = 0.31 (see Figure 4). Two nonparametric tests show converging results. A Wilcoxon rank-sum test reveals that Human-AI Gaps are more likely to be smaller when she cheated (Mdn = 0.60) than when she did not cheat (Mdn = 0.75), W = 67883, one-tailed p = .041. Similarly, a two-sample Kolmogorov-Smirnov test reveals that the Human-AI Gaps in the cheating match and those in the non-cheating matches have different distributions, D = 0.23, one-tailed p = .006. Results from these three tests show that our measure of Human-AI Gap can be useful for detecting higher decision quality resulting from cheating. But can our measure also detect greater stability in decision quality from cheating? We hypothesize that Human-AI Gap will be more stable across the moves in the cheating match than in the non-cheating matches, because moves suggested by AI will be consistently optimal, whereas moves made by the human player herself will be optimal less consistently (less frequently). In other words, we want to test whether our measure exhibits lower variance (more stability in decision quality) in the cheating match than in the non-cheating matches. As expected, a Levene's test reveals that the variance in decision quality (i.e., Human-AI Gap) is significantly lower in the cheating match (var = 4.53, SD = 2.13) than in the non-cheating matches (var = 18.51, SD = 4.30), F(1, 2423) = 4.85, one-tailed p = .014. This case study shows that our measure could be useful in detecting a negative form of adaptation to AI, namely receiving AI assistance for an unfair competitive advantage in a professional competition. In the first case study, we demonstrate the surprising result that human professional Go players did not modify their strategy even after they saw that AlphaGo beat humans. We find that the educational tools accompanying open-source AI programs are what led to human players' behavioral change. In Section 2 of the Appendix, we corroborate this finding by comparing the behavioral change of human players who had access to AI programs versus those who did not. (Specifically, we examine the gap measure of professional Go players who served in the military during the time the AI programs were available. Those who could not learn from the AI programs due to military service showed no improvement over time, which corroborates our findings on human adaptation.) The result emphasizes that demonstrating the efficacy of AI is not enough to induce human adaptation. It highlights that humans do not adapt their high-stakes decisions to AI until they understand it. To summarize, our measure can be used to track the level of human adaptation to AI and to shed light on what is necessary for such adaptation. In the second case study, we reconfirm that our measure has enough statistical power to detect a recent instance of cheating with AI. Human players can easily access AI programs, which makes it tempting to use AI even in matches between humans. In 2020, there were many cheating scandals in chess, ranging from a 17-year-old Polish chess player who was caught using a phone during a match to a 36-year-old grandmaster who was accused of cheating in a professional match. As in our second study, there was controversy about whether they cheated or not. Detecting such cheating will become more difficult as more human players learn from AI and choose AI-like moves, as shown in the first study. Online chess websites such as Chess.com have developed many tools to detect this kind of cheating (see https://www.chess.com/article/view/online-chess-cheating).
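For concreteness, the comparisons reported in this case study correspond to standard two-sample tests that could be run with scipy as sketched below; the arrays gaps_cheating and gaps_other and the one-sided settings are illustrative assumptions, not our exact analysis script.

```python
import numpy as np
from scipy import stats

# Hypothetical arrays of Human-AI Gap values (first 50 moves per match).
gaps_cheating = np.loadtxt("gaps_cheating_match.txt")   # 50 values
gaps_other = np.loadtxt("gaps_other_matches.txt")       # ~2,375 values

# Welch's t-test (unequal variances); one-sided: gaps are smaller when cheating.
t_res = stats.ttest_ind(gaps_other, gaps_cheating, equal_var=False,
                        alternative="greater")

# Wilcoxon rank-sum (Mann-Whitney U) test of the same directional hypothesis.
u_res = stats.mannwhitneyu(gaps_other, gaps_cheating, alternative="greater")

# Two-sample Kolmogorov-Smirnov test for a difference in distributions.
ks_res = stats.ks_2samp(gaps_cheating, gaps_other)

# Levene's test for equality of variances (the paper reports a one-tailed comparison).
lev_res = stats.levene(gaps_cheating, gaps_other)

print(t_res, u_res, ks_res, lev_res, sep="\n")
```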
Our measure, which is easy to construct, can be one viable way to help guarantee fair play between human players. Finally, our measure can shed light on additional ramifications of advances in AI programs in the board game Go. Does the gap between human players increase or decrease? The public release of AI-based analysis tools allows human Go players of all skill levels to analyze and simulate the consequences of any move at any stage of the game. It is a game changer in how human Go players develop their strategies. In the past, high-ranked players had a competitive advantage in studying and discussing the latest strategies with other high-ranked players. Now lower-ranked players can learn from AI programs that are far superior to the best human players. But whose game improves more from access to AI, weaker or stronger players'? On the one hand, weaker players may learn more from AI and catch up to stronger players at a faster rate than they otherwise would have, reducing the performance gap. On the other hand, stronger players may better understand and internalize the baffling yet effective moves of the AI algorithm, widening the performance gap. Our measure can be used to answer this question, whose implications extend beyond the context of the board game Go.

Other potential applications of the measure
Our measure can be used in contexts where AI programs generate more effective decisions, but humans remain the final decision makers and are held accountable. The key idea of the measure can be used to answer novel questions about human decision making in response to AI. Below, we discuss potential applications of the measure. Our measure may reveal factors that lead to a slower or faster rate of adaptation to AI. For example, it may reveal that people with certain characteristics (e.g., age, education, experience, training, or any other relevant background) have advantages or disadvantages in adapting to AI as compared with others. If such factors are identified, leaders of an organization or government have to deal with a difficult equity vs. efficiency tradeoff: Should the leadership help the disadvantaged members so that most members of an organization or society adapt to AI at similar rates and enjoy producing similar levels of output? Or should the leadership focus more on encouraging the advantaged members to maximize adaptation to AI at the organization or society level? Answering these questions may not be easy, but our measure can nevertheless help identify such factors and enable organizations or governments to make appropriate decisions given their environment. Another natural extension is in the area of personalized education. AI programs leveraging massive amounts of past student data can outperform human teachers in deciding the optimal sequence of materials to present to students at different skill levels. For example, when teaching students a new set of concepts, teachers may contemplate which concepts to teach earlier and which concepts to teach later, because learning is a dynamic process in which learning in earlier stages affects learning in later stages.
That is, as in the game of Go, teachers as decision makers must think not only about which concept to teach at given points (similar to thinking about which move to make given the state), but also about how teaching the concept affects learning at later points (similar to thinking about how the move affects later states), all the while trying to maximize overall learning (similar to maximizing the overall winning probability). Teachers can thus learn from AI programs to improve their teaching, and our measure can be used to evaluate teachers' adaptation to AI. Similarly, our measure can help game developers create a better game tutorial, a tool that teaches novice players the basic rules and essential mechanics of the game. Here, AI programs may optimize the order and contents of the tutorial, but human game developers make the final decisions on constructing the tutorial. In this case, our measure can be used to evaluate novice players' skills and engagement during and after two tutorials, one created by AI and another created by human developers, and these evaluations can assess human developers' adaptation to AI. More broadly, our key idea of comparing the output of AI with that of humans is useful in other settings. For example, AI technologies have advanced significantly not only in medical diagnosis but also in treatment decisions, such as recommending prescriptions. Since each treatment decision affects the future health status of patients, doctors need to make decisions in a dynamic setting. Even though AI may make better decisions in this context, ultimately human doctors will have the final authority and responsibility in diagnosis and treatment. Recent research shows that collaboration between human doctors and AI significantly improves the accuracy of predictions or diagnoses (Tschandl et al. 2020). As human experts continue with such adaptation to AI, our measure can be used to monitor the progress of the adaptation. Many other decision-making problems in business also exhibit features of dynamic optimization (Rust 2019), so firms adopting AI programs as a supporting tool for human managers can use our measure to monitor improvement in human decision making. As AI makes better decisions than humans, human experts adapt to AI. Often, this adaptation comes in a positive form of learning from AI, but it can also appear in a negative form, such as cheating. In this paper, we propose a simple measure, the Human-AI Gap, and test whether the measure can detect and quantify human adaptation to AI. Our results using this measure yield valuable insights in the game of Go, such as when learning occurred (i.e., after observing AI's reasoning processes rather than its mere actions), where learning occurred (i.e., the early to middle stage of the match, Moves 1-50), for whom learning occurred (i.e., experts with access to AI as compared with experts without such access), and whether cheating occurred. Moreover, our results suggest that the measure has broader applications in various domains other than the game of Go, ranging from managing adaptation to AI in an organization or society, to personalized education, and even to medical diagnosis and treatment.

Appendix

The education tool also shows the current win probability of each player (Part 3) and the change in win probability if a human player chooses to deviate from AI's best move (Part 4). For example, human players can try departing from AI's candidate actions (in colored circles) and get feedback on their own choice to quantify any loss the deviation generates.
If they disagree with AI's choice or evaluation, they can test their own strategy and see how AI penalizes it. Thanks to this feature of AI education tools, human players can review their choices in matches with other human players. Altogether, the information on AI's strategy and evaluation provided by education tools gives humans an opportunity to better understand the AI programs' underlying primitives behind each move decision.

Table 2: AI Programs and their implications for human learning
- Change in win probability as a consequence of each choice (Part 1)
- Change in win probability throughout a match (Part 4)

2. Experts with or without access to AI

Treated and control players
Our main strategy for finding more convincing evidence is to distinguish human professional players who did and did not have access to AlphaGo or to the subsequent AI programs. We take advantage of the mandatory military service in South Korea. South Korean male citizens are required to serve in the military for 18-24 months before age 29, which prevents military-serving players from participating in Go matches more than once a month and keeps them away from recent trends in AI programs related to Go. Specifically, most of them are expected to be confined to their military base, with short-term leave every other month. We confirm from the data that players serving in the military participate in a tournament at most once a month. They have neither enough time nor a high-performance computer to teach themselves the unfamiliar tactics, strategies, or insights discovered by AI programs. Thus, we define treated players as the group of players who were not serving in the military when AI programs became available, and the control group as players who were serving in the military and did not have access for at least six months. In Figure 5, the y-axis is the Human-AI Gap ($\Delta$), the x-axis is time, and the two vertical lines represent when the AI programs became available. The figure shows no significant change over time in the gap measure of the control group. This is in sharp contrast to Figure 3, which implies that it is access to AI programs that leads human players to make better move choices. We verify the finding by difference-in-differences estimation. Consider the following regression equation: $\Delta_{it} = \alpha_i + \tau_t + \beta \cdot I(i \in \text{Treatment group}) \cdot I(t \in \text{Post-AI period}) + \epsilon_{it}$. We report the estimate $\hat{\beta}$ in Table 3. The dependent variable is the average gap of each player in each month. Each column in the table is based on a different definition of the post-AI period: all months after Event 1 (March 2016), months between Event 1 and Event 2 (Dec 2017), or months after Event 2 only. The treatment effect of access to AI, captured by the interaction term of (Treatment group) and (Post-AI period), is significant only after Event 2. Human players could place their stones in better positions only after they could observe AI programs' strategy (e.g., candidate moves at each state, simulated sequences of actions after the choice) and evaluation (e.g., win probability at each state, change in win probability depending on each candidate move) and understand AI's underlying principles behind each decision. The effect is not significant after they observed only AI programs' choices at each state, which might have been considered a "black box" that generates incomprehensible outputs.
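A minimal sketch of the difference-in-differences specification above, again with the statsmodels formula interface; treated and post_ai are assumed 0/1 indicator columns in a hypothetical player-month panel, and the player and month fixed effects absorb the main effects.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical player-month panel with indicator columns:
#   treated = 1 if the player was not serving in the military when AI arrived,
#   post_ai = 1 for months in the chosen post-AI period.
panel = pd.read_csv("player_month_panel_did.csv")

# Delta_it = alpha_i + tau_t + beta * treated_i * post_ai_t + eps_it
did = smf.ols("delta_it ~ C(player_id) + C(month) + treated:post_ai",
              data=panel).fit()

print(did.params["treated:post_ai"])   # beta: the treatment effect of AI access
```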
To address the concern that serving in the military itself could have affected the Human-AI Gap, we study human players who served in the military before AI programs appeared and implement the same statistical test as before. Specifically, we compare the Human-AI Gap between two groups: players who had to serve in the military in 2015 and those who did not. We then limit the sample period to before March 2016, when AlphaGo came out. We report the coefficient of the interaction term, in the same format as Table 3, in Table 4. Table 4 shows that military service by itself does not significantly affect the Human-AI Gap. Although human players' match performance could have deteriorated, their move decisions were as good as those prior to military service according to the Human-AI Gap. Thus, we exclude the explanation that the result in Table 3 is driven by military service, and conclude that access to AI drives the effect.

References
Bassen et al. (2020). Reinforcement Learning for the Adaptive Scheduling of Educational Activities.
Biswas (2015). Measuring Intrinsic Quality of Human Decisions.
Botvinick et al. (2019). Reinforcement learning, fast and slow.
Dabney et al. (2020). A distributional code for value in dopamine-based reinforcement learning.
Fiorillo, Tobler, and Schultz (2003). Discrete coding of reward probability and uncertainty by dopamine neurons.
Holroyd and Coles (2002). The neural basis of human error processing: reinforcement learning, dopamine, and the error-related negativity.
Igami (2020). Artificial Intelligence as Structural Estimation: Deep Blue, Bonanza, and AlphaGo.
Mnih et al. (2013). Playing Atari with deep reinforcement learning.
Nazari et al. (2018). Reinforcement learning for solving the vehicle routing problem.
Niv (2009). Reinforcement learning in the brain.
Regan, Biswas, and Zhou (2014). Human and computer preferences at chess.
Rust (2019). Has dynamic programming improved decision making?
Silver et al. (2016). Mastering the game of Go with deep neural networks and tree search.
Silver et al. (2017). Mastering the game of Go without human knowledge.
Stafford and Dewar (2014). Tracing the trajectory of skill learning with a very large sample of online game players.
Strittmatter, Sunde, and Zegners (2020). Life cycle patterns of cognitive performance over the long run.
Tschandl et al. (2020). Human-computer collaboration for skin cancer recognition.
Tsividis et al. (2017). Human learning in Atari.
Waelti, Dickinson, and Schultz (2001). Dopamine responses comply with basic assumptions of formal learning theory.

Acknowledgments
We appreciate feedback from Kosuke Uetake, Jiwoong Shin, K. Sudhir, Elisa Celis, John Rust, Vineet Kumar, Ian Weaver, and participants at the 2020 Marketing Science conference and the Yale Quantitative Marketing seminar for their valuable and constructive comments. We gratefully acknowledge the support of the Korean Go Association in obtaining the data.

1. AI Programs
We compare two sets of focal AI programs and discuss their implications from the perspective of human players. In March 2016, AlphaGo demonstrated its superhuman performance by beating a world-class human champion. AlphaGo's performance improved even more between 2016 and 2017, and the AI program beat human Go champions in all 64 online Go matches. In May 2017, DeepMind released AlphaGo's moves in 50 matches played between AlphaGo and AlphaGo. DeepMind then retired AlphaGo. Even though its impact was sensational, the information released to the public was limited to the AI's final actions only. As summarized in the top panel of Table 2, human players had access to AlphaGo's final actions only (i.e., the sequence of positions where the AI placed its stones). Because those actions are unconventional, it was not easy for human players to appreciate how AlphaGo arrived at them. Human players often had heated debates trying to figure out the rationale behind the decisions of AlphaGo. Although AlphaGo provided human players with new perspectives on game strategy, it was not the ideal learning material because it did not provide that rationale to human players.
In late 2017, one and a half years after AlphaGo, open-source AI programs such as Leela Zero were released. Unlike AlphaGo, whose source code DeepMind did not publish, the open-source programs enabled developers to devise education tools such as Lizzie. Using these education tools, human players could review their choices in matches with other human players. They could compare their own evaluation of particular states with the one calculated by AI. By doing so, they could realize whether they were too optimistic or too pessimistic at certain points in the match. In addition, they could observe what AI would have chosen at the states of the match where they ended up making a sub-optimal move. As summarized in the bottom panel of Table 2, education tools provided information not only on the actions of AI, but also on the detailed thought process behind each action of AI. First, human players observe a set of actions that AI considers promising as well as the AI's final choice (colored circles in Part 1). The tool also shows how AI would respond to a hypothetical situation following each counterfactual choice at the current state (Part 2). Basically, the tool allows human players to review AI's strategy. Here, a strategy refers to a contingency plan of actions under various situations. Second, human players observe how AI evaluates each state of a game. The number inside the colored circles indicates the expected win probability of each action (Part 1). Furthermore, the program shows the current win probability of each player (Part 3) and the change in win probability if a human player chooses to deviate from AI's best move (Part 4).