Contracting over persistent information
Wei Zhao, Claudio Mezzetti, Ludovic Renou, Tristan Tomala
Date: 2020-07-12

Abstract. We consider a dynamic moral hazard problem between a principal and an agent, where the sole instrument the principal has to incentivize the agent is the disclosure of information. The principal aims at maximizing the (discounted) number of times the agent chooses a particular action, e.g., to work hard. We show that there exists an optimal contract, where the principal stops disclosing information as soon as its most preferred action is a static best reply for the agent, or else continues disclosing information until the agent perfectly learns the principal's private information. If the agent perfectly learns the state, he learns it in finite time with probability one; the more patient the agent, the later he learns it.

Public authorities routinely attempt to persuade the public to follow specific recommendations. For instance, national health services regularly promote the health benefits of eating fruits and vegetables, of exercising, or of limiting alcohol consumption. As another instance, during the Covid-19 outbreak, authorities were instructing the public to practice social distancing. In these instances, as in many others, the public may not fully embrace the recommendations. The public may elect to follow some but not all recommendations, or to interpret them flexibly. E.g., an individual may decide that eating three portions of fruits and vegetables a day is plenty, despite a recommendation of seven. Similarly, during the Covid-19 lockdowns, an individual may decide to exercise outdoors two to three times a day, despite public authorities recommending exercising outdoors at most once a day. In most of these instances, public authorities cannot, or choose not to, enforce their recommendations. Yet, they can still nudge the public towards full compliance with a judicious provision of information. The problem of the optimal provision of information over time is naturally not limited to persuading the public to follow some recommendations. Other examples include financial advisers persuading clients to follow their advice, firms persuading customers to purchase their products, or employers persuading employees to work hard. How best to do so? To study this question, we consider a simple "principal-agent" model, where the sole instrument the principal has is information.[1] The principal aims at incentivizing the agent to choose the same action, e.g., be fully compliant, buy the product or work hard, as often as possible, and can only do so by disclosing information about an unknown (binary) state, e.g., whether the policy recommendation is effective, the product is good or the task is easy. We assume the principal commits to a disclosure policy. E.g., during the Covid-19 outbreak, the UK government committed to reporting daily the number of new cases of Covid-19 (and was choosing the testing policy).[2] We refer to the commitment to a disclosure policy as a "contract." The main contribution of this paper is to provide a complete characterization of an optimal contract. We now discuss the salient features of the optimal contract we characterize.

[1] That is, the principal cannot make transfers, terminate the relationship, choose allocations, or constrain the agent's choices.
[2] Public Health England was providing the number of new cases. As Public Health England is subject to the Freedom of Information Act, the public had the opportunity to verify the accuracy of the numbers reported. See https://coronavirus.data.gov.uk/about#england-cases, last accessed 10 July 2020.
We do so in the context of a public authority (the principal) persuading the public (the agent) to follow a specific recommendation. The main property of the optimal disclosure policy is its gradualism: information is gradually and continuously revealed over time. At each period, it incentivizes the agent to follow the principal's recommendation with the promise of further information disclosure in the future. We further illustrate the main properties of our policy, particularly how beliefs evolve over time, with the help of Figure 1. Figure 1 plots four representative evolutions of the agent's belief about the "high" state, the state where the cost of incentivizing the agent relative to the benefit is the highest. In each panel, the grey region "OPT" indicates the region at which following the recommendation is (statically) optimal. An arrow pointing from one belief to another indicates how the agent revised his belief within the period following a signal's realization. Within a period, the agent takes a decision after the signal realization. The following are general properties of our optimal policy. The first observation to make is that the agent continuously updates his belief until either he perfectly learns the state or following the recommendation becomes (statically) optimal. Moreover, if the agent learns the state, he learns it in finite time. We provide an explicit characterization of the priors at which the agent eventually learns the state. Second, along the paths at which the agent follows the recommendation, his beliefs about the "high" state are decreasing. Intuitively, the optimal contract exploits the asymmetry in opportunity costs and lowers the perceived opportunity cost (hence making it easier to incentivize the agent) by sometimes informing him when the opportunity cost is high.[3] Third, with the exception of panel (D), the policy does not disclose information to the agent at the first period. Thus, adopting the definition of persuasion as the act of changing the agent's beliefs prior to him making any decision, information disclosure rewards the agent for following the recommendation, but does not persuade him in panels (A), (B) and (C). Yet, as panel (D) illustrates, the policy sometimes needs to persuade the agent. For instance, if the promise of full information disclosure at the next period would not incentivize the agent, then persuading the agent is necessary, that is, the policy must generate a strictly positive value of information. However, there are other circumstances in which persuading the agent may be necessary: persuading the agent may sufficiently reduce the perceived opportunity cost to compensate for the instantaneous loss. Finally, with the exception of panel (B), the policy does not induce the agent to believe that following the recommendation is optimal. This is markedly different from what we would expect from the static analysis of Kamenica and Gentzkow [2011].
Intuitively, the "static" policy is sub-optimal because it does not extract all the informational surplus it creates, that is, it creates a strictly positive value of information, but does not extract it all. (The participation constraint of the agent does not bind.) And even in panel (B), the beliefs do not jump immediately to the "OPT" region. In fact, the belief process may approach the "OPT" region only asymptotically. Related literature. The paper is part of the literature on Bayesian persuasion, pioneered by Kamenica and Gentzkow [2011] , and recently surveyed by Kamenica [2019] . The two most closely related papers are Ball [2019] and Ely and Szydlowski [2020] . In 3 To be precise, under our policy, upon perceiving the signal "the opportunity cost is high," the agent learns that this is indeed true. However, the signal is not sent with probability one. This corresponds to the arrows ending at 1. common with our paper, both papers study the optimal disclosure of information in dynamic games and show how the disclosure of information can be used as an incentive tool. The observation that information can be used to incentivize agents is not new and dates back to the literature on repeated games with information incomplete, e.g., Aumann et al. [1995] . See Garicano and Rayo [2017] and Fudenberg and Rayo [2019] for some more recent papers exploring the role of information provision as an incentive tool. The class of dynamic games studied differ considerably from one paper to another, which makes comparisons difficult. In Ely and Szydlowski [2020] , the agent has to repeatedly decide whether to continue working on a project or to quit; quitting ends the game. The principal aims at maximizing the number of periods the agent works on the project and can only do so by disclosing information about the complexity of the project, modeled as the number of periods required to complete the project. Thus, their dynamic game is a quitting game, while ours is a repeated game. When the project is either easy or difficult, the optimal disclosure policy initially persuades the agent that the task is easy, so that he starts working. (Naturally, if the agent is sufficiently convinced that the project is easy, there is no need to persuade him initially.) If the project is in fact difficult, the policy then discloses it at a later date, when completing the project is now within reach. A main difference with our optimal disclosure policy is that information comes in lumps in Ely and Szydlowski [2020] , i.e., information is disclosed only at the initial period and at a later period, while information is continuously disclosed in our model. In the words of Ely and Szydlowski, our policy continuously leads the agent towards following the recommendation. Another main difference is as follows. In Ely and Szydlowski, only when the promise of full information disclosure at a later date is not enough to incentivize the agent to start working does the principal persuade the agent initially. This is not so with our policy: the principal persuades the agent in a larger set of circumstances. This initial persuasion reduces the cost of incentivizing the agent in future periods. 4 Ball [2019] studies a continuous time model of information provision, where the state changes over time and payoffs are the ones of the quadratic example of Crawford and 4 See Orlov et al. [2019] and Smolin [2018] for two other papers on quitting games and information disclosure. To illustrate the difficulties of comparing models, Orlov et al. 
[2019] consider a quitting game, where the principal also aims at delaying the quitting time as far as possible. The quitting time is the time at which the agent decides to exercise an option, which has different values to the principal and the agent. The principal chooses a disclosure policy informing the agent about the option's value. The optimal disclosure policy is to fully reveal the state with some delay. (Note that the principal is referred to as the agent in their work.) This policy is not optimal in Ely and Szydlowski [2020] . Seemingly innocuous differences in models have important consequences. Sobel [1982] . Ball shows that the optimal disclosure policy requires the sender to disclose the current state at a later date, with the delay shrinking over time. The main difference between his work and ours is the persistence of the state (also, we consider two different classes of games). When the state is fully persistent, as in Ely and Szydlowski [2020] and our model, full information disclosure with delay is not optimal in general. (See Example 1.) A principal and an agent interact over an infinite number of periods, indexed by t ∈ {1, 2, . . . }. At the first stage, before the interaction starts, the principal learns a payoffrelevant state ω ∈ Ω = {ω 0 , ω 1 }, while the agent remains uninformed. The prior probability of ω is p 0 (ω) > 0. At each period t, the principal sends a signal s ∈ S and, upon observing the signal s, the agent takes decision a ∈ A. The sets A and S are finite. The cardinality of S is as large as necessary for the principal to be unconstrained in his signaling. 5 We assume that there exists a * ∈ A such that the principal's payoff is strictly positive whenever a * is chosen and zero, otherwise. The principal's payoff function is thus v : The agent's payoff function is u : A × Ω → R. The (common) discount factor is δ ∈ (0, 1). We write A t−1 for A × · · · × A t−1 times and S t−1 for S × · · · × S t−1 times , with generic elements a t and s t , respectively. A behavioral strategy for the principal is a collection of maps (τ t ) t , with Similarly, a behavioral strategy for the agent is a collection We write V(τ, σ) for the principal's payoff and U(τ, σ) for the agent's payoff under the strategy profile (σ, τ ). The objective is to characterize the maximal payoff the principal achieves if he commits to a strategy τ , that is, for all σ . 5 From Makris and Renou [2020] , it is enough to have the cardinality of S as large as the cardinality of A. Several comments are worth making. First, we interpret the strategy the principal commits to as a contract specifying, as a function of the state, the information to be disclosed at each history of realized signals and actions. That is, the contract specifies a statistical experiment at each history of realized signals and states. The principal chooses the contract prior to learning the state. An alternative interpretation is that neither the principal nor the agent know the state, but the principal has the ability to conduct statistical experiments contingent on past signals and actions. We can partially dispense with the commitment assumption. Indeed, since the choices of statistical experiments are observable, we can construct strategies that incentivize the principal to implement the specified statistical experiments. 6 Second, the only additional information the agent obtains each period is the outcome of the statistical experiment. 
Third, the state is fully persistent and the principal perfectly monitors the action of the agent. Finally, the only instrument available to the principal is information. The principal can neither remunerate the agent, nor terminate the relationship, nor allocate different tasks to the agent. We purposefully make all these assumptions to address our main question of interest: what is the optimal way to incentivize the agent with information only?

[6] The simplest such strategy is to have the agent play a ≠ a* in all future periods after a deviation.

Example 1. Throughout the paper, we illustrate most of our results with the help of the following example. The agent has three possible actions a_0, a_1 and a*, with a_0 (resp., a_1) the agent's optimal action when the state is ω_0 (resp., ω_1). The prior probability of ω_1 is 1/3 and the discount factor is 1/2. The per-period payoffs are in Table 1, with the first coordinate corresponding to the principal's payoff.

TABLE 1. Payoff table of Example 1

          a_0      a_1      a*
  ω_0     0, 1     0, 0     1, 1/2
  ω_1     0, 0     0, 2     1, 1/2

We start with a few preliminary observations. First, regardless of the agent's belief, action a* is never optimal. Second, if the agent knew the state, he would choose a_0 (resp., a_1) in state ω_0 (resp., ω_1), resulting in an expected payoff of 4/3. Third, the opportunity cost of playing a* is the highest when the state is ω_1, i.e., u(a_1, ω_1) − u(a*, ω_1) > u(a_0, ω_0) − u(a*, ω_0). It is harder to incentivize the agent to play a* when he is more confident that the state is ω_1. As we shall see, the optimal policy exploits this property. We now consider some simple strategies the principal may commit to. To start with, assume that the principal commits to disclose information at the initial stage only. Clearly, since a* is never optimal, the principal's payoff is 0. The principal must condition his information disclosure on the agent's actions. The simplest such disclosure policy is to "reward" the agent with full information disclosure for playing a* sufficiently often at the beginning of the relationship, say up to period T*. Note that if the agent deviates, the harshest punishment the principal can impose is to withhold all information in all subsequent periods, inducing a normalized expected payoff of 2/3. We are thus looking for the largest T* such that the agent prefers to play a* up to period T* and learn the state afterwards rather than play a static best reply forever, that is, such that (1 − δ^{T*})·(1/2) + δ^{T*}·(4/3) ≥ 2/3. With such a simple strategy, T* = ⌊ln(5)/ln(2)⌋ = 2 and the principal's payoff is 1 − δ^{T*} = 3/4. There is yet another simple strategy the principal can commit to. The principal can commit to a "recursive" policy, where he fully discloses the state with probability α at period t (and withholds all information with the complementary probability) if the agent plays a* at period t − 1. (Again, if the agent deviates, the harshest punishment is to withhold all information in all subsequent periods.) Thus, if we write V (resp., U) for the principal's (resp., agent's) payoff, the best recursive policy is to choose α so as to maximize V, where V = (1 − δ) + δ(1 − α)V, subject to U = (1 − δ)·(1/2) + δ[α·(4/3) + (1 − α)U] ≥ 2/3. The principal's best payoff is V = 4/5 with α = 1/4; a short derivation of both numbers is given below. The "recursive" policy does better than the policy of fully disclosing the state with delay. Intuitively, the "recursive" policy performs better because it makes it possible to incentivize the agent to play a* for a discounted number of periods slightly larger than 2. (In continuous time, both policies would be equivalent.) As full information disclosure with delay plays an important role in the work of Ball [2019] and Orlov et al. [2019], we will compare the recursive policy with our optimal policy later on.
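As a quick check of these two numbers (a worked computation with δ = 1/2; recall that a* yields the agent 1/2 per period, no further information yields him 2/3, and full information yields him 4/3):
\[
\Bigl(1-\tfrac{1}{2^{T^*}}\Bigr)\cdot\tfrac12 \;+\; \tfrac{1}{2^{T^*}}\cdot\tfrac43 \;\ge\; \tfrac23
\quad\Longleftrightarrow\quad
2^{T^*} \le 5
\quad\Longrightarrow\quad
T^* = \lfloor \ln 5/\ln 2 \rfloor = 2,
\]
and the principal's normalized payoff is \(1-\delta^{T^*} = 1-\tfrac14 = \tfrac34\). For the recursive policy, solving the two recursions with the agent's obedience constraint binding, \(U = \tfrac23\), gives
\[
\tfrac23 \;=\; \tfrac12\cdot\tfrac12 \;+\; \tfrac12\Bigl[\alpha\cdot\tfrac43 + (1-\alpha)\cdot\tfrac23\Bigr]
\;\Longrightarrow\; \alpha = \tfrac14,
\qquad
V \;=\; \frac{1-\delta}{1-\delta(1-\alpha)} \;=\; \frac{1/2}{5/8} \;=\; \frac45 .
\]
At the optimum the obedience constraint binds, so the recursive policy leaves no rents to the agent.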
For now, it suffices to say that it is not optimal in Example 1. This section fully characterizes the optimal contract and discusses its most salient properties. We first start with a recursive formulation. 3.1. A recursive formulation. The first step in deriving an optimal contract is to reformulate the principal's problem as a recursive problem. To do so, we introduce two state variables. First, it is well-known that classical dynamic contracting problems admit recursive formulations if one introduces promised continuation payoffs as a state variable and imposes promise-keeping constraints, e.g., Spear and Srivastava [1987] . The second state variable we need to introduce is beliefs. We now turn to a formal reformulation of the problem. 7 We first need some additional notation. For any p ∈ ∆(Ω), we let u(a, p) := ω p(ω)u(a, ω) be the agent's expected payoff of choosing a when his belief is p, m(p) := max a∈A u(a, p) be the agent's best payoff when his belief is p, and M (p) := ω p(ω) max a∈A u(a, ω) be the agent's expected payoff if he learns the state prior to choosing an action. It is worth noting that m is a piece-wise linear convex function, that M is linear and that m(p) ≤ M (p) for all p. Similarly, we let v(a, p) be the principal's payoff when the agent chooses a and the principal's belief is p. Finally, let P := {p ∈ [0, 1] : m(p) = u(a * , p)}, be the set of beliefs at which a * is optimal. If non-empty, the set P is the closed interval [p, p] . . Throughout, we restrict attention to functions V : W → R, with the interpretation that V (p, w) is the principal's payoff if he promises a continuation payoff of w to the agent when the agent's current belief is p. The principal's maximal payoff is V * (p 0 , m(p 0 )), where V * is the unique fixed point of the contraction T , defined by (λs,(ps,ws) ,as)∈[0,1]×W×A s∈S s∈S λ s [(1 − δ)v(a s , p s ) + δV (p s , w s )], subject to: We briefly comment on the maximization program. The first constraint is an incentive constraint: the agent must have an incentive to play a s when w s is the agent's promised 7 A nearly identical reformulation has already appeared in Ely [2015] , one of the working version of Ely [2017] . We remind the reader that Ely [2017] analyzes the interaction between a long-run principal and a sequence of short-run agents. (See also Renault et al. [2017] .) While discussing the extension of his model to the interaction between a long-run principal and a long-run agent, Ely [2015] has derived a recursive reformulation nearly identical to ours. However, he didn't go further. We start from the recursive formulation and use it to derive an optimal policy. See Section A.2 for a detailed comparison of both formulations. continuation payoff and p s the agent's belief. To understand the right-hand side, observe that the agent can always play a static best-reply to any belief he has so that his expected payoff must be at least m(p s ) when his current belief is p s . 8 Conversely, if the contract specifies action a s and the agent does not execute that action, the contract can specify no further information revelation, in which case the agent's payoff is at most m(p s ). Therefore, m(p s ) is the agent's min-max payoff. The second constraint is the promise-keeping constraint: if the principal promises the continuation payoff w at a period, the contract must honor that promise in subsequent periods. The third constraint states that the policy selects a splitting of p, that is, a distribution over posteriors with expectation p. 
Throughout, we slightly abuse notation and denote by τ a policy, that is, a function from W to ([0, 1]×W×A) |S| . A policy is feasible if it specifies a feasible tuple ((λ s , (p s , w s ), a s )) s∈S for each (p, w), i.e., a tuple satisfying the constraints of the maximization problem T (V )(p, w). Two important observations are worth making. First, for any function V , T (V ) is a concave function in (p, w). Concavity reflects the fact that the more information the principal discloses, the harder it is to reward the agent in the future. Second, T (V ) is a decreasing function in w, that is, the more the principal promises to the agent, the harder it is to incentivize the agent to play a * . 9 We will repeatedly make use of these two properties, which we formally record in the following proposition. Proposition 1. The value function V * is concave in both arguments and decreasing in w. Proposition 1 together with the recursive formulation has a number of additional implications. First, there is at most one signal s at which the principal recommends the agent to play a * . Moreover, whenever the principal recommends a * , the agent is indifferent between obeying the recommendation or deviating. In other words, the promised continuation payoff does not leave rents to the agent. Second, if the principal does not recommend a * at a period, then the principal never recommends a * at a subsequent period, that is, the principal's continuation value is zero. In other words, as soon as an action other than a * is played, the principal stops incentivizing the agent to play a * . 8 More precisely, if the agent's belief at period τ is p τ , he obtains the payoff m(p τ ) by playing a static bestreply. Since the function m is convex and beliefs follow a martingale, his expected payoff is therefore at where F τ is the agent's filtration at period τ . 9 A real-valued function f is increasing (resp., strictly increasing) if x ≥ y implies that f (x) ≥ f (y) (resp., f (x) > f (y)). The function f is (resp., strictly) decreasing if −f is (resp., strictly) increasing. Finally, if the principal induces the posterior p s while recommending the action a s and promising the continuation payoff w s , the principal should not have an incentive to further disclose information in that period. The following proposition formally states these three implications. Proposition 2. For all (p, w), there exists a solution (λ s , p s , w s , a s ) s∈S to T (V * )(p, w) such that (i): There exists at most one signal s * ∈ S such that λ s * > 0 and a s * = a * . Moreover, (1 − δ)u(a s * , p s * ) + δw s * = m(p s * ). (ii): For all s ∈ S such that λ s > 0 and a s = a * , V * (p s , w s ) = 0. For all s ∈ S such that λ s > 0, we have Proposition 2 states key properties that an optimal policy possesses. We conclude this section with a partial converse, that is, we state properties that guarantee optimality of a policy. To do so, we need to introduce two additional notation. We first let Q 1 be the set of beliefs at which the agent has an incentive to play a * if he is promised full information disclosure at the next period, that is, If Q 1 is empty, then all policies are optimal as the principal can never incentivize the agent to play a * . If Q 1 is non-empty, then it is a closed interval [q 1 , q 1 ]. Note that q 1 = 0 if and only if a * is optimal at p = 0. For a graphical illustration, see Figure 2 . 
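For reference, the maximization program defining T and the set Q_1 can be written out explicitly; the following is a reconstruction from the three constraints and the verbal definition discussed above (describing W as the set of pairs (p, w) with m(p) ≤ w ≤ M(p) is our reading, consistent with the domains used in the appendix):
\[
T(V)(p,w) \;=\; \max_{(\lambda_s, p_s, w_s, a_s)_{s\in S}} \; \sum_{s\in S} \lambda_s\bigl[(1-\delta)\,v(a_s,p_s) + \delta\, V(p_s,w_s)\bigr]
\]
subject to
\[
(1-\delta)\,u(a_s,p_s) + \delta\, w_s \;\ge\; m(p_s) \quad \text{for all } s \text{ with } \lambda_s>0,
\qquad
\sum_{s\in S} \lambda_s\bigl[(1-\delta)\,u(a_s,p_s) + \delta\, w_s\bigr] \;\ge\; w,
\]
\[
\sum_{s\in S} \lambda_s p_s = p, \qquad \sum_{s\in S}\lambda_s = 1, \qquad \lambda_s \ge 0, \qquad (p_s,w_s)\in W,
\]
where the first constraint is the incentive constraint, the second the promise-keeping constraint (stated as an inequality; cf. the comparison with Ely's formulation in Section A.2), and the last line requires a splitting of p. Similarly, the set of beliefs at which the promise of full disclosure next period incentivizes a* is
\[
Q_1 \;=\; \bigl\{\, p\in[0,1] \;:\; (1-\delta)\,u(a^*,p) + \delta\, M(p) \;\ge\; m(p) \,\bigr\}
\]
(in Example 1, Q_1 = [1/6, 1/2]). We also write w(p)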
for the continuation payoff that makes the agent indifferent between playing action a * and receiving the continuation payoff w(p) in the future, and playing a best reply to the belief p forever, that is, w(p) solves: (1 − δ)u(a * , p) + δw(p) = m(p). Theorem 1. Consider any feasible policy inducing the value functionṼ . IfṼ is concave in both arguments, decreasing in w and satisfies for all p ∈ Q 1 , then the policy is optimal. Proof. We argue thatṼ is the fixed point of the operator T , henceṼ = V * . Let (λ s , p s , w s , a s ) s∈S be a solution to the maximization problem T (Ṽ )(p, w). We first start with the following observation. Consider any s such that a s = a * . We have where the last inequality follows from the fact thatṼ is decreasing in w and m(p s ) ≤ Consider now any s such that a s = a * . Since (λ s , p s , w s , a s ) s∈S is feasible, we have that hence p s ∈ Q 1 and, therefore, . The concavity ofṼ implies that (a) about concave functions in Section A.1. Combining the above two inequalities implies that It follows that where the second inequality follows from the concavity ofṼ and the third inequality fromṼ being decreasing in w. Conversely, since the policy inducingṼ is feasible, we must have that T (Ṽ )(p, w) ≥ V (p, w) for all (p, w). This completes the proof. 3.2. An optimal policy. The objective of this section is to define a policy, which we later prove to be optimal. Throughout, the number p ∈ [0, 1] refers to the probability of ω 1 . We denote a p a maximizer of u(·, p). Without loss of generality, assume that , that is, the principal's benefit of a * in state ω 0 relative to state ω 1 is higher than the agent's opportunity cost in state ω 0 relative to state ω 1 . (A symmetric argument applies if the reverse inequality holds.) Observe that if a * is optimal for the agent at p = 1, i.e., 1 ∈ P , then a * is also optimal at p = 0 and, consequently, P = [0, 1], i.e., a * is optimal at all beliefs. In what follows, we exclude this trivial case and assume that 1 / ∈ P . Define the functions λ : W → [0, 1] and ϕ : W → [0, 1], with (λ(p, w), ϕ(p, w)) the unique solution to: for all w > m(p), and λ(p, m(p)), ϕ(p, m(p))) = (1, p). Geometrically, the solution (ϕ(p, w), m(ϕ(p, w)) is the unique intersection between the line connecting (p, w) and (1, m(1)) and the graph of m. See Figure 3 for an illustration. Note that both functions are continuous. We now define a family of policies (τ q ) q∈[q 1 ,q 1 ] and show later the existence of q * ∈ [q 1 , q 1 ] such that the policy τ q * is optimal. For each q ∈ [q 1 , q 1 ], there are four regions to consider: The four regions partition the set W. The first region corresponds to the points (p, w) below the line connecting (0, m(0)) to (q 1 , m(q 1 )). The second region corresponds to the points (p, w) below the line connecting (q 1 , m(q 1 )) and (1, m(1)) but above the line connecting (q, m(q)) and (1, m(1)). The third region corresponds (p, w) below the line connecting (q, m(q)) and (1, m(1)), while the fourth region corresponds to all other points. Figure 4 illustrates the four regions with W 1 q the black region, W 2 q the region with vertical lines, W 3 q the gray region, and W 4 q the region with north west lines. It is worth observing that the regions W 1 q and W 4 q do not depend on the parameter q, while the other two do. The policy τ q differs from one region to another, as we now explain. If (p, w) ∈ W 1 q , the policy splits p into 0 and q 1 with probability q 1 −p q 1 −0 and p−0 q 1 −0 , respectively. 
Conditional on 0, the policy recommends a 0 and promises a continuation payoff of m(0). Conditional on q 1 , the policy recommends a * and promises a continuation payoff of w(q 1 ). If (p, w) ∈ W 2 q , the policy splits p into ϕ(p, w) and 1 with probability λ(p, w) and 1−λ(p, w), respectively. Conditional on ϕ(p, w), the policy recommends action a * and promises a continuation payoff of w(ϕ(p, w)). Conditional on 1, the policy recommends action a 1 and promises a continuation payoff of m(1). q , the policy splits p into q and 1 with probability 1−p 1−q and p−q 1−q , respectively. Conditional on 1, the policy recommends a 1 and promises a continuation payoff of m(1). Conditional on q, the policy recommends a * and promises a continuation payoff of w(q). q , the policy splits p into 0, q 1 , and 1 with probability λ 0 , λ q 1 and λ 1 , respectively. Conditional on 0 (resp., 1), the policy recommends action a 0 (resp, a 1 ) and promises a continuation payoff of m(0) (resp., m(1)). Conditional on q 1 , the policy recommends action a * and promises a continuation payoff of w(q 1 ). The probabilities (λ 0 , λ q 1 , λ 1 ) ∈ R + × R + × R + are the unique solution to: A solution exists since W 4 q is the convex hull of (0, m(0)), (q 1 , m(q 1 )) and (1, m (1)). This completes the description of the policy. Intuitively, in regions W 1 q and W 3 q , the policy generates a strictly positive information value, i.e., the policy leaves rents to the agent. Clearly, this is needed when even the promise of full information disclosure at the next period does not incentivize the agent to play a * even once, i.e., when his belief is in [0, 1] \ Q 1 . As we shall see later, this is even true for some beliefs in Q 1 as a way to minimizes the cost of incentivizing the agent relative to the benefit to the principal. In region W 2 q , the policy does not leave the rent to the agent and continuously incentivize him to play a * with the promise of future information disclosure. This is the most important region. Finally, region W 4 q is mostly for completeness. The policy enters it only at beliefs sufficiently close to q 1 . For instance, in the introductory Figure 1 , panel (D) represents the evolution of beliefs starting from a prior belief in the region W 3 q , transitioning to the region W 2 q at the next period and staying there for three periods and transitioning then to the region W 4 q . To illustrate further the policy, we revisit our running example. Example 1 (continued). We have that M (p) = 1 + p, m(p) = max(1 − p, 2p) and w(p) = 2 max(2p, 1 − p) − (1/2). Therefore, Q 1 = [1/6, 1/2]. Assume that q = 1/2 (we will soon show that this is optimal in this example). Remember that 1/3 is the prior probability of ω 1 and δ = 1/2. Let us start with the pair (p, m(p)) = (1/3, 2/3), which is in region W 2 1/2 . The policy recommends a * to the agent and promises a continuation payoff of w(1/3) = 5/6. The next state is therefore (1/3, 5/6), which is again in W 2 1/2 . If the agent had been obedient, the policy then splits the prior probability 1/3 into 3/11 and 1 with probability 22/24 and 2/24, respectively. To see this, note that we indeed have: Conditional on the posterior 3/11, the policy recommends a * to the agent and promises a continuation payoff of w(3/11) = 21/22. Conditional on the posterior 1, the policy recommends a 1 and promises a continuation payoff of m(1) = 2. Therefore, the next state is either (3/11, 21/22) or (1, 2), with the former again in W 2 1/2 . 
In the latter case, the policy yet again recommends a 1 and a continuation payoff of 2. In the former case, the policy splits 3/11 into 7/39 and 1, with probability 39/44 and 5/44, respectively. Conditional on the posterior 7/39, the policy recommends a * to the agent and promises a continuation payoff of w(7/39) = 89/78. Conditional on the posterior 1, the policy recommends a 1 and promises a continuation payoff of m(1) = 2. Finally, at the state (7/39, 89/78), which is in region W 4 1/2 , the policy does a penultimate split of 7/39 into 0, 1/6 and 1 with probability 113/156 18/156 and 25/156, respectively. Conditional on the posterior 1/6, the policy recommends a * and promises a continuation payoff of 7/6, i.e., full information disclosure at that the next period. Thus, the policy fully discloses the state in finite time to the agent. See Figure 5 for the evolution of the beliefs at the beginning of each period. At all beliefs other than 0 and 1, the agent is recommended to play a * . The principal's expected payoff is 1285/1536, i.e., about 0.83. FIGURE 5. Evolution of the beliefs. 3.3. Construction of q * and optimality. Let V q : W → R be the value function induced by the policy τ q . Note that for all q, V q (1, m(1)) = 0 since a * is not optimal at p = 1 is then optimal at p = 0). Therefore, any two policies τ q and τ q induce the same values (Remember that the regions W 1 q and W 4 q do not vary with q, see Figure 4 .) Similarly, any two policies τ q and τ q induce the same values at all (p, w) ∈ W 2 min(q,q ) . In particular, τ q and τ q 1 induce the same values at all We are now ready to state our main result. Let Theorem 2. The policy τ q * is optimal and, therefore, To understand the role of q * , recall that for all p ∈ [q * , 1], the policy leaves rents to the agent. 10 To minimize the rents left to the agent, we therefore would like to have q * as high as possible, i.e, equals to q 1 . However, V q 1 (·, m(·)) is not guaranteed to be concave in p, a necessary condition for optimality. To see this, consider any pair 10 I.e., the agent is promised a payoff of 1−p where the first inequality follows from the concavity of V * in both arguments and the second from V * decreasing in w and the convexity of m. The optimal choice of q * is thus the largest q, which guarantees V q (·, m(·)) to be concave. More precisely, as we show in Section A.5, the definition of q * guarantees that V q * is concave in both arguments and decreasing in w, so that V q * (·, m(·)) is a concave function of p. We also prove that V q * (p, m(p)) ≥ V q 1 (p, m(p)) for all p. Since it is clearly the smallest such function, V q * is the concavification of V q 1 . In particular, q * = q 1 if V q 1 (·, m(·)) is already concave. Figure 6 illustrates the concavification in the context of Example 1. FIGURE 6. The concavification of V q 1 (·, m(·)) in Example 1 The policy we construct leaves rents to the agent for all priors in [0, q 1 ) ∪ (q * , 1], that is, the (ex-ante) participation constraint does not bind. It is quite natural for all priors in [0, 1] \ Q 1 since the agent cannot be incentivized to play a * even once. In the language of Ely and Szydlowski [2020] , "the goalposts need to move," that is, one needs to disclose information at the ex-ante stage to persuade the agent to play a * . However, our policy also leaves rents for all priors in (q * , q 1 ]. 
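Before turning to the intuition for these rents, the Example 1 walkthrough above can be checked numerically. The short script below is an illustrative sketch (the helper names are ours, Example 1's payoffs and the choice q = 1/2 are hard-coded, and only the two regions W^2 and W^4 that are actually visited from the prior 1/3 are implemented); it recomputes the splits with exact rational arithmetic and recovers the payoff 1285/1536 reported above.

```python
from fractions import Fraction as F

delta = F(1, 2)                   # common discount factor in Example 1
q1 = F(1, 6)                      # lower end of Q_1 = [1/6, 1/2]

def m(p):                         # agent's static best payoff at belief p = Pr(omega_1)
    return max(1 - p, 2 * p)

def w_promise(p):                 # w(p) solving (1-delta)*u(a*,p) + delta*w(p) = m(p), with u(a*,p) = 1/2
    return (m(p) - (1 - delta) * F(1, 2)) / delta

def in_W4(p, w):                  # above the line through (1/6, m(1/6)) and (1, m(1)) = (1, 2)
    return w >= m(q1) + F(7, 5) * (p - q1)

def value(p, w):
    """Principal's normalized payoff at state (p, w) under the policy with q = 1/2
    (Example 1 payoffs: the principal gets 1 in every period in which a* is played)."""
    if p == 0 or p == 1:          # state fully revealed: a* is never played again
        return F(0)
    if in_W4(p, w):               # region W^4: split p into the posteriors 0, 1/6 and 1
        lam_q = 3 * (p - w + 1)   # weight on the posterior 1/6 (solves the 3x3 splitting system)
        if lam_q == 0:            # degenerate split: full disclosure, principal gets 0
            return F(0)
        return lam_q * ((1 - delta) + delta * value(q1, w_promise(q1)))
    # region W^2: split p into (phi, 1); conditional on phi, recommend a* and promise w(phi).
    # The line through (p, w) and (1, 2) meets the graph of m at phi; here every phi reached
    # from the prior 1/3 lies below 1/3, so m(phi) = 1 - phi.
    lam = (3 - p - w) / 2
    phi = 1 - (1 - p) / lam
    return lam * ((1 - delta) + delta * value(phi, w_promise(phi)))

p0 = F(1, 3)
print(value(p0, m(p0)))           # prints 1285/1536, i.e. about 0.8366
```

Coming back to the rents left to the agent for priors in (q*, q_1]: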
The intuitive reason is that the initial information disclosure reduces the cost of incentivizing the agent in subsequent periods sufficiently enough to compensate for the initial loss. (When the realized posterior is 1, the agent never plays a * , thus creating the loss.) 3.4. Properties of the policy. The policy we construct has a number of noteworthy features, which we now explore. Information disclosure. The policy discloses information gradually over time, with beliefs evolving until either the agent learns the state or believes that a * is (statically) optimal. We can be more specific. if P is non-empty, and Q ∞ = ∅, otherwise. (Remember that P is the set of beliefs at which a * is statically optimal.) Note that P ⊆ Q ∞ . See Figure 7 for a graphical illustration. Intuitively, the set Q ∞ has the "fixed-point property," that is, if one starts with a prior p ∈ Q ∞ , then the belief ϕ(p, w(p)) ∈ Q ∞ . Since ϕ(p, w(p)) ≤ p (with a strict inequality if p / ∈ P ), we thus a decreasing sequence of beliefs converging to an element in P . See panel (B) of Figure 1 for an an illustration. At all priors in [0, 1] \ Q ∞ , there exists T δ < ∞ such that the belief process is absorbed in the degenerate beliefs 0 or 1 after at most T δ periods. In other words, agent learns the state for sure in finite time. The number of periods T δ corresponds to the maximal number of periods the agent can be incentivized to play a * . (We provide an explicit computation in Section A.4.) In Example 1, it is 4. Moreover, the date T δ is increasing in δ and converges to +∞ as δ converges to 1. (Note that it is also uniform in that it does At all priors in Q ∞ , the belief process is either absorbed in 1 or in some p ∈ P ; the process may be absorbed only asymptotically. The economics of our policy. We now provide some economic insights as to why our policy is optimal. From Makris and Renou [2020], we can view the principal's problem as selecting among the set of Bayes correlated equilibria of the decision problem the agent faces an equilibrium maximizing his payoff. At a Bayes correlated equilibrium, an omniscient mediator makes periodic recommendations of actions to the agent. The recommendation at period t depends on all past recommendations, past actions and payoff-relevant states. At an equilibrium, the agent has an incentive to be obedient, provided he has been in the past. Denote π t (a|a t−1 , ω) the probability of recommending action a conditional on the profiles of past recommendations (and actions) a t−1 and the state ω, and let π t−1 (a t−1 |ω) = π 1 (a 1 |ω) × π 2 (a 2 |a 1 , ω) × · · · × π t−1 (a t−1 |a t−1 , ω). 11 At a Bayes correlated equilibrium, the principal's payoff is is the discounted probability of recommending action a * and is the average discounted probability of ω 1 when a * is recommended. Notice that p * cannot be lower than q 1 since the agent would never play a * for lower beliefs. Similarly, let p be the average discounted probability of ω 1 when a * is not recommended. Since the belief process is a martingale, λ * p * + (1 − λ * )p = p 0 . We now turn our attention to the agent's expected payoff. Since the agent cannot obtain more than max a∈A u(a, ω) whenever he does not play a * , his expected payoff is bounded from above by: Moreover, since the agent's payoff must be at least m(p 0 ), there exists a positive number The number c captures two effects. First, the optimal solution may leave some rents to the agent, so that the agent's payoff is m(p 0 ) + c 1 for some c 1 ≥ 0. 
Second, the agent's payoff may be bounded away from the upper bound by some positive number c 2 > 0. It follows that c = c 1 + c 2 . We can then rewrite the principal's expected payoff as: The first term captures the average benefit of incentivizing the agent to play a * relative to the cost. Since v(a * ,0) v(a * ,1) ≥ m(0)−u(a * ,0) m(1)−u(a * ,1) , it is decreasing in p * . Ceteris paribus, the lower the average belief at which the agent plays a * , the higher the principal's expected payoff. In a sense, this term represents the "debt" the principal accumulates over time as the agent repeatedly plays a * . The second term captures how the principal repays his debt with his only instrument: information. The term M (p 0 ) − m(p 0 ) is the maximal value of information the principal can create. Ceteris paribus, the principal's payoff is decreasing in c, that is, the best is to leave no rents to the agent and to create as much information as necessary to repay the agent. Notice that c = 0 is only achieved by both leaving no rents to the agent and fully informing the agent of the state (i.e., whenever the agent does not play a * , his belief is either 0 or 1). In general, the principal needs to trade-off a lower p * for a lower c. It is worth noting that our policy guarantees c = 0 for all p 0 ∈ [q 1 , q * ]. The principal uses to the full extent possible the information available to him. For p 0 ∈ [0, q 1 ], it is then clear why our policy of splitting p 0 into 0(= p ) and q 1 (= p * ) is optimal. It minimizes the average cost of incentivizing the agent to play a * , attains the upper bound for the agent's expected payoff since M (0) = m(0), i.e., c 2 = 0, and leaves as little rents as possible. A similar argument applies to p 0 ∈ [q 1 , 1]. When p 0 ∈ (q * , q 1 ], the policy leaves strictly positive rents to the agent, i.e., c 1 > 0. The gain is to reduce the average cost of incentivizing a * in the future, which out-weights the cost. Non-uniqueness and comparison with the KG's policy. The policy is not always uniquely optimal. We demonstrate the non-uniqueness with the help of a simple example and then discuss how our policy compares with the KG's policy (for Kamenica-Gentzkow's policy). Example 2. The agent has two possible actions a 0 and a 1 , with a 0 (resp., a 1 ) the agent's optimal action when the state is ω 0 (resp., ω 1 ). The principal wants to induce a 0 as often as possible, i.e., a * = a 0 . The discount factor is 1/2. The payoffs are in Table 2 , with the first coordinate corresponding to the principal's payoff. In Example 2, we have that: m(p) = max(1 − p, p), M (p) = 1 and u(a * , p) = 1 − p. Thus, a * is optimal for all p ∈ P = [0, 1/2]. Moreover, Q 1 = [0, 2/3] and w(p) = 3p − 1 for p ∈ (1/2, 2/3]. a 0 a 1 ω 0 1, 1 0, 0 ω 1 1, 0 0, 1 We now provide an explicit characterization of the value function. We first compute the value function V q 1 (·, m(·)) and check whether it is concave. For p ∈ [0, 1/2], the policy recommends a * and promises a continuation payoff of m(p). That is, since a * is optimal, the principal does not need to incentivize the agent. For p ∈ (1/2, 2/3], the policy recommends a * and promises a continuation payoff of w(p). At (p, w(p)) with p ∈ (1/2, 2/3], the policy splits p into ϕ(p, w(p)) and 1, with probability λ(p, w(p)) and 1 − λ(p, w(p)) respectively. (See Equation (1).) We obtain that λ(p, w(p)) = (3−4p) and ϕ(p, w(p)) = 2−3p 3−4p . Note that ϕ(p, w(p)) = 2−3p 3−4p < 1 2 since p ∈ (1/2, 2/3]. 
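These two expressions follow from the splitting conditions defining (λ, ϕ) (a short derivation; recall that in Example 2, m(ϕ) = 1 − ϕ for ϕ ≤ 1/2, m(1) = 1 and w(p) = 3p − 1):
\[
\lambda\varphi + (1-\lambda)\cdot 1 \;=\; p,
\qquad
\lambda\,(1-\varphi) + (1-\lambda)\cdot 1 \;=\; 3p-1 .
\]
The second equation gives \(\lambda\varphi = 2-3p\); substituting into the first gives \(\lambda = 3-4p\) and hence \(\varphi = \dfrac{2-3p}{3-4p}\).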
After splitting p into ϕ(p, w(p)), the principal therefore obtains a payoff of 1 in all subsequent periods. It follows that the principal's expected payoff is (1 − δ) + δλ(p, w(p)) = 1/2 + (1/2)(3 − 4p) = 2(1 − p). Finally, if p ∈ (2/3, 1], the policy splits p into 2/3 and 1 with probability 3(1 − p) and 1 − 3(1 − p), respectively. The principal's expected payoff is then 3(1 − p)·V_{q_1}(2/3, m(2/3)) = 3(1 − p)·(2/3) = 2(1 − p). So, the value function V_{q_1} induced by the policy τ_{q_1} is such that V_{q_1}(p, m(p)) = 1 for all p ∈ [0, 1/2] and V_{q_1}(p, m(p)) = 2(1 − p) for all p ∈ (1/2, 1]. Since it is concave in p, this guarantees that q* = q_1 and, thus, the policy is indeed optimal. We now consider another policy, which we call the KG policy. The aim of the KG policy is to persuade the agent to choose a* as often as possible by disclosing information at the initial stage only. In other words, the KG policy uses information disclosure neither to reward nor to punish the agent, but only to persuade. The best payoff the principal can obtain with a KG policy is the concavification, evaluated at the prior, of the function equal to v(a*, p) at beliefs p at which a* is statically optimal and to 0 at all other beliefs. In Example 2, the KG policy differs from our policy only when p ≥ 1/2, and consists in splitting p into 1/2 and 1, with probability 2(1 − p) and 1 − 2(1 − p), respectively. The KG policy induces the same value function as our policy, hence it is also optimal. We now prove that this is not accidental. Suppose that there are only two actions, a_0 and a_1, such that a_0 (resp., a_1) is optimal at state ω_0 (resp., ω_1). The principal aims at implementing a_0 as often as possible, i.e., a* = a_0. Remember that a_0 is optimal at all beliefs in [p̲, p̄]. Since a_0 is optimal at 0, p̲ = 0. To streamline the exposition, assume that the prior p_0 > p̄. (If p_0 ≤ p̄, an optimal policy is to never reveal any information.) It is then immediate to see that the KG policy consists in splitting the prior p_0 into p̄ and 1, with probability (1 − p_0)/(1 − p̄) and 1 − (1 − p_0)/(1 − p̄), respectively. Intuitively, the principal designs a binary experiment, with one signal perfectly informing the agent that the state is ω_1 and the other partially informing the agent so that his posterior belief is p̄. We can contrast the KG policy with our policy. Unlike the KG policy, our policy does not reveal information to the agent at the first period, and only reveals information to the agent if he plays a_0. If the agent plays a_0 at the first period, the policy splits p_0 into ϕ(p_0, w(p_0)) and 1 with probability λ(p_0, w(p_0)) and 1 − λ(p_0, w(p_0)), respectively. Note that ϕ(p_0, w(p_0)) ≤ p̄ since w(p_0) ≥ m(p_0). Thus, our policy guarantees that the agent plays a* for sure at the first period. However, this comes at a cost: the principal needs to reveal more information to the agent at the next period and, consequently, induces the agent to play a_0 with a lower probability in the future. Somewhat surprisingly, both policies are optimal, regardless of the discount factor. Corollary 1. If there are only two actions, then the KG policy is also optimal. As Example 1 shows, the KG policy is not always optimal. Yet, we might wonder whether this was due to a* being strictly dominated. The answer is negative, as the next example shows. Example 3. The agent has three possible actions a_0, a_1 and a*, with a_0 (resp., a_1) the agent's optimal action when the state is ω_0 (resp., ω_1). The prior probability of ω_1 is 1/12 and the discount factor is 1/2. The payoffs are in Table 3, with the first coordinate corresponding to the principal's payoff. In Example 3, we have that M(p) = 1 + p, m(p) = max(1 − p, 3/4, 2p), [p̲, p̄] = [1/4, 3/8] and q_1 = 1/12; the value q_1 = 1/12 is verified just below.
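Indeed (a worked check from the definition of Q_1: the agent must be willing to play a* when promised full information disclosure at the next period), with δ = 1/2, u(a*, p) = 3/4 and the values of m and M just given, the lower boundary of Q_1 solves
\[
(1-\delta)\,u(a^*,q_1) + \delta\,M(q_1) \;=\; m(q_1)
\quad\Longleftrightarrow\quad
\tfrac12\cdot\tfrac34 + \tfrac12\,(1+q_1) \;=\; 1-q_1
\quad\Longleftrightarrow\quad
q_1 = \tfrac1{12}.
\]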
It is straightforward to show that the KG policy consists in splitting the a 0 a 1 a * ω 0 0, 1 0, 0 1, 3/4 ω 1 0, 0 0, 2 1, 3/4 prior 1/12 into 0 and 1/4, inducing a payoff of 1/3. 13 Note that the KG's policy does not create a strictly positive value of information, since (3/4)m(0) + (1/4)m(1/4) = m(1/12), while ours does. Yet, our policy is optimal and induces a payoff of 1/2 (= (1 − δ)v(a * , q 1 )). Comparison with the "recursive" policy. Finally, we compare our policy with the "recursive" policy. Remember that the "recursive" policy is essentially equivalent to the policy of fully disclosing the state with delay, a policy which plays a prominent role in the work of Ball [2019] and Orlov et al. [2019] . We first compute the principal's best payoff if he commits to the best "recursive" policy, that is, when the principal promises to fully disclose the state with probability α at period t + 1 (and to withhold all information with the complementary probability) if the agent plays a * at period t. To ease the exposition, we assume that a * is not optimal at the belief p = 0. 14 Assume that p ∈ Q 1 . The best recursive policy is thus solution to the maximization problem: inducing the value The formula has a natural interpretation. Whenever the agent is recommended to play a * , no information has been revealed yet, so that the maximal value of information the principal can create is M (p) − m(p). To incentivize the agent, the principal needs to 13 As we vary the prior, the induced value function is 4p when p < 1/4, 1 when p ∈ [1/4, 3/8] and 8(1−p) 5 when p ∈ (3/8, 1). It is concave in p. 14 When a * is optimal at p = 0, we need to add the term δα(1−p)v(a * , p) to the objective, which corresponds to the payoff the principal obtains when the disclosed state is ω 0 . promise a continuation payoff of w(p) in the future and thus needs to create an information value of w(p) − m(p). However, to create an information value of w(p) − m(p), the principal commits to fully disclose the state with some probability, hence foregoing the opportunity to incentivize the agent to play a * in the future. Therefore, the highest probability with which the principal can incentivize the agent to play a * is ). Yet, as we now explain, the principal can do better by exploiting the fact that it is easier to incentivize the agent to play a * at some beliefs than at others. To see how the principal can do better, we study the relaxed version of our problem, where only the (ex-ante) participation constraint needs to be satisfied. Consider the following policy. The principal discloses information at the ex-ante stage, i.e., chooses a splitting (λ s , p s ) s of p, and recommends the agent to play a * at all periods with probability α s when the realized signal is s. We continue to assume that p ∈ Q 1 . The policy satisfies We can rewrite the participation constraint as: where m(p s ) − u(a * , p s ) is the opportunity cost of following the recommendation at belief p s . The principal aims at maximizing s λ s α s v(a * , p s ). Clearly, the participation constraint binds at a maximum. Moreover, since m is convex, the best for the principal is to fully disclose all information at the ex-ante stage. It only remains to characterize the optimal α s . Note that if the principal recommends a * with the same probability at all s, which is precisely the payoff of the recursive policy. 
15 To see this, note that if the principal recommends a * with the same probability α at all s, it is as if the agent plays a * with probability α, regardless of the state, and plays a 0 (resp., a 1 ) with probability (1 − α)(1 − p) (resp.,, (1 − α)p). Since α binds the participation constraint (2), we obtain the same payoff as the recursive policy. 15 When a * is optimal at p = 0, we need to add the term (1 − p) 1 − M (p)−m(p) M (p)−u(a * ,p) v(a * , p). However, the principal can do better by exploiting the difference in opportunity costs at the two extreme beliefs 0 and 1. Writing α 1 (resp., α 0 ) for the probability of recommending a * conditional on the posterior being 1 (resp., 0), the principal maximizes pα 1 v(a * , 1) + (1 − p)α 0 v(a * , 0) subject to: The right-hand side is the maximal value of information the principal can create, while the left-hand side is the expected opportunity cost of following the recommendation. As with the recursive policy, the principal needs to generate the maximal value of information; this is the maximal value the principal can use to incentivize the agent. However, unlike the recursive policy, the principal needs to use the surplus created asymmetrically, as it is easier to incentivize the agent in state ω 0 than ω 1 . More precisely, the problem is linear in (α 0 , α 1 ). Therefore, since the slope v(a * ,0) v(a * ,1) is larger than the slope m(0)−u(a * ,0) m(1)−u(a * ,1) , the optimal solution is to set α 0 as high as possible. with a strict inequality if the opportunity cost is strictly higher in state ω 1 . 16 This is the solution to the relaxed constraint. While our policy also needs to incentivize the agent to follow the recommendation, it exploits the same asymmetries in opportunity costs as the above policy, which explains why it outperforms the recursive policy. To conclude, note that if v(a * ,0) v(a * ,1) = m(0)−u(a * ,0) m(1)−u(a * ,1) , then the recursive policy solves the relaxed problem and, therefore, is also optimal. A.1. Mathematical preliminaries. We collect without proofs some useful results about concave functions. Let f : [a, b] → R be a concave function and a ≤ x < y < z ≤ b. The following properties hold: Note that property (a) implies that for all ∆ ≥ 0 such that y + ∆ ≤ b. (This is true irrespective of whether x + ∆ y.) We will repeatedly use these properties in most of the following proofs. To prove Lemma 5, we will use the following property: x−a Ely [2015] proves that the principal's maximal payoff is max w∈[m(p 0 ),M (p 0 )]V * (p 0 , w), witĥ V * the unique fixed point of the contractionT , withT differing from T in that the promised-keeping constraint is in equality; all other constraints are the same. Note that the operatorT is also monotone. We now prove that both formulations are equivalent. Clearly, we have that T (V )(p, w) ≥ where the first inequality follows from V * decreasing in w. Conversely, let (λ * s , p * s , w * s , a * s ) s∈S be a maximizer of T (V * )(p 0 , m(p 0 )). We have that hence (λ * s , p * s , w * s , a * s ) s∈S is a maximizer for T (V * )(p 0 , w 0 ) and, consequently, A.3. Proposition 2. We break Proposition 2 into several lemmata. Lemma 1. Let (λ s , p s , w s , a s ) s∈S be a solution to the maximization program T (V * )(p, w). For all s ∈ S such that λ s > 0, we have Proof. 
By contradiction, assume that there exists s ∈ S such that λ s > 0 and Let (λ * s , p * s , w * s , a * s ) s∈S be the policy, which achieves V * (p s , (1 − δ)u(a s , p s ) + δw s ) and consider the new policy By construction, the new policy is feasible. Moreover, we have that a contradiction with the optimality of (λ s , p s , w s , a s ) s∈S . Since the fixed point satisfies V * (p s , (1 − δ)u(a s , p s ) + δw s ) ≥ (1 − δ)v(a s , p s ) + δV * (p s , w s ), we have the desired result. Lemma 2. Let (λ s , p s , w s , a s ) s∈S be a solution to the maximization program T (V * )(p, w). Proof. Let s ∈ S such that λ s > 0 and a s = a * . We have where the first inequality follows from Lemma 1 and the second follows from V * decreasing in w and w s ≥ u(a s , p s ) for (1 − δ)u(a s , p s ) + δw s ≥ m(p s ), to hold. It follows that V * (p s , w s ) = 0. Lemma 3. Let (λ s , p s , w s , a s ) s∈S be a solution to the maximization program T (V * )(p, w). There exists another solution (λ s , p s , w s , a s ) s∈S such that a s = a * for at most one s ∈ S with λ s > 0. Proof. Let (λ s , p s , w s , a s ) s∈S be a solution to the maximization program T (V * )(p, w). Let S * ⊆ S be the set of signals such that a s = a * and λ s > 0. If S * is empty, there is nothing to prove. If S * is non-empty, define p * as and s∈S * λ s = λ * . From the concavity of V * , we have that It routine to verify that the new contract ((λ s , p s , w s , a s ) s∈S \S * , (λ * , p * , a * , w * )) is feasible and, therefore, also optimal. Lemma 4. Let (λ s , p s , w s , a s ) s∈S be a solution to the maximization program T (V * )(p, w). There exists another solution (λ s , p s , w s , a s ) s∈S such that for all s such that λ s > 0 and a s = a * . Proof. Assume that there exists s * such that λ s * > 0, a s * = a * and for some ε > 0. From Lemma 3, we may assume that there is a single such s * . Consider the new tuple (λ s , p s , w s , a s ) s∈S , where w s * = w s * − ε, ws = w s + λ s * λs ε for somes = s * such that λs > 0, w s = w s for all s ∈ S \ {s * ,s}, and (λ s , p s , a s ) = (λ s , p s , a s ) for all s. This new contract is feasible and increases the principal's payoff. A.4. Value functions. This section proves that the policy τ q induces a well-defined value function V q . As explained in the text, if the value function V q 1 is well-defined, so are all value functions V q . We first start with the definition of important subsets of A.4.1. Construction of the sets Q k . Let Q 0 := [0, 1]. We define inductively the set Q k ⊆ [0, 1], k ≥ 0. We write q k (resp., q k ) for inf Q k (resp., sup Q k ). For any k ≥ 0, define the function U k : [q k , 1] → R: for all k. We define Q k+1 as follows: For a graphical illustration, see Figure 8 . FIGURE 8. Construction of the thresholds Few observations are worth making. First, we have that P ⊆ Q k for all k. Second, we have a decreasing sequences, i.e., Q k+1 ⊆ Q k for all k. Third, if Q k and P are non-empty, then they are closed intervals. Fourth, the limit Q ∞ = lim k→∞ Q k = k Q k exists and includes P . Moreover, if P = ∅, then q ∞ = p, where p := inf P . If P = ∅, then Q ∞ = ∅. Consequently, there exists k * < ∞ such that ∅ = Q k * +1 ⊂ Q k * = ∅. The first to the third observations are readily proved, so we concentrate on the proof of the fourth observation. The limit exists as we have a decreasing sequence of sets. We prove that if P = ∅, then Q ∞ = ∅. We first argue that if Q k = Q k−1 = ∅ for some k ≥ 0, hence Q k = Q k−1 for all k ≥ k, then P = ∅. 
From the convexity and continuity of m and the linearity of u, Q k−1 is the closed interval [q k−1 , q k−1 ], with both boundary points solution to (1 − δ)u(a * , q) + δU k−2 (q) = m(q). Therefore, if (q k , q k ) = (q k−1 , q k−1 ), we have that: This implies that u(a * , q k−1 ) = m(q k−1 ) and u(a * , q k−1 ) = m(q k−1 ) and, therefore, ∅ = Q k−1 ⊆ P , a contradiction. Therefore, we must have an infinite sequence of strictly decreasing non-empty closed intervals. Let ε := min p∈P m(p) − u(a * , p). Since P = ∅, ε > 0 and for all p ∈ Q ∞ , for all k, It follows that m(q k ) ≥ m(q ∞ ) + ε(1 − δ)/δ for all k and, therefore, m(q ∞ ) ≥ m(q ∞ ) + ε(1 − δ)/δ, a contradiction. We now prove that if P = ∅, then q ∞ = p. From above, we have that if Q k = Q k−1 = ∅ for some k ≥ 0, hence Q k = Q k−1 for all k ≥ k, then P = Q k since P ⊆ Q k . If we have an infinite sequence of strictly decreasing sets, for all q ∈ Q ∞ , Taking the limit q ↓ q ∞ , we obtain that u(a * , q ∞ ) = m(q ∞ ), i.e., q ∞ ∈ P . Hence, q ∞ = p. A.4.2. Value functions. We first argue that V q 1 is well-defined at all (p, w) ∈ W \ W 2 q 1 . To start with, V q 1 (1, m(1)) = 0 since a * is not optimal at p = 1. Similarly, V q 1 (0, m(0)) = 0 if a * is not optimal at p = 0, while V q 1 (0, m(0) = v(a * , 0) if a * is optimal at p = 0. Also, V q 1 (q 1 , m(q 1 )) = (1 − δ)v(a * , q 1 ) if q 1 > 0; while V q 1 (0, m(0)) = v(a * , 0) if q 1 = 0, since a * is then optimal at p = 0. With the function V q 1 defined at these three points, it is then defined at all points (p, w) in W 1 q 1 ∪ W 4 q 1 . In particular, it is easy to show that At all points (p, w) ∈ W 3 q 2 , At all points (p, w) ∈ W 2 q 1 , V q 1 (p, w) is defined via the recursive equation: , m(ϕ(p, w))). Since V q 1 (p, w) = λ(p, w)V q 1 (ϕ(p, w), m(ϕ(p, w)), the value function is well-defined at all (p, w) if it is well-defined at all (p, m(p)), which we now prove. and, therefore, ϕ(p, w(p)) ∈ [q k−1 , q k ) ⊂ Q k−1 \ Q k . Moreover, ϕ(q k , w(q k )) = q k . We now use these observations to prove that V q 1 is well-defined. For all p ∈ Q 1 \ Q 2 , we have that w(p) ∈ Q 1 \ Q 0 , so that (p, w(p)) ∈ W 4 q 1 . Since V q 1 (p, m(p)) = (1 − δ)v(a * , p) + δV q 1 (p, w(p)), V q 1 (p, m(p)) is well-defined for all p ∈ Q 1 \ Q 2 . By induction, assume that it is welldefined for all p ∈ m q (p ). Since Therefore, λ(p, w) = 1−p 1−p λ(p , w ) and ϕ(p , w ) = ϕ(p, w) since the solution is unique when w > m q (p ). Finally, if m q (p ) = 1−p 1−p m q (p) + p −p 1−p m q (1), the result follows from continuity as: Note that this implies that where c is a positive constant. OBSERVATION B. The value function V q 1 (p, ·) : [m q 1 (p), M (p)] → R is concave in w, for each p. See Lemma 5 in section A.6. A.5.1. Proposition 3(a). We prove that V q * decreasing in w. To start with, fix p ∈ [0, 1] First, assume that p ≤ q * . If w = m q * (p), then V q * (p, w ) ≤ V q * (p, w) by construction of q * . If w > m q * (p), we have that where the inequality follows from the concavity of V q 1 with respect to w, for all w ≥ m q 1 (p). (Recall that m q * (p) = m q 1 (p) for all p ≤ q * .) Second, assume that p > q * . We show in great details how to make use of Observation A to deduce the result. We repeatedly use similar computations later on. 
A.5.1. Proposition 3(a). We prove that V_{q^*} is decreasing in w. To start with, fix p ∈ [0, 1] and w′ ≥ w. First, assume that p ≤ q^*. If w = m_{q^*}(p), then V_{q^*}(p, w′) ≤ V_{q^*}(p, w) by construction of q^*. If w > m_{q^*}(p), the same inequality follows from the concavity of V_{q_1} with respect to w, for all w ≥ m_{q_1}(p). (Recall that m_{q^*}(p) = m_{q_1}(p) for all p ≤ q^*.) Second, assume that p > q^*. We show in detail how to make use of Observation A to deduce the result; we repeatedly use similar computations later on.

We have a chain of equalities in which the first line follows from the construction of V_{q^*}, the second line from Observation A, the third line from the definition of the functions λ and ϕ, and the last line from direct computation. Thus, we can express V_{q^*}(p, w′) as λ(p, w)V_{q^*}(ϕ(p, w), w̃), for the appropriate continuation value w̃. Moreover, ϕ(p, w) ≤ q^* as w ≥ m_{q^*}(p). We can therefore use the (already established) concavity of V_{q^*} in w for each p ≤ q^* to deduce the desired result; more precisely, the desired inequality follows from the concavity of V_{q^*} in w at all p ≤ q^*. Lastly, since V_{q^*}(p, w) = V_{q^*}(p, m_{q^*}(p)) for all w ∈ [m(p), m_{q^*}(p)], the result immediately follows for all (w, w′) with w ∈ [m(p), m_{q^*}(p)].

A.5.2. Proposition 3(b). We prove the concavity of V_{q^*} with respect to both arguments (p, w). Without loss of generality, assume that p ≤ p′. The relevant inequality follows from the concavity of V_{q_1} with respect to w for each p and the property that V_{q^*}(p, w) = V_{q_1}(p, w) for all (p, w) such that w ≥ m_{q^*}(p). Notice that we use Observation A twice. Finally, for all (p, w) ∈ W, all (p′, w′) ∈ W and all α ∈ [0, 1], the concavity inequality holds at the convex combination, since α max(w, m_{q^*}(p)) + (1 − α) max(w′, m_{q^*}(p′)) ≥ w_α, where w_α := αw + (1 − α)w′, and V_{q^*} is decreasing in w for all p. This completes the proof of concavity.

A.5.3. Proposition 3(c). We prove that V_{q^*}(p, m(p)) ≥ (1 − δ)v(a^*, p) + δV_{q^*}(p, w(p)) for all p. The statement is true for all p ≤ q^* by definition. For p > q^*, we have that w(p) ≥ m_{q^*}(p), hence V_{q^*}(p, w(p)) = V_{q_1}(p, w(p)). Since V_{q_1}(p, m(p)) = (1 − δ)v(a^*, p) + δV_{q_1}(p, w(p)) for all p ∈ Q_1 and V_{q^*}(p, m(p)) = V_{q^*}(p, m_{q^*}(p)) = V_{q_1}(p, m_{q^*}(p)), it is enough to prove that V_{q_1}(p, m_{q^*}(p)) ≥ V_{q_1}(p, m(p)). Clearly, there is nothing to prove if m_{q^*}(p) = m(p) for all p ∈ Q_1, i.e., if q^* = q_1 (remember that m_{q_1}(p) = m(p) for all p ∈ Q_1). So, assume that m_{q^*}(p) > m(p) for some p ∈ (q^*, q_1), hence m_{q^*}(p) > m(p) for all p ∈ (q^*, q_1). We now argue that if V_{q_1}(p, w) > V_{q_1}(p, m(p)) for some w ≥ m_{q^*}(p), then the corresponding inequality holds at every p′ > p. To see this, observe that w > m(p); the claim then follows from an equality given by Observation A and an inequality given by the concavity of V_{q_1} in w for each p. Finally, from the definition of q^*, for all n > 0, there exist p_n ∈ (q^*, min(q^* + 1/n, q_1)] and w_n ≥ m(p_n) such that V_{q_1}(p_n, m(p_n)) < V_{q_1}(p_n, w_n). From the concavity of V_{q_1} in w for all p, V_{q_1}(p_n, m(p_n)) < V_{q_1}(p_n, m_{q^*}(p_n)) for all n. From the above argument, for all p and all n sufficiently large, i.e., such that p_n < p, the corresponding inequality holds at p. Taking the limit as n → ∞, we obtain that V_{q_1}(p, m_{q^*}(p)) ≥ V_{q_1}(p, m(p)), which completes the proof.

A.6. Concavity of V_{q_1} with respect to w for each p.

Lemma 5. The value function V_{q_1}(p, ·) : [m_{q_1}(p), M(p)] → ℝ is concave in w, for each p.

This section proves that V_{q_1} is concave in w for each p. To do so, we prove that the difference quotient G(p; η) := [V_{q_1}(p, m_{q_1}(p)) − V_{q_1}(p, w(p; η))]/η satisfies G(p; η) ≥ G(p; η′) for all (η, η′) such that η ≥ η′. (See the observations on concave functions.)
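This is the standard difference-quotient characterization of concavity. The sketch below illustrates it on a generic concave function of one variable, not the paper's V_{q_1}: the quotient [f(w_0) − f(w_0 + η)]/η is nondecreasing in η when f is concave, and the test fails for a convex function. The functions and numbers are assumptions of the sketch.

```python
import numpy as np

# Generic illustration of the criterion used in Lemma 5: f is concave on an interval
# iff, for every base point w0, G(eta) = (f(w0) - f(w0 + eta)) / eta is nondecreasing
# in eta > 0. The examples below are arbitrary, not the paper's value function.
f = lambda w: -(w - 0.3) ** 2          # a concave example
w0 = 0.1
etas = np.linspace(1e-3, 0.5, 500)

G = (f(w0) - f(w0 + etas)) / etas      # difference quotients at increasing increments
assert np.all(np.diff(G) >= -1e-12)    # nondecreasing in eta, as concavity requires

g = lambda w: (w - 0.3) ** 2           # convex counterexample: the test fails
G_convex = (g(w0) - g(w0 + etas)) / etas
assert not np.all(np.diff(G_convex) >= -1e-12)
```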
We start with some preliminary results.

A.6.1. Preliminary Results. We study how the function ϕ(p, w(p)) varies with p.

Lemma 6. There exists a non-empty interval [q, q̄] such that: (1) for any p < p′ ≤ q or p > p′ ≥ q̄, ϕ(p, w(p)) ≥ ϕ(p′, w(p′)); (2) the ratio [m(1) − m(ϕ(p, w(p)))]/(1 − ϕ(p, w(p))) is constant for all p ∈ [q, q̄].

Proof of Lemma 6. By the convexity of m, the claim reduces to properties of the ratio (m(1) − w(p))/(1 − p). Consider thus the function h : [0, 1] → ℝ defined by h(p) = (m(1) − w(p))/(1 − p). We argue that h is quasi-concave. For all (p, p′) and α ∈ [0, 1], the convexity of w yields a lower bound on h at αp + (1 − α)p′, and it follows that if h(p′) ≥ h(p), then h(p″) ≥ h(p) for all p″ ∈ (p, p′). Since h is quasi-concave and continuous, the set of its maximizers is a non-empty interval [q, q̄], and the function is increasing for all p < q and decreasing for all p > q̄. (Note that m(1) − w(1) = (1 − δ)(u(a^*, 1) − m(1))/δ < 0, hence h(p) → −∞ as p → 1.)

We can make a few additional observations about the interval [q, q̄]. Let k^* := sup{k : Q_k ≠ ∅}. Since ϕ(q̄_k, w(q̄_k)) = q̄_k, the function h is decreasing for all p ≥ q̄_{k^*}. Similarly, since ϕ(q_k, w(q_k)) = q_{k−1}, the function h is increasing for all p ≤ q_{k^*}. Therefore, [q, q̄] ⊂ Q_{k^*}. If P ≠ ∅, so that k^* = ∞, then for all p ∈ P the function h is increasing, by convexity of m, since w(p) = m(p). (This is clearly true since ϕ(p, m(p)) = p in that region.) Finally, let p̄ := inf{p : m(p) = u(a_1, p)}. We then have that ϕ(ϕ_η, w(ϕ_η)) ≤ ϕ(ϕ_{η′}, w(ϕ_{η′})) since ϕ_η ≤ ϕ_{η′} ≤ p̄ ≤ q, and, similarly, since ϕ_{η′} < p̄ ≤ q, that ϕ(ϕ_{η′}, w(ϕ_{η′})) ≤ ϕ(p, w(p)) and, therefore, [η − (1 − δ)(1 − λ_η)]/δ > 0.

We now return to the computation of the gradient; we record the resulting expression as Equation (3) and develop it further. To ease notation, we write (ϕ(p), λ(p)) for (ϕ(p, w(p)), λ(p, w(p))). Note that ϕ(p) ∈ (q_{k−1}, q_k], since p ∈ (q_k, q_{k+1}]. The expression can then be rewritten using Observation A and the induction step. We now show that the gradient is increasing in η. To start with, note that [η − (1 − δ)(1 − λ_η)]/δ is increasing in η, since (1 − λ_η)/η is decreasing in η (see Lemma 7). For any η > η′, we compare the two expressions; the relevant inequality follows from the fact that ϕ(p) ∈ (q_{k−1}, q_k] and, therefore, the gradient G(ϕ(p); η) is increasing in η by the induction hypothesis. Finally, the last inequality follows from the fact that the gradient in the second bracket is weakly larger than v(a^*, 1) by the induction hypothesis and the fact that (1 − λ_η)/η < (1 − λ_{η′})/η′ (Lemma 7). Since lim_{k→∞} q_k = p̲ when P ≠ ∅, this completes the proof that the gradient is greater than v(a^*, 1) for all p ∈ [0, p̲].

A.6.4. For all p ∈ I_2, G(p; η) is increasing in η. We first treat the case P ≠ ∅. Recall that for all p ∈ (p̲, q_∞], we have an explicit expression for the value function V_{q_1}(p, m_{q_1}(p)), namely

V_{q_1}(p, m_{q_1}(p)) = v(a^*, p) − [(m_{q_1}(p) − u(a^*, p))/(m_{q_1}(1) − u(a^*, 1))] v(a^*, 1).

Define η_1(p) as the solution to ϕ_{η_1}(p) = ϕ(p, w(p; η_1(p))) = p̲. Note that for any p ∈ (p̲, q_∞] and any η ≤ η_1(p), ϕ_η ∈ [p̲, q_∞]. It follows that the gradient is equal to v(a^*, 1) for all p ∈ (p̲, p^*], for all η ≤ η_1(p). Consider now η > η_1(p). We rewrite the gradient G(p; η) as follows:

G(p; η) = [V_{q_1}(p, m_{q_1}(p)) − V_{q_1}(p, w(p; η))]/η
= [V_{q_1}(p, m_{q_1}(p)) − V_{q_1}(p, w(p; η_1(p))) + V_{q_1}(p, w(p; η_1(p))) − V_{q_1}(p, w(p; η))]/η
= (η_1(p)/η)·[V_{q_1}(p, m_{q_1}(p)) − V_{q_1}(p, w(p; η_1(p)))]/η_1(p) + ((η − η_1(p))/η)·[V_{q_1}(p, w(p; η_1(p))) − V_{q_1}(p, w(p; η))]/(η − η_1(p))
= (η_1(p)/η)·v(a^*, 1) + ((η − η_1(p))/η)·((1 − p)/(1 − p̲))·[V_{q_1}(p̲, m_{q_1}(p̲)) − V_{q_1}(p̲, w(p̲; (η − η_1(p))(1 − p̲)/(1 − p)))]/(η − η_1(p))
= (η_1(p)/η)·v(a^*, 1) + ((η − η_1(p))/η)·G(p̲; (η − η_1(p))(1 − p̲)/(1 − p)).

Since we have already shown that the gradient at p̲ is increasing in η and weakly larger than v(a^*, 1), the gradient G(p; η) is also weakly increasing in η (and greater than v(a^*, 1)).
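The single-peakedness of the function h from Lemma 6, which underlies the monotonicity arguments in this section, can be illustrated numerically in a two-action specification. The sketch below assumes that w(p) is given by the binding constraint (1 − δ)u(a^*, p) + δw(p) = m(p) and uses made-up payoff numbers; it is an illustration, not part of the argument.

```python
import numpy as np

# Numerical illustration of Lemma 6: with two actions and a binary state, the map
# h(p) = (m(1) - w(p)) / (1 - p) is single-peaked (quasi-concave) on [0, 1).
# Here w(p) is taken from (1-d)*u_star(p) + d*w(p) = m(p), an assumption of this
# sketch, and all payoff numbers are made up.
d = 0.8                                            # discount factor delta (assumed)
grid = np.linspace(0.0, 0.999, 10_000)

u0 = 1.0 - grid                                    # u(a0, p), with a0 = a* (assumed)
u1 = grid                                          # u(a1, p) (assumed)
m = np.maximum(u0, u1)                             # myopic value m(p)
m1 = 1.0                                           # m(1) = u(a1, 1)

w = (m - (1 - d) * u0) / d                         # continuation promise w(p)
h = (m1 - w) / (1.0 - grid)

peak = int(np.argmax(h))
assert np.all(np.diff(h[: peak + 1]) >= -1e-9)     # nondecreasing up to the peak
assert np.all(np.diff(h[peak:]) <= 1e-9)           # nonincreasing after the peak
```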
We now treat the case P = ∅. Define η̄(p) as the solution to ϕ_{η̄}(p) = ϕ(p, w(p; η̄(p))) = q. Note that for any p ∈ [q, q̄] and any η ≤ η̄, ϕ_η ∈ [q, q̄]. Therefore, for all η ≤ η̄, η = (1 − δ)(1 − λ_η), since the ratio (m_{q_1}(1) − w(ϕ_η))/(1 − ϕ_η) is constant in η and so is ϕ(ϕ_η, w(ϕ_η)). (Recall that we vary η at a fixed p.) It then follows from Equation (3) that

G(p; η) = ((1 − δ)(1 − λ_η)/η) v(a^*, 1) + (δ/η)[V_{q_1}(p, w(p)) − V_{q_1}(p, w(p) + ((η − (1 − δ)(1 − λ_η))/δ)(m(1) − u(a^*, 1)))] = ((1 − δ)(1 − λ_η)/η) v(a^*, 1) = v(a^*, 1).

Hence the gradient G(p; η) is equal to v(a^*, 1) for all p ∈ (q, q̄], for all η ≤ η̄. Finally, when η > η̄, the same decomposition as in the case P ≠ ∅ completes the proof.

A.6.5. For all p ∈ I_3, the gradient G(p; η) is increasing in η. We only treat the case P ≠ ∅. (The case P = ∅ is treated analogously.) Define η̄(p) as the solution to ϕ_{η̄}(p) = ϕ(p, w(p; η̄(p))) = q_∞. By construction, for all p ∈ (q_∞, 1] and all η ≤ η̄(p), we have that ϕ_η ∈ (q_∞, 1]. Therefore, ϕ_η > q̄. Choose η̄(p) ≤ η′ ≤ η. We have that ϕ_η ≥ ϕ_{η′} ≥ q̄, since q_∞ ≥ q̄. Also, since q̄ ≤ ϕ_{η′} ≤ p, we have that ϕ(ϕ_{η′}, w(ϕ_{η′})) ≥ ϕ(p, w(p)) and, therefore, [η − (1 − δ)(1 − λ_η)]/δ ≤ 0. The same applies to η′. To ease notation, define λ̃_η := λ(ϕ_η, w(ϕ_η)) and φ̃_η := ϕ(ϕ_η, w(ϕ_η)). Notice that φ̃_η ∈ I_1, since ϕ_η > q_∞. The rest of the proof is purely algebraic and mirrors the case p ∈ I_1.

First, we rewrite V_{q_1}(p, w(p)) using Observation A; similarly, we rewrite

V_{q_1}(p, w(p)) − V_{q_1}(p, w(p) − (((1 − δ)(1 − λ_{η′}) − η′)/δ)(m_{q_1}(1) − u(a^*, 1))),

again using Observation A. Since φ̃_η ∈ I_1, our previous argument on the interval I_1 applies, and it follows that the difference quotient

[V_{q_1}(p, w(p)) − V_{q_1}(p, w(p) − (((1 − δ)(1 − λ_η) − η)/δ)(m_{q_1}(1) − u(a^*, 1)))] / [((1 − δ)(1 − λ_η) − η)/δ]

can be compared with the corresponding quotient at η′. From Equation (3), we then have that

(1/η)[V_{q_1}(p, m_{q_1}(p)) − V_{q_1}(p, w(p; η))] = ((1 − δ)(1 − λ_η)/η) v(a^*, 1) + ((1 − δ)(1 − λ_η)/η − 1)·[V_{q_1}(p, w(p)) − V_{q_1}(p, w(p) − (((1 − δ)(1 − λ_η) − η)/δ)(m(1) − u(a^*, 1)))] / [((1 − δ)(1 − λ_η) − η)/δ],

where the last inequality follows from

[V_{q_1}(p, w(p)) − V_{q_1}(p, w(p) − (((1 − δ)(1 − λ_η) − η)/δ)(m_{q_1}(1) − u(a^*, 1)))] / [((1 − δ)(1 − λ_η) − η)/δ] = λ̃_η [V_{q_1}(φ̃_η, m_{q_1}(φ̃_η)) − V_{q_1}(φ̃_η, w(φ̃_η; ((1 − δ)(1 − λ_η) − η)/(δλ̃_η)))] / [((1 − δ)(1 − λ_η) − η)/δ] ≥ v(a^*, 1).

We now show that the gradient G(p; η) is smaller than v(a^*, 1) for any η ≤ η̄(p). From Equation (3), we have that (1/η)[V_{q_1}(p, m_{q_1}(p)) − V_{q_1}(p, w(p; η))] ≤ v(a^*, 1), where the inequality follows from the fact that φ̃_η ≤ p̲ and, therefore, from our arguments on the interval I_1 (where we show that the gradient is larger than v(a^*, 1)). Finally, we can use a decomposition similar to the one for p ∈ I_2 to prove that the gradient is increasing for all η.

A.7. Proof of Corollary 1. We first compute the principal's payoff induced by our policy. To ease notation, we write ϕ for ϕ(p, w(p)). We first assume that q^* = q_1, compute the value function V_{q_1}(p, m(p)) for all p, and check that it is concave. By construction, the principal's payoff satisfies

V_{q_1}(p, m(p)) = (1 − δ)v(a^*, p) + δV_{q_1}(p, w(p)) = (1 − δ)v(a^*, p) + δ((1 − p)/(1 − ϕ)) v(a^*, ϕ).

Remember that w(p) is the continuation promise defined by (1 − δ)u(a^*, p) + δw(p) = m(p). Since w(p) = m(p) = u(a_0, p) when p ≤ p̄, we have that ϕ = p and, therefore, the principal's payoff is 1 when p ≤ p̄.
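Before turning to the case p > p̄, here is a minimal numerical sketch of the closed forms derived in the remainder of this proof (the threshold p̄, the posterior ϕ, and the resulting payoff). The payoff numbers are made up, a^* is taken to be a_0, and v(a^*, ·) is normalized to 1; the check is illustrative only.

```python
import numpy as np

# Numerical check of the closed forms derived below for the case p > p_bar, with
# made-up payoff numbers: u(a0, .) and u(a1, .) are affine in the belief, a* = a0,
# and the principal's flow payoff v(a*, .) is normalized to 1. Sanity check only.
d = 0.8                                              # discount factor delta (assumed)
u = {("a0", 0): 1.0, ("a0", 1): 0.0,                 # u(action, state), assumed numbers
     ("a1", 0): 0.0, ("a1", 1): 1.0}

def U(a, p):                                         # u(a, p) = (1 - p) u(a, 0) + p u(a, 1)
    return (1 - p) * u[(a, 0)] + p * u[(a, 1)]

# Threshold: 1 - p_bar = (u(a1,1) - u(a0,1)) / (u(a0,0) - u(a0,1) + u(a1,1) - u(a1,0)).
p_bar = 1 - (u[("a1", 1)] - u[("a0", 1)]) / (
    u[("a0", 0)] - u[("a0", 1)] + u[("a1", 1)] - u[("a1", 0)])

p = 0.7                                              # a prior above the threshold
assert p > p_bar

# Solve (1 - phi)[u(a1,p) - (1-d)u(a0,p)] = d[(1-p)u(a0,phi) + (p-phi)u(a1,1)] for phi.
phis = np.linspace(0.0, p, 200_001)
gap = (1 - phis) * (U("a1", p) - (1 - d) * U("a0", p)) \
    - d * ((1 - p) * U("a0", phis) + (p - phis) * u[("a1", 1)])
phi = phis[np.argmin(np.abs(gap))]
assert phi <= p_bar + 1e-6                           # consistent with phi <= p_bar in the proof

# The simplified identity for d(1-p)/(1-phi) should agree with the solved phi.
lhs = d * (1 - p) / (1 - phi)
rhs = d - p + (1 - p) * (u[("a0", 0)] - u[("a1", 0)]) / (u[("a1", 1)] - u[("a0", 1)])
assert np.isclose(lhs, rhs, atol=1e-4)

# Payoff from the closed form V = v(a*,p) - d(1 - (1-p)/(1-phi)) v(a*,1), with v = 1.
V = 1.0 - d * (1 - (1 - p) / (1 - phi))
print(f"p_bar = {p_bar:.3f}, phi = {phi:.3f}, payoff = {V:.3f}")
```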
Assume now that p > p̄. We have that

u(a_1, p) = (1 − δ)u(a_0, p) + δ[((1 − p)/(1 − ϕ))u(a_0, ϕ) + ((p − ϕ)/(1 − ϕ))u(a_1, 1)],

since m(p) = u(a_1, p), m(ϕ) = u(a_0, ϕ) and ϕ ≤ p̄. (To see this, note that if ϕ > p̄, then m(ϕ) = u(a_1, ϕ), hence w(p) = m(p), a contradiction with w(p) > m(p) when p > p̄.) The above equation is equivalent to

(1 − ϕ)[u(a_1, p) − (1 − δ)u(a_0, p)] = δ[(1 − p)u(a_0, ϕ) + (p − ϕ)u(a_1, 1)].

Observing that u(a, p) = (1 − p)(u(a, 0) − u(a, 1)) + u(a, 1) for all a and, similarly, for ϕ, we can simplify the above expression to

δ(1 − p)/(1 − ϕ) = δ − p + (1 − p)[u(a_0, 0) − u(a_1, 0)]/[u(a_1, 1) − u(a_0, 1)].

Lastly, remember that the threshold p̄ is the solution to

1 − p̄ = [u(a_1, 1) − u(a_0, 1)]/[u(a_0, 0) − u(a_0, 1) + u(a_1, 1) − u(a_1, 0)],

and, therefore,

V_{q_1}(p, m(p)) = v(a^*, p) − δ(1 − (1 − p)/(1 − ϕ)) v(a^*, 1).

Since the KG policy induces the same payoff, it is also optimal.

A.8. First best. This section provides details on the solution to the first-best problem. Note that α^*_1 ≤ 1, with equality if m(p) = u(a^*, p). At an optimum, the participation constraint clearly binds. Assume that m(0) − u(a^*, 0) > 0. We can then rewrite the principal's objective as a function of α_1; note that the objective is continuous in α_1. The optimal payoff therefore depends on whether p(m(1) − u(a^*, 1))/((1 − p)(m(0) − u(a^*, 0))) ≤ 1, with (α_0, α_1) = (1, α^*_1) otherwise.

References

Repeated games with incomplete information.
Dynamic information provision: Rewarding the past and guiding the future. Available at SSRN 3103127.
Strategic information transmission.
Moving the goalposts.
Training and effort dynamics in apprenticeship.
Relational knowledge transfers.
Bayesian persuasion and information design.
Bayesian persuasion.
Information design in multi-stage games.
Persuading the principal to wait (forthcoming).
Optimal dynamic information provision.
Dynamic evaluation design.
On repeated moral hazard with discounting.