key: cord-0557243-gpn88lt3 authors: Vora, Anuj S.; Kulkarni, Ankur A. title: Optimal Questionnaires for Screening of Strategic Agents date: 2020-10-28 journal: nan DOI: nan sha: 63053d924dba7887f4c128cacefd04fc6ab0ebb7 doc_id: 557243 cord_uid: gpn88lt3 During the COVID-$19$ pandemic the health authorities at airports and train stations try to screen and identify the travellers possibly exposed to the virus. However, many individuals avoid getting tested and hence may misreport their travel history. This is a challenge for the health authorities who wish to ascertain the truly susceptible cases in spite of this strategic misreporting. We investigate the problem of questioning travellers to classify them for further testing when the travellers are strategic or are unwilling to reveal their travel histories. We show there are fundamental limits to how many travel histories the health authorities can recover.% can be correctly classified by any probing mechanism. [This is a longer version of our conference paper submitted to ICASSP 2021.] During the COVID-19 pandemic the health authorities at airports and train stations try to screen and identify the travellers possibly exposed to the virus. However, many individuals avoid getting tested and hence may misreport their travel history. This is a challenge for the health authorities who wish to ascertain the truly susceptible cases in spite of this strategic misreporting. We investigate the problem of questioning travellers to classify them for further testing when the travellers are strategic or are unwilling to reveal their travel histories. We show there are fundamental limits to how many travel histories the health authorities can recover. Index Termsgame theory, information extraction, screening The COVID-19 pandemic has demonstrated the need to screen and test travellers at airports and railway stations. However, due to limited resources, it is not feasible for the authorities to test all the travellers. Thus, it is imperative to identify the most susceptible travellers for testing by deriving information about their travel history and determining whether they have been to a COVID-19 hotspot. However, the travellers may not be willing to divulge their travel history to the relevant authorities. The question is then how to extract the true information from the travellers? In particular -given the tendency of the travellers to misreport their travel history, how should the health authorities query in order to screen and correctly identify maximum number of susceptible travellers? The travellers have a predisposition to misrepresent their travel information. This could be due to the associated stigma of the disease as well as the inconvenience caused due to the testing and quarantine protocols. These tendencies are different among each of the travellers. The health authorities thus face a wide spectrum of travellers and they have to strategize appropriately so as to recover the maximum amount of information from the travellers. We model this as a setting where a health inspector encounters travellers having an n-length travel history. A trav-eller can be classified baesd on its type which determines its tendency to misreport its information. The health inspector wishes to know the travel history of the traveller but does not know the type of the traveller. Moreover, the traveller may have an incentive to misreport its travel history. We formulate this problem as a game between the traveller which we call the sender and the health inspector who is the receiver. We show that the Stackelberg equilibrium strategy of the receiver, with the receiver as the leader, is to commit to a decoding function that meaningfully decodes only a subset of sequences correctly and deliberately induces an error on the rest of the sequences. In other words, the optimal strategy of the health inspector is to present a questionnaire with a predefined list of a subset of travel histories and ask the traveller to select any and exactly one. We capture the misreporting nature of the sender with a utility that depends on the true sequence of travel history, the history recovered by the receiver and its type. Crucial to our analysis is the notion of the sender graph which is a graph on the space of the sequences induced by the utility of the sender. We define a notion of rate of information extraction for the receiver which determines the growth of the perfectly recovered sequences with n. We compute upper and lower bounds on the maximum rate of information extraction. We show that barring the case when the types of the senders are completely divergent in their utility structures, the number of perfectly recovered sequences grows exponentially in n. The problem of information extraction is related to the general problem of communication between sender and receiver with misaligned objectives studied in game theory [1, 2, 3] control theory [4, 5, 6] , economics [7, 8] . Our work is also related to the results in [9] where we discussed a case with a single sender and showed that the maximum rate of information extraction is bounded above by the Shannon capacity of a certain graph. In [10, 11] , we studied a related information extraction problem where the receiver tried to achieve asymptotically vanishing probability of error. The space of symbols is denoted by the calligraphic letter X and the space of vector valued sequences is denoted as X n . Note that to declutter notation, unless stated otherwise, the sequences x, y, z will be vector valued. The set of probability distributions on a space is denoted as P(·). For a graph G, the size of the largest independent set is denoted as α(G). The image of a function g is denoted as Im(g). Let X , |X | < ∞ be the set of all locations. A particular location X may be a COVID-19 hotspot or a safe region. The travellers from these locations arrive at an airport or a train station with travel histories x ∈ X n which is a sequence of n locations. Each of the travellers have a classification called the type which determines their degree of honesty. We denote the type as λ and the set of all types is denoted as Λ. We assume that |Λ| < ∞. The health inspector wants to know the travel histories from each of the travellers. However, the inspector does not know the type of the traveller, but only has a noisy observation. We model this as a belief over the types denoted as P Λ ∈ P(Λ). The inspector presents the travellers with a checklist which requires the travellers to choose anyone travel history from a specified histories. Since the inspector is unaware of the type of the traveller, the checklist is common across all travellers. If a traveller with travel history x, states his history as x, we say that the health inspector recovers a sequence x ∈ X n . The aim of the inspector is to get x = x without the knowledge of x and λ. If the traveller is not honest, then it may wish to deceive the health inspector such that when x is the travel history, then the inspector recovers x such that x = x. We quantify this trait with a utility U n ( x, x, λ) where U n : X n ×X n ×Λ → R is defined as with U : X ×X ×Λ → R being the single letter utility. Thus, the utility of a traveller depends on the history recovered by the health inspector x, the true history x and its type λ. For a traveller with type λ and history x, we say that the traveller prefers for all x, x ′ ∈ X , x ′ = x, then we say that the traveller is honest. It also follows that U n (x, x, λ) > U n (x ′ , x, λ) for all x, x ′ ∈ X n , x ′ = x for the honest sender. For the game formulation, we call the traveller as sender and the health inspector as a receiver. A sender with λ ∈ Λ and travel history x ∈ X n responds to the checklist of the receiver as s λ n (x) = y ∈ X n , where s λ n : X n → X n . The receiver maps this response of the sender as g n (y) = x, where g n : X n → X n . Although this allows for a wide range of functions, we later show that without loss of generality, the receiver can choose a subset of sequences I n ⊆ X n such that g(x) = x for all x ∈ I n and g(x ′ ) =x for all x ′ / ∈ I n , wherē x ∈ I n . In other words, it suffices that the health inspector need only specify a list of locations and ask the traveller to select exactly one. Let D(g n , s λ n ) := x ∈ X n | g n • s λ n (x) = x be the set of perfectly recovered sequences when the receiver plays the strategy g n and the sender plays the strategy s λ n . The receiver aims to maximize the size of the set D(g n , s λ n ) averaged over the types of the senders by choosing an appropriate strategy g n . The sender, on the other hand, tries to maximize the utility U n (g n • s λ n (x), x, λ) by choosing an appropriate strategy s λ n . We formulate this problem as a game between the sender and receiver. In particular, we consider a leader-follower game, also called a Stackelberg game, with the receiver as the leader and the sender as the follower. The game proceeds as follows. The receiver, being the leader, plays and announces its strategy, i.e., a list of hitories, before the sender. For a given strategy of the receiver, the sender chooses a response that maximizes its utility. The receiver anticipates this response of the sender and accordingly chooses an optimal strategy that maximizes its objective, i.e., a "decoding" function which, when composed with the sender's "encoding" strategy, recovers the maximum number of the sequences perfectly. This leads to the equilibrium concept called the Stackelberg equilibrium solution [12] . where the best response set of the sender, B(g n , λ), is This formulation is apt for our setting since the health inspector first presents the travellers with a checklist and then the travellers choose their responses accordingly. For a given strategy g n of the receiver, sender of each type responds with a strategy that depends on its utility structure. Thus, the set of best responses B(g n , λ) is also a function of the type λ. In (1), we minimize over the set of best responses of the sender B(g n , λ) because the receiver does not have control over the choice of the best response of the sender. Thus, we assume that the receiver chooses its strategy according to the worst-case scenario and hence adopts a pessimistic veiwpoint. We now discuss an example which demonstrates the gameplay and some important aspects of game. Since h is an honest sender, it does not prefer to lie about its history. The same is not true for the sender type d which is a dishonest sender. For the sender type λ = d, the utility is Thus, the dishonest sender prefers to lie about all its travel Let n = 1. Suppose the receiver chooses a naive strategy g : X → X as g(i) = i for all i ∈ X ; in this strategy the receiver blindly believes the sender's word and maps it to the same location. Thus, its presents the sender with all possible options for reporting locations. The best response set of the sender type h is the strategy s h (x) = x for all x ∈ X . For the sender type d, it can easily be checked that the best response set B(g, d) = {s d , s d }, wherē It can be observed the for the sender type h, the set of perfectly recovered sequences is D(g, s h ) = X . For the sender type d, D(g, s d ) = ∅ for all s d ∈ B(g, d). This gives λ∈Λ P(λ) min s λ ∈B(g,λ) |D(g, s λ )| = 1 3 3 + 2 3 0 = 1. However, the receiver can do better by cleverly choosing its strategy. Suppose instead the receiver chooses a strategy g as g(0) = 0 = g(1) and g(2) = 2. Thus, the receiver does not naively decode the sender's message. It chooses to correctly decode only a subset of sequences, in this case {0, 2} by applying an identity map on {0, 2}. For the sequence 1, the receiver decodes it to the sequence 0 or equivalently, the checklist has only two locations {0, 2}. Again, for the sender type h, the best response strategy is s h (x) = x for all x ∈ X . For the sender type d, the best response strategy is s d (i) = 0 for all i ∈ X . It can be observed that D(g, s h ) = {0, 2} and D(g, s d ) = {0} and hence λ∈Λ P(λ) min s λ ∈B( g,λ) |D( g, s λ )| = 1 3 2 + 2 3 1 = 4 3 . Thus, a more sophisticated choice of strategy g improves on the naive strategy g. The difference between the strategies g and g is that in the latter, the receiver deliberately induces an error when it receives a sequence 1 from the sender. Since 1 is not in range of g, sender has only {0, 2} as its choices. Given these choices, the sender type d is forced to report 0 as 0. This is because, the sender does not prefer 2 or over 0 since U (2, 0) < U (0, 0). Thus, the strategy g limits the choices of the sender and forces it to be honest for the sequence 0. This strategy also limits the choices of the honest sender and thereby only {0, 2} are recovered. However, on average, the receiver does better since P Λ (h) < P Λ (d). In this section, we introduce some notions to quantify the amount of information that can be extracted from the senders. First, we define a notion of the sender graph. The sender graph for a sender of type λ, denoted as G n λ = (X n , E), is a graph where (x, y) ∈ E if either U n (x, x, λ) ≤ U n (y, x, λ) or U n (y, y, λ) ≤ U n (x, y, λ). For n = 1, the graph G 1 λ is denoted as G λ . Thus, two sequences x and y are adjacent if the sender has an incentive to report one as the other. Definition 3.4 (λ-partition of a set) Let I n ⊆ X n be any set. For λ ∈ Λ, the λ-partition of the set I n is defined as I n λ := {x ∈ I n : U n (x, x, λ) > U n (y, x, λ) ∀ y ∈ I n , y = x}. For the set of types Λ, let the collection of all λ-partitions of the set I n be denoted as F (I n ) = {Ī n λ } λ∈Λ . Thus, the setĪ n λ is such that when the sender observes a sequence x ∈Ī n λ , the sender prefers to report x honestly amongst the sequences in I n . We now define the notion of rate which determines the growth of the perfectly recovered sequences with n. Then, the rate of information extraction is defined as R(g n ) = (D * (g n )) 1/n . We now characterize the set of perfectly recovered sequences in a Stackelberg equilibrium of the game. Theorem 4.1 Let n ∈ N be fixed. For a sender type λ ∈ Λ with utility U n (·, ·, λ), let G n λ be the corresponding sender graph. Let g n be any strategy of the receiver and let F (Im(g n )) = {Ī n λ } λ∈Λ be the collection of the λ-partitions of the set Im(g n ). Then, Proof : Consider the sender with type λ. Letx ∈Ī n λ . Then, from the definition ofĪ n λ , we have U n (x,x, λ) > U n (x ′ ,x, λ) ∀ x ′ ∈ I n , x ′ =x. Thus, for all s λ n ∈ B(g n , λ), g n • s λ n (x) =x. Since for all other s ′λ n where g n • s ′λ and hence U n (g n •s ′λ n (x),x, λ) ≤ U n (g n •s λ n (x),x, λ) ∀ s ′λ n , with equality if and only if g n • s ′λ n (x) = g n • s λ n (x). Thus, for all s λ n ∈ B(g n , λ), we have g n • s λ n (x) =x. Let I n λ = I n \Ī n λ . Then, from the definitions ofĪ n λ , we have that for all y ∈ I n λ , there exists an y ′ ∈ I n , y ′ = y such that U n ( y, y, λ) ≤ U n (y ′ , y, λ). Thus, there exists s ′λ n ∈ B(g n , λ) such that g n • s ′λ n ( y) = y for all y ∈ I n λ . Thus, allx ∈Ī n λ are recovered by the receiver irrespective of the strategy chosen by the sender. However, in the worst case no sequence in I n λ is recovered. Thus, min s λ n ∈B(gn,λ) |D(g n , s λ n )| = |Ī n λ |. This result holds for all senders and hence we get D * (g n ) = λ∈Λ P(λ)|Ī n λ |. This theorem show that for a strategy g n of the receiver, the perfectly recovered sequences are given by the set ∪ λĪ n λ , whereĪ n λ is the λ-partition of Im(g n ). However, given a set I n and its collection λ-partition F (I n ), it is not clear how to perfectly recover the set ∪ λĪ n λ . The proof of the following theorem suggests one such strategy for the receiver. We also characterize the set of perfectly recovered sequences in a Stackelberg equilibrium of the game. Theorem 4.2 Let n ∈ N be fixed. For a sender type λ ∈ Λ with utility U n (·, ·, λ), let G n λ be the corresponding sender graph. Then, for all equilibrium strategies g * n of the receiver where {Ī n λ } λ∈Λ is the collection of the λ-partitions of I n . Proof : Let I n ⊆ X n be any set and letĪ n λ be its λ-partition. Define a strategy g n for the receiver as wherex is some sequence inĪ n λ . From this choice of g n is it clear that Im(g n ) = I n . From Theorem 4.1, we get that D * (g n ) = λ∈Λ P(λ)|Ī n λ |. Maximizing over the choice of strategies g n is equivalent to maximizing over I n and hence max gn D * (g n ) = D * (g * n ) = max I n ⊆X n λ∈Λ P(λ)|Ī n λ |. Thus, we see that the optimal questionnaire consists only a subset of histories I n from X n . Thus, to determine the optimal questionnaire, the receiver has to determine the largest set I n for which λ∈Λ P(λ)|Ī n λ | is maximized. A particular choice of the set I n would be to take the largest independent set in the graph ∪ λ G n λ , denoted as I n . Since I n is also an independent set in each of the sender graphs G n λ , we get that I n λ = I n for all λ ∈ Λ. Thus, D(g n ) = α(∪ λ G n λ ). However, in the following we show that set of set may not be the optimal choice for the receiver. Let n ∈ N be fixed. For a sender type λ ∈ Λ with utility U n (·, ·, λ), let G n λ be the corresponding sender graph. Then, for all Stackelberg equilibrium strategies g * n , Proof : Let g n be a strategy of the receiver such that | Im(g n )| = α (∪ λ G n λ ). Define I n = Im(g n ). Clearly, I n is an independent set in G n λ and hence for all λ, the λ-partition isĪ n λ = I n . This gives that D * (g n ) = α (∪ λ G n λ ) ≤ D * (g * n ). Thus, R(g * n ) ≥ D * (g n ) 1/n = α (∪ λ G n λ ) 1/n. . For the upper bound, we use the fact thatĪ n λ is an independent set in G n λ . Thus, |Ī n λ | ≤ α(G n λ ), which gives that R(g * n ) = D * (g * n ) 1/n ≤ λ∈Λ P Λ (λ)α(G n λ ) 1/n . Remark: In the Example 3.1, only singleton sets are independent sets in G h ∪ G d and hence α(G h ∪ G d ) = 1. Thus, if the receiver chooses a strategy g(x) = 0 ∀ x ∈ X , then D(g, s λ ) = {0} ∀ s λ ∈ B(g, λ), ∀ λ ∈ Λ. This gives R(g) = 1. However, as demonstrated in Example 3.1, this choice of strategy is sub-optimal for the receiver. We now take the limit as n → ∞ and derive fundamental bounds on the rate of information extraction. Theorem 4.4 Consider senders having type λ ∈ Λ with utility U (·, ·, λ) and let {G n λ } n≥1 be the sequence of sender graphs. Then, for all sequences of Stackelberg equilibrium strategies {g * n } n≥1 of the receiver, α (∪ λ G λ ) ≤ lim sup n→∞ R(g * n ) ≤ Ξ(U , λ * ). where λ * = max λ∈Λ α(G λ ) and Ξ(U , λ * ) := lim n→∞ α(G n λ * ) 1/n . Proof : We first derive the lower bound. Let I be the largest independent set in ∪ λ G λ , i.e., |I n | = α (∪ λ G λ ). Let I n := I ×. . .×I. We show that I n is an independent set in the graph ∪ λ G n λ . Let x, y ∈ I n be distinct sequences and consider the difference of utilites for type λ U n (x, x, λ) − U n (y, x, λ) Since x i , y i ∈ I for all i ∈ {1, . . . , n}, we have U (x i , x i , λ) > U (y i , x i , λ) for all i. Thus, U n (x, x, λ) > U n (y, x, λ). We can similarly prove that U n (y, y, λ) > U n (x, y, λ). This holds for all λ ∈ Λ and hence x, y ∈ I n are non-adjacent in the graph ∪ λ G n λ . Thus, Using the lower bound in Corollary 4.3 and taking the limit superior as n → ∞, the lower bound in (5) follows. We now derive the upper bound. We have λ∈Λ P Λ (λ)α(G n λ ) 1/n = (P Λ (λ * )α(G n λ * )) 1/n 1 + λ∈Λ,λ =λ * P Λ (λ)α(G n λ ) P Λ (λ * )α(G n λ * ) 1/n . Using α(G λ * ) ≥ α(G λ ), it can be shown that for all n, α(G n λ * ) ≥ α(G n λ ) [13] . Thus, lim n→∞ λ∈Λ,λ =λ * P Λ (λ)α(G n λ ) P Λ (λ * )α(G n λ * ) = 0 and hence lim n→∞ 1 + λ∈Λ,λ =λ * P Λ (λ)α(G n λ ) P Λ (λ * )α(G n λ * ) 1/n = 1. Using this and P Λ (λ * ) 1/n → 1, we get lim n→∞ λ∈Λ P Λ (λ)α(G n λ ) 1/n = lim n→∞ α(G n λ * ) 1/n . (6) The limit on the right hand side exists [13] and is denoted as Ξ(U , λ * ). Using the upper bound in Corollary 4.3 and taking the limit superior as n → ∞, the upper bound in (5) follows. Strategic information transmission Multiple referrals and multidimensional cheap talk On multi-dimensional and noisy quadratic signaling games and affine equilibria Hierarchical multistage gaussian signaling games in noncooperative communication and control systems Estimation with strategic sensors Information-theoretic approach to strategic communication as a hierarchical game Bayesian persuasion Information design: A unified perspective Zero error strategic communication Achievable rates for strategic communication Communicating with a strategic sender Dynamic Noncooperative Game Theory Information extraction from a strategic sender over a noisy channel