Optimizing the order of actions in contact tracing
Michela Meister; Jon Kleinberg
2021-07-20

Abstract: Contact tracing is a key tool for managing epidemic diseases like HIV, tuberculosis, and COVID-19. Manual investigations by human contact tracers remain a dominant way in which this is carried out. This process is limited by the number of contact tracers available, who are often overburdened during an outbreak or epidemic. As a result, a crucial decision in any contact tracing strategy is, given a set of contacts, which person should a tracer trace next? In this work, we develop a formal model that articulates these questions and provides a framework for comparing contact tracing strategies. Through analyzing our model, we give provably optimal prioritization policies via a clean connection to a tool from operations research called a "branching bandit". Examining these policies gives qualitative insight into trade-offs in contact tracing applications.

Figure 1: Manual contact tracing begins with an index case (A), after which further contacts are revealed. Contact tracing proceeds recursively, where the tracer must decide the order in which to investigate contacts, with different orderings yielding different rewards.

The aim of these guidelines is to synthesize these trade-offs into a decision-making protocol for tracers to follow. The complexity of this process is emphasized in the CDC's Guidelines for Investigations of Contacts with Infectious Tuberculosis. "Contact investigations are complicated undertakings that typically require hundreds of interdependent decisions, the majority of which are made on the basis of incomplete data, and dozens of time-consuming interventions."
[5] Protocols for prioritizing contacts typically take the form of a flow-chart or matrix that assigns contacts to different groups and then dictates the order in which these groups ought to be queried. Often multiple criteria are considered when categorizing contacts. In many cases, two of the most important factors are the probability that a contact is infected and the recency of their exposure. Although the importance of these factors is well understood, agencies still have difficulty managing these trade-offs, and this difficulty carries real costs. During a 2017 HIV outbreak in West Virginia, contact tracers in the West Virginia Department of Health and Human Resources were overwhelmed by the surge in cases. In such a situation, CDC guidelines recommend interviewing contacts associated with clusters of infections, who may be at high risk of infection [35]. However, the contact tracers had no means of adjusting the order in which they investigated cases to respond to the outbreak. "[T]here was no supervisory triage system to respond to a cluster of HIV infections. As a result, [contact tracers] would investigate cases linearly - prioritizing index case investigations over investigating contacts within clusters - and did not have the flexibility to shift their priorities based on the identification of an ongoing HIV cluster." [36] Given the importance of identifying infected cases quickly, both with respect to preventing future infections and initiating medical treatment early, delays like this may cause real harm. Even in the numerous cases where an agency has an effective method of prioritization, there are subtleties to how these priorities are chosen [10, 44].
For example, the protocol for COVID-19 contact tracing developed by the North Carolina Department of Health and Human Services prioritizes contacts by recency of exposure, with one exception: contacts associated with a cluster or outbreak of infections are prioritized first, regardless of when they were exposed [31]. By clearly dictating when to shift priorities in an investigation, this protocol addresses the issues presented in the West Virginia case. Yet many questions still remain. How are these priorities chosen? How do these tools translate to different settings? How might priorities change with slightly different factors or as parameters change? Is there even a way to ask these sorts of questions? In this work, we develop a formal model that articulates these questions and provides a framework for comparing contact tracing strategies. Through analyzing our model, we give provably optimal prioritization policies via a clean connection to a tool from operations research called a "branching bandit" [45]. Examining these policies gives qualitative insight into trade-offs in contact tracing applications. The model we study has two phases: first an infection spreads within a population; then the infection process halts and a contact tracing intervention begins. Of course, contact tracing is an extremely complex process requiring nuance and domain expertise, and there are many factors which this stylized model cannot capture - for example, the dynamics of contact tracing while a disease is still spreading, which we explore in forthcoming work [30]. Yet this two-phase model already exposes trade-offs and questions about prioritization, decision-making, and resource allocation, which not only seem important in their own right, but also seem prerequisite to understanding more complex settings. Paper Organization. First, in section 1.1 we discuss relevant work from the contact tracing, operations research, and computer science literatures.
Then in section 2 we present an example to illustrate the trade-offs at play in contact tracing. In section 3 we present our formal model, and in section 4 we give an overview of our results for the four models we consider. The remaining sections 5-8 present each of the four models, their optimal policies, and the analysis of these policies. Finally we discuss a few compelling directions for future work in section 9. Work closest to ours focuses on comparing contact tracing policies and the question of "who to trace". Specifically, Armbruster and Brandeau consider a network model where a tracer with limited capacity must decide at each step which contact to trace next. In [4] they evaluate three different policies for prioritizing contacts via simulation. Using the best of these three policies, in [3] they describe the trade-offs between investing in such a contact tracing effort versus directing funding toward other interventions. Tian et al. also consider a network model where they evaluate a set of contact tracing strategies targeted at various subgroups within a population [42]. Among compartmental models, Hethcote et al. evaluate sets of targeted contact tracing strategies in [18] and Eames considers targeted tracing for different population structures in [8]. Our results differ from this prior work in that we provide provably optimal contact tracing policies, as opposed to evaluating a set of specified policies or strategies. The importance of developing prioritization strategies for contact tracing under resource constraints is highlighted in recent surveys [25, 32]. Kaplan et al. also consider tracing under resource constraints, though in a somewhat different setting [19, 20]. There have also been many studies evaluating under what conditions contact tracing is effective and when a disease can be controlled via tracing [7, 12, 17, 21, 22]. In [33], Müller et al.
analyze a branching process model and compare the fraction of contacts traced to the effective reproduction number. Digital contact tracing is surveyed in [1]. Our work focuses on contact tracing carried out by human tracers and is orthogonal to digital contact tracing apps. In the digital setting, Lunz et al. develop optimal policies for deciding how to quarantine contacts of infected individuals, where some fraction of contacts can be traced by digital means [27]. Within the operations research and computer science literatures, our work is most closely related to search problems on trees that do not have clear connections to contact tracing, but which use similar techniques. The problem closest to our work is the tree-constrained Pandora's Box problem studied in [6], which also analyzes stochastic selection on a tree, but which involves a fairly different objective. Another related problem is stochastic probing with constraints [16]. While our model of contact tracing involves a tracer operating on a tree, it is quite different from the minimum latency problem on weighted trees [38] in that the tracer does not physically travel to the individuals they trace. Finally, our model formulation can be viewed as falling under the general class of Markov decision processes, but the key is that it falls under the specific class of branching bandit problems, which leads to an efficient solution. We begin with an example that illustrates the trade-offs at play in prioritizing contacts. Suppose an infection spreads in a population over the course of two days, after which the infection halts and contact tracing begins.

Figure 2: In a mathematical model of contact tracing, the tracer knows the probability of infection for each contact and needs to choose an ordering for investigating contacts. As we show in this paper's main results, there is an efficient algorithm to decide an optimal ordering.
Specifically, over the course of two days each individual in a population meets one new contact each day with probability q ∈ (0, 1], and infected individuals probabilistically infect each new person they meet. The following events take place:
• On day t = −2, w is infected with probability p_w.
• On day t = −1, with probability q, w meets x, and if w is infected they infect x with probability p_x.
• On day t = 0, with probability q, w meets y, and if w is infected they infect y with probability p_y.
• On day t = 0, with probability q, x meets z, and if x is infected they infect z with probability p_z.
• After day t = 0, the community goes into lockdown and no new infections occur.
Now consider a contact tracer working to discover infected cases. On day t = 0 the tracer finds that w is infected and learns that they met x on day t = −1 and also met y on day t = 0. Going forward, on each day t ≥ 1 the tracer chooses one individual to query. When an infected individual is queried, they receive medical treatment and their contacts become available to query in the future. Querying an individual infected for τ days returns benefit 2^{−τ+1}, which represents the probability they respond to treatment. Querying an uninfected individual returns benefit 0. Each individual may be queried at most once. The tracer's objective is to maximize the total benefit accumulated. Since x was exposed to w on day t_x = −1, the tracer knows that x may have met another contact (arbitrarily called z) on day t = 0, however they have no way of accessing z (or even knowing if z exists) before querying x. Since y was exposed on day t_y = 0, the tracer knows that y met no further contacts. The question is, which contact should the tracer query first, x or y? Querying x potentially grants access to z, however y was exposed more recently than x, so if y is infected it returns a higher benefit.
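The expected benefits of the two orderings in this example can be computed directly. The sketch below is one plausible reading of the example (not a computation from the paper): the two known contacts are queried on days 1 and 2, z is queried on day 3 in either case (so its contribution is identical across orderings), and the specific parameter values in the usage note are hypothetical.

```python
def benefit(tau):
    # Benefit of querying an individual who has been infected for tau days.
    return 2.0 ** (-tau + 1)

def expected_total(order, p_x, p_y, p_z, q):
    # order: ("x", "y") or ("y", "x"), queried on days 1 and 2;
    # z is reachable only if x is infected, and is queried on day 3.
    day = {name: i + 1 for i, name in enumerate(order)}
    ex = p_x * benefit(day["x"] + 1)   # x was infected on day -1
    ey = p_y * benefit(day["y"])       # y was infected on day 0
    ez = p_x * q * p_z * benefit(3)    # z infected only via x, with prob. q * p_z
    return ex + ey + ez
```

Under this reading, querying x first yields p_x/2 + p_y/2 (plus the shared z term), while querying y first yields p_y + p_x/4, so x goes first exactly when p_x > 2·p_y: neither "higher immediate benefit" nor "more potential descendants" wins universally.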
Figure 2 shows the benefit of different ways in which the tracer can operate, numerically evaluated for two different parameter settings. This simple example already highlights a few crucial points we explore in this work. For one, querying the node with the higher expected immediate benefit is not always optimal. This lack of a simple rule might at first seem to imply that a contact tracer working in this synthetic model would need to calculate the expected benefit of each possible ordering in order to identify the optimal choice. In fact this is not the case. Via a dynamic programming approach we provide optimal policies for querying individuals that are computed before contact tracing begins and which are straightforward to implement: a "priority index" is computed for each individual, and at each step the contact tracer queries the individual with the highest priority index. Modeling a contact tracing process involves a few different ingredients: people need to meet and make new contacts; the infection needs to spread through these interactions to some individuals and not others; and finally, we need a way of identifying infected individuals and their contacts. We develop a model with two phases. Phase 1 spans steps −T ≤ t ≤ 0 and involves a contact process, which describes how people meet new contacts, and an infection process, which describes how the infection spreads through these interactions. At the end of Phase 1, the contact and infection processes halt, and from then on no new infections occur. Phase 2 begins on step t = 0 and continues indefinitely. At the start of Phase 2 a set of index cases are identified. We define an index case as an individual that is exposed to infection and becomes infected according to a probability function p. We are agnostic to the origin of the index cases; for example, an index case may have been identified via surveillance testing, another contact tracing effort, or random chance.
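A minimal sketch of how such an index policy could be executed (the names `index`, `roots`, and `query` are our own illustrative choices, not the paper's notation): precompute an index per type, then repeatedly query the frontier node whose type has the highest index, adding revealed children back to the frontier.

```python
import heapq

def run_index_policy(index, roots, query):
    # index: maps a node's type to its priority index (higher = query sooner).
    # roots: list of (type, node) index cases.
    # query: returns (is_infected, benefit, children), where children is a
    #        list of (type, node) pairs revealed only when the node is infected.
    frontier = [(-index[t], i, t, v) for i, (t, v) in enumerate(roots)]
    heapq.heapify(frontier)                 # min-heap on negated index
    counter = len(frontier)                 # insertion counter breaks ties
    total = 0.0
    while frontier:
        _, _, t, v = heapq.heappop(frontier)
        infected, benefit, children = query(v)
        if infected:
            total += benefit
            for ct, c in children:
                heapq.heappush(frontier, (-index[ct], counter, ct, c))
                counter += 1
    return total
```

The point of the design is that all decision-making is compiled into the `index` table before tracing begins; the online work per step is just a heap operation.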
Starting with the index cases, on each step t ≥ 0 a contact tracer, simply called a tracer, selects one individual to query. Querying an individual models the traditional test-and-trace process: it reveals the individual's infection status, and if they are infected it reveals their contacts; these contacts may then be queried on future steps. When an infected individual is queried, they also receive medical treatment for the disease. For the remainder of the second phase the tracer iteratively queries individuals with the goal of identifying infected cases as efficiently as possible. Phase 1 Phase 1 spans steps t = −T to t = 0. During this phase, individuals meet new contacts and the infection spreads through these interactions. Let D be an arbitrary distribution on {0, 1, 2, . . . }. On each step each individual meets a random number of contacts Z ∼ D, where Z is drawn independently for each individual at each step. If an individual is infected, they infect each new contact they meet independently according to a probability function p defined separately for each model we consider. We call these contacts exposed. Exposed individuals are labeled by the recency of their exposure: an individual exposed at time t = −h has recency h. Since Phase 1 spans steps t = −T to t = 0, exposed individuals have recencies in {0, 1, . . . , T}. We assume that an individual's recency can be observed but their infection status is hidden. Step t = 0: At the end of step t = 0 the contact and infection processes halt, and from then on no new infections occur. To understand the system on step t = 0, consider an individual v exposed on step t = −h in Phase 1. If v becomes infected, then for each step −(h − 1) ≤ t ≤ 0 in the remainder of Phase 1, v exposes Z_t ∼ D individuals. Then by the end of step t = 0, v has exposed a multiset of contacts Z(h) = (Z_0, Z_1, . . . , Z_{h−1}) ∼ D^h, where Z_j indicates the number of contacts of recency j.
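As a quick sanity check on this branching structure (a sketch of our own; the recurrence and the symbol m for E[Z] are not the paper's notation), the expected number of descendants of a recency-h node in its tree of potential exposures satisfies a simple recurrence: the node meets m contacts of each recency j < h in expectation, and each such contact roots its own subtree.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def expected_tree_size(h, m):
    # Expected number of descendants of a recency-h node in its tree of
    # potential exposures, when the per-step contact count Z ~ D has mean m.
    # The node contributes m contacts of each recency j in {0, ..., h-1}
    # in expectation; each contact counts for 1 plus its own subtree.
    return sum(m * (1.0 + expected_tree_size(j, m)) for j in range(h))
```

For m = 1 this gives 2^h − 1, so less recent nodes root exponentially larger trees of potential exposures, which is exactly what they offer in exchange for their lower immediate benefit.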
Thus we can model v as the root of a tree, where the nodes in the first layer represent the contacts v met after being exposed, the nodes in the second layer represent contacts individuals in the first layer met after meeting v, and so on. We call this a tree of potential exposures because there is a path of contacts from v to each individual in the tree along which the infection could potentially travel. Since the distribution D on contacts is fixed, v's recency h(v) determines the distribution on the tree of potential exposures. We often refer to the nodes in the tree of exposures as v's descendants. We call the probability that an exposed node is infected the probability of infection. In the first model we examine, the probability of infection is constant. In the second model, the probability of infection depends on the node's recency as defined by a function p(h). In both models, a node's recency contains all the information needed to determine its probability of infection and the distribution on its tree of exposures. In the third model we consider, the probability of infection depends both on the node's recency and the recency of its parent. Going forward, we use the terms "node" and "individual" interchangeably. Phase 2 Phase 2 begins on step t = 0 and continues indefinitely. Throughout Phase 2 an individual's infection status is fixed but hidden. During this phase a contact tracing effort proceeds. Contact tracing begins on step t = 0 when a set of index cases are identified. From then on, at each step t ≥ 0 the tracer selects one node to query. Querying a node reveals its infection status, and if it is infected reveals the node's children along with the recency of each child. These children may then be queried on future steps. We assume that nodes of the same recency are indistinguishable until they are queried. On step t = 0, the index cases are the only individuals available to query.
The contact tracer observes the recency of each index case but has no information about any other individuals in the population. For the remainder of Phase 2, the tracer may only query a node that is an index case or the child of an infected node queried on an earlier step. Equivalently, we can view each index case as the root of a tree of potential exposures, which together form a forest. Through this query process the tracer maintains a sub-forest where any leaf not already queried is either a root or the child of an infected node. These leaves form the frontier, and at each step the tracer selects one node to query from the frontier. Observe that, by definition, each node in the frontier was exposed to infection in Phase 1. To understand this process further, consider the options available to the tracer at each step. Since nodes of the same recency are indistinguishable until they are queried, the system on any step t ≥ 0 is defined by a multiset S_t = (X_0, X_1, . . . , X_T), where X_j indicates the number of nodes of recency j present in the frontier. We call S_t the state on step t. We use both notions of a multiset when referring to states; i.e., S_t can be viewed as a collection of elements or as a vector of counts. Querying an infected node v returns a benefit, which represents the probability that v responds to medical treatment. The benefit of querying an infected node decays relative to the duration of the infection and depends on the node's recency h(v) and the step t it is queried, as defined by a function b(h(v), t). The benefit of querying an uninfected node is 0, and each node may be queried at most once. The tracer's objective is to maximize the total expected benefit returned over the course of Phase 2. Defining the objective. On each step t ≥ 0, the tracer selects a node v_t to query. Since the tracer only selects nodes from the frontier, by definition the node v_t was exposed to infection in Phase 1.
Thus the total benefit the tracer accumulates over the course of Phase 2 is

Σ_{t ≥ 0} 1(v_t) · b(h(v_t), t),

where 1(v) indicates whether node v is infected. The main objective is to develop a policy for querying nodes that maximizes the total expected benefit, where the expectation is taken over all realizations {1(v_0), 1(v_1), 1(v_2), . . . }. Such a policy is called an optimal policy. In order to make this problem tractable, like many other stochastic models we assume exponential discounting. Specifically, the benefit of querying a node infected for τ steps is e^{−βτ} for a fixed parameter β > 0. Our results can be summarized in three main contributions. 1) Constructing optimal policies. We show how to construct an optimal policy for any instance of our model. These policies have a special property: for any instance of our model the optimal policy can be described by an algorithm that assigns each node a "type", computes an index based on the type, and chooses the node of the highest index. Such a policy is called an index policy. An index policy is efficiently computable if each individual index can be computed in polynomial time. For any instance of our model, we show how to compute an optimal policy that is an efficiently computable index policy. We prove this result in section 8. We can interpret this result in the context of the contact tracing protocols discussed in section 1. Recall that many contact tracing protocols assign individuals to groups and then dictate the order in which groups ought to be queried. Since our construction computes indices from types, the resulting index policy induces a fixed priority ordering on types. Mapping individuals to nodes and groups to types, our results imply that, for our model, (1) the optimal policy overall is defined by a priority ordering on groups, and (2) this priority ordering has an explicit, efficient construction.
It is important to note that, taken on its own, this result does not explicitly describe any policies, and it makes no guarantees about the structure of an optimal policy beyond the promise that it is an index policy. Describing the structure of optimal policies any further requires analyzing the construction itself. 2) Analyzing optimal policies for different models of infection. We examine three different versions of our model, each corresponding to a different model of infection. Analyzing the construction of optimal policies in each model gives qualitative insight into questions about prioritizing contacts from section 1, such as how to trade off among an individual's probability of infection, the recency of their exposure, and the number of other contacts they may have exposed. The three versions we examine are identical except for a function defining the probability of infection. First we examine a basic model, where the probability of infection is constant, then we examine a univariate model, where the probability of infection decays absolutely with time, and finally we examine a bivariate model where a node's probability of infection decays relative to the incubation period of its parent. 3) Connecting contact tracing and the branching bandit problem. Our key finding is a clean connection between contact tracing and the branching bandit problem [45]. The branching bandit problem broadly belongs to a large class of online decision problems called "bandit" problems. The general motivation for the branching bandit problem is scheduling projects, where each project returns a reward and begets new projects, which must then be scheduled. While bandit problems in general have numerous applications [13, 26, 39], most applications of the branching bandit model are similar scheduling problems.
We therefore find this connection between contact tracing and the branching bandit problem especially striking, since it extends the branching bandit problem to a domain to which it previously has not been applied. We formalize this connection in section 8. Section Organization. The remainder of this section reviews our results in the order in which they appear. First we discuss the three models of infection we examine: a basic model (section 5), a univariate model (section 6), and a bivariate model (section 7). Then we discuss a general model of infection which encompasses the previous three models and demonstrates the connection to the branching bandit problem (section 8). Finally, we give an overview of techniques. Throughout the paper, for simplicity we refer to "the" optimal policy when we are discussing a specific optimal policy we are constructing; however, models may have multiple optimal policies. The basic model describes a standard model of infection where the probability of infection is constant. Theorem 5.1. In the basic model, it is optimal to query nodes in order of recency. This result provides insight into some of the most vexing questions from section 1. Namely, tracers face two opposing priorities - whether to query more recent cases or cases that may have exposed many other contacts. In practice, tracers appear to lean toward querying in order of recency. In our model, this result shows that - even taking into account the downstream effects of accessing an individual's contacts - the optimal policy is indeed to query the most recent case first. In the univariate model the probability of infection decays absolutely with time according to an exponential functional form. The rate of decay is parameterized by a constant α ≥ 0, where for small values of α, the probability of infection is close to constant, and the rate of decay accelerates as α increases.
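Theorem 5.1 can be made concrete with a small brute-force check. The sketch below is our own illustration, not the paper's proof: on one tiny, fully realized instance (a hypothetical all-infected forest), it enumerates every query order that respects the frontier constraint (a child only after its parent) and compares the best one against the greedy order that always takes the most recent available node, using the basic model's benefit e^{−β(h+t)}.

```python
import math

def total_benefit(order, recency, beta):
    # Benefit of a query order on an all-infected realization: querying a
    # recency-h node at step t yields e^{-beta * (h + t)}.
    return sum(math.exp(-beta * (recency[v] + t)) for t, v in enumerate(order))

def feasible_orders(children, frontier, done=()):
    # Yield all query orders in which a node appears only after its parent.
    if not frontier:
        yield done
        return
    for v in frontier:
        nxt = [u for u in frontier if u != v] + children.get(v, [])
        yield from feasible_orders(children, nxt, done + (v,))

def greedy_recency_order(children, frontier, recency):
    # Always query the most recent (smallest-recency) available node.
    order, frontier = [], list(frontier)
    while frontier:
        v = min(frontier, key=lambda u: recency[u])
        frontier.remove(v)
        frontier += children.get(v, [])
        order.append(v)
    return order
```

On the instance checked below (roots u with recency 2 and w with recency 0, u having one child c of recency 1), the recency-greedy order matches the brute-force optimum; the theorem asserts this in expectation for the full stochastic model.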
While in the basic model querying nodes in order of recency is optimal, this is not always the case in the univariate model. For small values of α, when the probability of infection is close to constant and the setting resembles that of the basic model, the optimal policy queries nodes in order of recency. However, once α reaches a certain threshold, the optimal policy no longer queries nodes in order of recency, and one may wonder whether any structure remains at all. In fact, we find that there is still structure to the optimal policy: the policy always queries either the most recent or least recent node available. We say that such a policy is defined by an interleaved priority ordering. Observe that many different priority orderings satisfy the interleaving property. For example, an ordering that prioritizes nodes by recency is interleaved, as is an ordering that prioritizes nodes by reverse recency. Our main result shows that the interleaving property holds for optimal policies in the univariate model. This result implies that the optimal policy always queries either the most recent node or least recent node in the frontier, which exposes an interesting trade-off between the probability that a node is infected, the recency of its exposure, and its expected number of children. Since a less recent node was exposed earlier in time, it has a larger number of children in expectation. Additionally, since the probability of infection decays with time, a less recent node also has a higher probability of infection. However, a more recent node, should it be infected, returns a higher immediate benefit. This result implies that the optimal policy pursues the extremes: it either queries the least recent node with the highest probability of infection and the most children in expectation, or it queries the most recent node, which is associated with the highest benefit. 
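The interleaving property has a simple operational statement (a sketch; `is_interleaved` is our own illustrative helper, not the paper's definition): a priority ordering over recencies is interleaved exactly when each successive entry is either the smallest or the largest recency not yet placed.

```python
def is_interleaved(order):
    # order: recencies listed from highest to lowest priority.
    remaining = sorted(order)
    for r in order:
        if r != remaining[0] and r != remaining[-1]:
            return False  # an intermediate recency was taken first
        remaining.remove(r)
    return True
```

Both the recency order [0, 1, 2, 3] and the reverse order [3, 2, 1, 0] are interleaved, as is an alternating order like [0, 3, 1, 2]; an order that begins with an intermediate recency, like [1, 0, 2, 3], is not.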
It is particularly interesting that the optimal policy never tends toward a node of intermediate recency, which seems to imply that compromising between these two extremes is suboptimal. In the bivariate model the probability that a node is infected decays relative to the incubation period of its parent. As a result, defining a node in this model requires examining two parameters, the recency of the node and the recency of its parent. For a node of recency h with a parent of recency h′, we call ∆ = h′ − h the span, representing the span of time the parent has been infected upon meeting the child. The probability that a node is infected decays exponentially as a function of ∆. In this model each node in the frontier is defined by a type (h, ∆), and we examine policies on the set of types {0, 1, . . . , T}^2. Thus the bivariate model demonstrates that we can analyze policies that take into account multiple parameters. Analyzing optimal policies in the bivariate model reveals monotonic structure in the ordering of nodes with the same recency. Theorem 7.1. In the bivariate model, there is an optimal policy that queries nodes with the same recency in order of increasing span. We also explore monotonicity with respect to recency, in a more restricted model. Here we find that, for recencies in {0, 1, 2} and within certain constraints, nodes of the same span are queried in order of recency. Theorem 7.2. In the bivariate model, for any Bernoulli distribution D, a large enough constant β > 0, and restricted to types with recencies in {0, 1, 2}, it is optimal to query nodes of the same span in order of recency. While there are examples to show that a restriction on β is necessary, the restrictions to Bernoulli distributions and to recencies in {0, 1, 2} are artifacts of our proof technique. The general model provides a broad framework that allows for a variety of factors to affect how individuals interact and how the infection spreads.
As we saw in section 1, in practice individuals are often categorized according to multiple attributes, such as their profession or role within a community, their age, or the recency of their exposure. The general model captures this complexity by assigning each node a type representing a set of attributes. Whereas types in the bivariate model are defined by two parameters, types in the general model can be defined by an arbitrary number of parameters. As a result, the general model encompasses the previous three models as well: in the basic and univariate models a node's type is its recency, and, as before, in the bivariate model a node's type is the pair (h, ∆). Our main result shows that optimal policies in the general model are index policies on types. Theorem 8.1. For any instance of the general model, there is an optimal policy that is an index policy on the set of types. Moreover, this index policy has an efficient construction. This result implies that any instance of the general model has an optimal policy defined by a priority ordering on types. In order to prove this result, we show that any instance of the general model maps to an instance of the branching bandit model. This reduction formalizes the connection between contact tracing in our model and the branching bandit problem, by showing that finding an optimal policy for any instance of the general model requires analyzing an optimal policy for a corresponding instance of the branching bandit model. Here we formally define the branching bandit problem and describe how the reduction from the general model to the branching bandit problem lays the foundation for our results in the basic, univariate, and bivariate models. The branching bandit model involves arms belonging to classes {1, . . . , L}, where when an arm of class i is pulled it yields a non-negative reward R(i), occupies µ(i) steps, and is replaced by a set of new arms N_{i1}, . . . , N_{iL}.
Each class i has an arbitrary, known, joint distribution on the random variables R(i), µ(i), and N_{i1}, . . . , N_{iL}. At each step t the system is defined by a vector n(t) = (N_1, . . . , N_L), where N_i is the total number of arms of class i available, and a reward received at step t is discounted by e^{−ηt} for a fixed parameter η > 0. The objective is to find a policy for pulling arms that maximizes the total discounted reward accumulated. As we describe in section 5.1, the optimal policy in the branching bandit model is an index policy on the set of classes {1, . . . , L} with an efficient construction. In section 8 we reduce the general model to the branching bandit model, under a mapping where nodes map to arms, types map to classes, benefit maps to reward, and new children revealed by querying a node map to new arms acquired by pulling an arm. This reduction implies that the optimal policy for any instance of the general model is an index policy on types. Since the basic, univariate, and bivariate models are all versions of the general model, this guarantee extends to these three models as well. In particular, for each instance the reduction provides a construction for a priority ordering that defines the optimal policy. However, the construction on its own does not describe the policy explicitly or define any properties of the policy beyond the promise that it is an index policy. Revealing the structure of these policies requires analyzing the construction of optimal policies for each model, which is the main focus of the following sections. The basic model describes a standard model of infection where the probability of infection is a constant p ∈ (0, 1]. Since the benefit of querying an individual infected for τ steps is e^{−βτ}, the benefit of querying an infected individual of recency h on step t is b(h, t) = e^{−β(h+t)}. The basic model is a special case of both the univariate and bivariate models. Understanding the trade-offs at play.
As described in section 3, we can view each node in the frontier as the root of a tree of exposures, where less recent nodes have more children in expectation. Recall that if a node is queried and found to be infected, its children are added to the frontier. Therefore, querying a less recent node provides an opportunity to significantly expand the frontier. On the other hand, since the benefit of querying an infected node decays with time, querying a more recent node returns a higher expected benefit. When selecting a node to query, how should the tracer trade off between these two factors - on the one hand, the expected benefit associated with querying a node, and on the other hand, the opportunity to access its descendants in the future? Our main result shows that - even taking into account these downstream effects - the optimal policy queries nodes in order of recency. Theorem 5.1. In the basic model, it is optimal to query nodes in order of recency. One might assume that such a straightforward policy has a correspondingly straightforward proof. After all, since querying the most recent node returns the highest expected benefit, perhaps an elementary exchange argument is sufficient. An attempt at an elementary exchange argument. Following this line of reasoning, suppose on step t a node u is the most recent node in the frontier, but the tracer selects a less recent node v and queries u on some later step. Now consider exchanging the order of u and v so that u is queried on step t instead. If u is infected, then its children are added to the frontier. Since any child is more recent than its parent, and since u is the most recent node in the frontier on step t, on step t + 1 the children of u are more recent than all other nodes in the frontier. Therefore, a commitment to prioritizing nodes by recency requires querying the descendants of u recursively in order of recency. Exchanging u and v then becomes tricky.
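Setting the proof aside for a moment, Theorem 5.1 can at least be checked numerically on a tiny instance. The sketch below is our own illustration, not the paper's: it assumes a hypothetical contact process in which a node of recency h makes one Bernoulli(Q) contact in each of the h steps since its exposure (so a child exposed s steps ago has recency s), computes the exact optimal expected discounted benefit by brute force over all querying policies, and compares it to the value of always querying the most recent node.

```python
import math
from functools import lru_cache
from itertools import product

P, Q, BETA = 0.6, 0.5, 0.4  # infection prob., per-step contact prob., discount rate

def children_outcomes(h):
    """All (probability, children-recencies) outcomes for an infected node of
    recency h, assuming one Bernoulli(Q) contact per step since exposure."""
    for bits in product([0, 1], repeat=h):
        prob = math.prod(Q if b else 1 - Q for b in bits)
        kids = tuple(s for s, b in enumerate(bits) if b)
        yield prob, kids

@lru_cache(maxsize=None)
def opt_value(frontier):
    """Exact optimal expected discounted benefit from a frontier, given as a
    sorted tuple of recencies. Each query takes one step (discount e^{-BETA});
    querying an infected node of recency h yields e^{-BETA*h} now and adds
    its children to the frontier."""
    if not frontier:
        return 0.0
    best = 0.0
    for h in set(frontier):
        rest = list(frontier)
        rest.remove(h)
        v = (1 - P) * math.exp(-BETA) * opt_value(tuple(sorted(rest)))
        for prob, kids in children_outcomes(h):
            nxt = tuple(sorted(rest + list(kids)))
            v += P * prob * (math.exp(-BETA * h) + math.exp(-BETA) * opt_value(nxt))
        best = max(best, v)
    return best

@lru_cache(maxsize=None)
def recency_value(frontier):
    """Same recursion, but always query the most recent (smallest h) node."""
    if not frontier:
        return 0.0
    h, rest = frontier[0], frontier[1:]  # sorted tuple: smallest recency first
    v = (1 - P) * math.exp(-BETA) * recency_value(rest)
    for prob, kids in children_outcomes(h):
        v += P * prob * (math.exp(-BETA * h)
                         + math.exp(-BETA) * recency_value(tuple(sorted(rest + kids))))
    return v

frontier = (1, 2, 3)
# the gap should be ~0 if querying by recency is optimal
print(opt_value(frontier) - recency_value(frontier))
```

The recursion terminates because every query replaces a node by children of strictly smaller recency, so this exhaustive comparison is exact up to floating-point error.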
As a thought experiment, consider querying either u or v in isolation and then recursively querying any descendants in order of recency. Since a node is always less recent than its descendants, querying u only ever leads to querying nodes more recent than u. However, since v is less recent than u, querying v could lead to querying nodes more recent than v but less recent than u. From this standpoint, it's unclear how to compare these two processes or go about an exchange. One observation is that we could potentially compare the two processes if we measured the process catalyzed by v only up until the first step where a node less recent than u is queried. In fact, such a scheme already exists; continuing with this argument (which is now far from elementary) essentially requires reinventing machinery developed by Weiss for the branching bandit model. Here we summarize Weiss's branching bandit model and the optimal policy for pulling arms, with a slight departure from the original notation in [45]. Then in section 5.2 we map the basic model to the branching bandit model by mapping each node to an arm and each recency to a class of arms. Recall that the branching bandit model is a general framework involving arms of classes {1, 2, . . . , L}, where each time an arm of class i is pulled it returns a non-negative reward R(i), occupies µ(i) steps, and is replaced by a set of new arms N_{i1}, . . . , N_{iL}. Weiss's key idea is the notion of a period. A period is defined with respect to any arbitrary priority ordering σ on classes {1, 2, . . . , L}. For any class i ∈ {1, 2, . . . , L}, an (i, σ_j)-period is defined as follows. Initially only a single arm of class i exists in the system. On step t = 0 the arm of class i is pulled, and from then on at each step an arm is pulled according to the priority ordering σ until all classes in {σ_0, σ_1, . . . , σ_j} are exhausted. Therefore, an (i, σ_j)-period is really defined with respect to the prefix σ_0, σ_1, . . .
, σ_j, since the ordering of the later elements is irrelevant. A few random variables describe an (i, σ_j)-period. Let r(i, σ_j) be the total discounted reward accumulated during the period, and let τ(i, σ_j) be the duration. Observe that following a period of duration τ(i, σ_j), the reward of the next query is pre-multiplied by the discount factor e^{-ητ(i,σ_j)}. Call γ(i, σ_j) = E[e^{-ητ(i,σ_j)}] the expected pre-multiplier. Defining the optimal policy in Weiss's model. Weiss leverages this notion of a period to inductively construct the optimal priority ordering. He then proves that the index policy defined by this optimal priority ordering is in fact an optimal policy outright. The optimal priority ordering is constructed via a dynamic program which maintains an optimal prefix that lengthens in each round. In round 0 the highest priority element σ_0 is selected. Entering any round k > 0, the prefix σ_0, σ_1, . . . , σ_{k-1} is fixed, and σ_k is selected by comparing (i, σ_{k-1})-periods over all classes i not in the prefix. Describing the base case requires one more definition. For any i ∈ {1, 2, . . . , L} an (i, ∅)-period simply pulls an arm of class i exactly once. The reward r(i, ∅), duration τ(i, ∅), and pre-multiplier γ(i, ∅) are defined analogously, so by definition τ(i, ∅) = 1 and γ(i, ∅) = e^{-η}. The highest priority is assigned to the element with the highest expected immediate reward. In any round k > 0, the prefix σ_0, σ_1, . . . , σ_{k-1} is fixed, and σ_k is selected from the elements not already in the prefix. Section 3 of [45] proves that the priority ordering σ constructed via this dynamic program is the optimal priority ordering, and moreover, that the index policy defined by σ is the optimal policy overall. An overview of Weiss's proof idea. For a high-level overview of the proof idea, first observe that σ_0 is the element associated with the maximal expected immediate reward.
To understand the intuition behind this selection, imagine choosing between a node u which returns the maximal expected immediate reward and some other node v. Since the expected immediate reward of any descendant of v is at most that of u, there is no reason to delay querying u in order to access the descendants of v. Moving ahead to any round k > 0, committing to the prefix σ_0, σ_1, . . . , σ_{k-1} implies that if we query a node, we are committing to recursively querying its descendants according to the ordering defined by the prefix. Therefore, we are no longer comparing individual queries but instead (i, σ_{k-1})-periods. To compare periods, we need to trade off between the expected reward E[r(i, σ_{k-1})] returned and the expected pre-multiplier γ(i, σ_{k-1}) imposed on the queries that follow. In this sense, we can think of σ_k as selecting, from the elements not already in the prefix, the element i whose (i, σ_{k-1})-period has the highest expected "rate" of reward in this time-discounted setting. Here we map the basic model of contact tracing to the branching bandit model by mapping nodes to arms, recencies to classes, benefit to reward, and the new children revealed by querying a node to new arms acquired by pulling an arm. We prove this reduction formally in section 8. We now restate the above dynamic program applied to the basic model, beginning with the definition of a period. A period is defined with respect to any arbitrary priority ordering σ on recencies {0, 1, . . . , T}. For any h ∈ {0, 1, . . . , T}, an (h, σ_j)-period is defined as follows. Initially only a single root v of recency h(v) = h exists in the frontier. On step t = 0 the node v is queried, and from then on at each step a descendant is queried according to σ until all nodes v′ with recency h(v′) ∈ {σ_0, σ_1, . . . , σ_j} are exhausted.
Let b(h, σ_j) be the total discounted benefit accumulated during the period, let τ(h, σ_j) be the duration, and let γ(h, σ_j) = E[e^{-βτ(h,σ_j)}] be the expected pre-multiplier. For any h ∈ {0, 1, . . . , T} an (h, ∅)-period simply queries a single root v of recency h(v) = h exactly once. The benefit b(h, ∅), duration τ(h, ∅), and pre-multiplier γ(h, ∅) are defined analogously, so by definition τ(h, ∅) = 1 and γ(h, ∅) = e^{-β}. Note that the benefit, duration, and pre-multiplier associated with a period are by definition non-negative. Then σ_0 is assigned to the element with the highest expected immediate benefit, σ_0 = arg max_h E[b(h, ∅)]. (1) In any round k > 0, the prefix σ_0, σ_1, . . . , σ_{k-1} is fixed, and σ_k is selected from the elements not already in the prefix by maximizing the discounted rate of benefit of a period, σ_k = arg max_{h ∉ {σ_0,...,σ_{k-1}}} E[b(h, σ_{k-1})] / (1 − γ(h, σ_{k-1})). (2) While the above dynamic program produces a priority ordering σ that defines an optimal policy, this in no way implies that the resulting optimal policy queries nodes in order of recency. Indeed, in section 6, we explore other regimes in which optimal policies exhibit very different structure. The main challenge in the following proof is to show that σ = (0, 1, . . . , T), which implies that the optimal policy queries nodes in order of recency. An observation about periods. The following proof, in addition to many of the proofs in later sections, involves analyzing periods. For the purpose of analysis, it will be helpful to separate the immediate benefit returned by querying a single root from the benefit returned through recursively querying its descendants. Specifically, for a root v with recency h(v) = h, we can think of an (h, σ_j)-period as the concatenation of two sub-periods, where v is queried in the first sub-period and descendants of v are queried in the second sub-period. Thus the first sub-period is equivalent to an (h, ∅)-period. If v is infected, then at the start of the second sub-period the children of v make up the frontier.
Since h(v) = h, the children of v have recencies defined by a random multiset Z(h) = (Z_0, Z_1, . . . , Z_{h-1}) ∼ D_h, where Z_j indicates the number of children of recency j. To describe the second sub-period, we first need to define a generalization of a period, called an epoch. For any state S = (X_0, X_1, . . . , X_T), define an (S, σ_j)-epoch as follows. At the start of step t = 0, the state S defines the recencies of nodes present in the frontier. On each step t ≥ 0 nodes are queried according to the ordering defined by the prefix σ_0, σ_1, . . . , σ_j until all nodes v′ with recency h(v′) ∈ {σ_0, . . . , σ_j} are exhausted. Let b(S, σ_j) be the total discounted benefit accumulated over the epoch, let τ(S, σ_j) be the duration, and call γ(S, σ_j) = E[e^{-βτ(S,σ_j)}] the expected pre-multiplier. Therefore an (h, σ_j)-period can be thought of as an (h, ∅)-period which, in the event that the root is infected, is followed by a (Z(h), σ_j)-epoch for Z(h) ∼ D_h. If the root is infected, then a benefit of e^{-βh} plus the benefit accumulated during the following epoch is returned. If the root is not infected then no benefit is returned. Therefore, the expected benefit accumulated over the (h, σ_j)-period is E[b(h, σ_j)] = p_T (e^{-βh} + e^{-β} E[b(Z(h), σ_j)]). (3) Likewise γ(h, σ_j) has a similar decomposition. If the root is infected, then τ(h, σ_j) = 1 + τ(Z(h), σ_j). If the root is not infected, then τ(h, σ_j) = 1. Therefore γ(h, σ_j) = e^{-β} (p_T E[e^{-βτ(Z(h),σ_j)}] + 1 − p_T). (4) For both decompositions the expectation is over the randomness of a node's descendants and their infection statuses. Restating the optimal policy. The probability of infection is p_T, and the benefit of querying an infected node of recency h is e^{-βh}, so E[b(h, ∅)] = p_T e^{-βh}. Applying this definition to eq. (1), σ_0 = arg max_h p_T e^{-βh}. (5) In any round k > 0, the prefix σ_0, σ_1, . . . , σ_{k-1} is fixed, and σ_k is selected from the elements not already in the prefix. Applying eqs. (3) and (4) to eq. (2), σ_k = arg max_{h ∉ {σ_0,...,σ_{k-1}}} p_T (e^{-βh} + e^{-β} E[b(Z(h), σ_{k-1})]) / (1 − e^{-β} (p_T E[e^{-βτ(Z(h),σ_{k-1})}] + 1 − p_T)). (6) With these decompositions we are now equipped to prove theorem 5.1. Theorem 5.1. In the basic model, it is optimal to query nodes in order of recency. Proof.
Fix T ∈ N, p_T ∈ (0, 1], and β > 0. Let σ be the optimal priority ordering constructed via the dynamic program in section 5.2. It suffices to show that for all j ∈ {0, 1, . . . , T}, σ_j = j. Proof by induction on j. By eq. (5), σ_0 = arg max_h p_T e^{-βh} = 0, since e^{-βh} is largest at h = 0. Fix 0 < k ≤ T and assume that for all 0 ≤ j ≤ k − 1, σ_j = j, so that the prefix is (0, 1, . . . , k − 1). For any h ∈ {k, . . . , T}, the epoch following the root's query involves only nodes of recency at most k − 1, and the benefits accumulated over these epochs are identically distributed for all such h; call the common expected benefit b_k, so that E[b(Z(h), σ_{k-1})] = b_k. (7) Since the durations are also identically distributed, the expected pre-multipliers are equal. Call this pre-multiplier γ_k = E[e^{-βτ(Z(h),σ_{k-1})}] (8) and observe that it does not depend on h. Combining eqs. (3) and (7), the expected benefit of an (h, σ_{k-1})-period is E[b(h, σ_{k-1})] = p_T (e^{-βh} + e^{-β} b_k). Likewise, combining eqs. (4) and (8), the expected pre-multiplier is γ(h, σ_{k-1}) = e^{-β} (p_T γ_k + 1 − p_T). By eq. (6), given the prefix (0, 1, . . . , k − 1), σ_k is selected by comparing (h, σ_{k-1})-periods for h ∈ {k, . . . , T}. Therefore for all h in consideration, the above identities hold. Then σ_k = arg max_{h ∈ {k,...,T}} p_T (e^{-βh} + e^{-β} b_k) / (1 − e^{-β} (p_T γ_k + 1 − p_T)) = k. To understand the final equality, note that e^{-βh} decreases with h, and all other terms are constant with respect to h. Thus k, the smallest value of h in consideration, attains the maximum. Since σ_k = k, for all j ∈ {0, 1, . . . , T}, σ_j = j. Thus, despite the fact that the optimal policy in the basic setting is simply querying nodes in order of recency, the proof of optimality requires balancing trade-offs between recency and benefit. While in the basic model querying nodes in order of recency is optimal, this is not always the case in the univariate model. In the univariate model the probability of infection decays absolutely with time according to an exponential functional form. Examining trade-offs and thresholds. As in the basic model, there is a trade-off between querying a less recent node, which provides an opportunity to significantly expand the frontier, and querying a more recent node which, if infected, returns a larger immediate benefit. However, since the probability of infection decays with time, a more recent node has a lower probability of infection. As a result, querying a more recent node does not necessarily return a higher expected immediate benefit.
Thus choosing a node to query involves a trade-off between a node's recency and probability of infection. The calculation of this trade-off changes with α. If α = 0 the probability of infection is constant, so as in the basic model, querying nodes in order of recency is optimal. As α increases it eventually hits a threshold beyond which this policy is no longer optimal, at which point it is natural to wonder whether any structure remains. In fact, we show that there is still structure to the optimal policy: the policy always queries either the most recent or least recent node available. We say that such a policy is defined by an interleaved priority ordering. Observe that many different priority orderings satisfy the interleaving property. For example, an ordering that prioritizes nodes by recency is interleaved, as is an ordering that prioritizes nodes by reverse recency. Once α = β, each node returns the same expected benefit, and we show that in this case any priority ordering is optimal. Once α > β, a less recent node both returns a higher expected immediate benefit and has more children in expectation, and we show that in this regime querying nodes in order of reverse recency is optimal. Taken together, these results imply that the interleaving property holds for all instances of the univariate model. Theorem 6.1. In the univariate model there is an optimal policy defined by an interleaved priority ordering. Proof. The main challenge is proving that the interleaving property holds for 0 < α < β, which we show in theorem 6.2. If α = 0, querying nodes in order of recency is optimal, by theorem 5.1. If α ≥ β, then querying nodes in order of reverse recency is optimal, as shown in theorems 6.4 and 6.5. In general, the proof techniques involve analyzing periods and follow a similar structure as the proof in theorem 5.1. The optimal policy is constructed via the same dynamic program as defined in eqs.
(5) and (6), except the probability of infection is now determined by the function p(h) = p_T e^{-α(T-h)}. Then σ_0 is assigned to the element with the highest expected immediate benefit, σ_0 = arg max_h p(h) e^{-βh}. (9) In any round k > 0, the prefix σ_0, σ_1, . . . , σ_{k-1} is fixed, and σ_k is selected from the elements not already in the prefix, σ_k = arg max_{h ∉ {σ_0,...,σ_{k-1}}} E[b(h, σ_{k-1})] / (1 − γ(h, σ_{k-1})). (10) We analyze the above construction via similar strategies as in the proof of theorem 5.1. Theorem 6.2. In the univariate model, for 0 < α < β, there is an optimal policy defined by an interleaved priority ordering. Proof. Fix T ∈ N, p_T ∈ (0, 1], β > 0, and 0 < α < β. Let σ be the optimal priority ordering constructed via the dynamic program in section 6.1. It suffices to show that for all 0 ≤ j ≤ T, σ_j ∈ {max_{i≥j} σ_i, min_{i≥j} σ_i}. Proof by induction on j. By eq. (9), σ_0 = arg max_h p(h) e^{-βh} = arg max_h p_T e^{-αT} e^{(α-β)h} = 0. Here 0 attains the maximum, since α < β. Fix 0 < k ≤ T and assume that the prefix σ_0, σ_1, . . . , σ_{k-1} is interleaved, so that the elements not in the prefix form a contiguous interval {l, . . . , m}. As in the proof of theorem 5.1, for any h ∈ {l, . . . , m} the epoch following the root's query involves only nodes of recency at most l − 1, so the benefits accumulated over these epochs are identically distributed; call the common expected benefit b_l. Likewise, since the duration is identically distributed, the expected pre-multipliers are equal. Call this pre-multiplier γ_l, and observe that it does not depend on h. Then E[b(h, σ_{k-1})] = p(h) (e^{-βh} + e^{-β} b_l) (11) and γ(h, σ_{k-1}) = e^{-β} (p(h) γ_l + 1 − p(h)). (12) Given the prefix σ_0, σ_1, . . . , σ_{k-1}, the element σ_k is selected by comparing (h, σ_{k-1})-periods for h not in the prefix, that is, for h ∈ {l, . . . , m}. Therefore for all h in consideration eqs. (11) and (12) hold. Applying these identities and the definition of p(h) to eq. (10), σ_k = arg max_{h ∈ {l,...,m}} I_l(h), where I_l(x) = p(x) (e^{-βx} + e^{-β} b_l) / (1 − e^{-β} (p(x) γ_l + 1 − p(x))). By lemma 6.3 I_l is convex. This implies that the maximum value of I_l on any closed interval is attained at an endpoint. Therefore σ_k = arg max_{h ∈ {l, m}} I_l(h). As a result, σ_k ∈ {l, m}, so σ_k ∈ {min_{i≥k} σ_i, max_{i≥k} σ_i}. Therefore σ is interleaved. The following lemma completes the above proof. Lemma 6.3. I_l is convex. Proof. Fix T ∈ N, p_T ∈ (0, 1], β > 0, and 0 < α < β. Fix b_l, γ_l ≥ 0. It suffices to show that I_l′ is non-decreasing. Rewriting the derivative in terms of constants C_0, C_1, C_2, C_3 that depend only on p_T, α, β, b_l, and γ_l, we can express I_l′ as a sum of three terms, where C_0, C_2, C_3 > 0 and C_1 ≥ 0. Examining I_l′(x), note that the third term is positive and has no dependence on x. Since 0 < α < β, the first two terms are negative and both decrease in absolute value as x increases. Thus I_l′ is non-decreasing, and therefore I_l is convex.
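The interleaving property is easy to state operationally: reading a priority ordering from highest priority to lowest, each element must be either the largest or the smallest of the elements that remain. The small helper below is our own illustration of this definition, not code from the paper.

```python
def is_interleaved(sigma):
    """A priority ordering sigma on recencies is interleaved if each sigma_j
    is either the largest or the smallest element among the remaining
    entries sigma_j, ..., sigma_T."""
    remaining = list(sigma)
    for s in sigma:
        if s not in (min(remaining), max(remaining)):
            return False
        remaining.remove(s)
    return True

# prioritizing by recency, by reverse recency, or alternating ends all interleave
assert is_interleaved([0, 1, 2, 3])
assert is_interleaved([3, 2, 1, 0])
assert is_interleaved([0, 3, 1, 2])
assert not is_interleaved([1, 0, 2, 3])  # starts in the middle of the interval
```

Note that after any interleaved prefix, the elements not yet placed form a contiguous interval {l, . . . , m}, which is exactly the structure the inductive proof above exploits.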
In the special case when α = β, the order in which nodes are queried has no effect on the total expected benefit, and any policy is optimal. Theorem 6.4. If α = β, any policy is optimal. Proof. Fix T ∈ N, p_T ∈ (0, 1], and α = β > 0. It suffices to show that, starting from an arbitrary state S, any arbitrary policies P_1 and P_2 achieve the same total expected benefit. Let b(S, P_1) and b(S, P_2) be random variables indicating the total discounted benefit accumulated by running P_1 and P_2, respectively, from the initial state S. Since α = β, at any step t the expected discounted benefit of querying any node of recency h in the frontier is p(h) e^{-β(h+t)} = p_T e^{-α(T-h)} e^{-β(h+t)} = p_T e^{-αT-βt}. As a result, a policy that queries a node at step t receives an expected discounted benefit of p_T e^{-αT-βt}, regardless of which node is queried. Thus the total expected benefit a policy achieves is defined by the first step in which the frontier is empty, since from then on no nodes are queried. Let τ_1 and τ_2 be random variables indicating the first step in which the frontier is empty for P_1 and P_2, respectively, starting from state S. Then E[b(S, P_1)] = E[Σ_{t=0}^{τ_1 - 1} p_T e^{-αT-βt}] and E[b(S, P_2)] = E[Σ_{t=0}^{τ_2 - 1} p_T e^{-αT-βt}]. The order in which nodes are queried does not affect the first step in which the frontier is empty, so τ_1 and τ_2 are identically distributed, and therefore E[b(S, P_1)] = E[b(S, P_2)]. Once α > β, a less recent node has both a higher expected immediate benefit and more children in expectation, so querying nodes in order of reverse recency is optimal. Theorem 6.5. If α > β, it is optimal to query nodes in order of reverse recency. Proof. Let σ be the optimal priority ordering constructed via the dynamic program in section 6.1. It suffices to show that for all 0 ≤ j ≤ T, σ_j = T − j. First, σ_0 = arg max_h p(h) e^{-βh} = arg max_h p_T e^{-αT} e^{(α-β)h} = T. Here T attains the maximum, since α > β. Fix 0 < k < T. Assume that for all 0 ≤ j ≤ k − 1, σ_j = T − j. As a result, (σ_0, σ_1, . . . , σ_{k-1}) = (T, T − 1, . . . , T − k + 1). Given this prefix, σ_k is selected by comparing (h, σ_{k-1})-periods for h not in the prefix, that is, for h ∈ {0, 1, . . . , T − k}. Fix h ∈ {0, 1, . . . , T − k}, and let v be a node with recency h(v) = h.
Recall that an (h, σ_{k-1})-period consists of an (h, ∅)-period followed by a (Z(h), σ_{k-1})-epoch, for Z(h) = (Z_0, Z_1, . . . , Z_{h-1}) ∼ D_h. Since a node is less recent than its descendants, any descendant v′ of v has recency h(v′) < h ≤ T − k, so the epoch only involves nodes with recencies in {0, 1, . . . , T − k}. However, the prefix dictates that any node u queried during the epoch has recency h(u) > T − k. Thus no node is queried during the epoch, so for this particular prefix an (h, σ_{k-1})-period is equivalent to an (h, ∅)-period. Therefore σ_k = arg max_{h ∈ {0,1,...,T-k}} E[b(h, ∅)] / (1 − γ(h, ∅)) = arg max_{h ∈ {0,1,...,T-k}} p_T e^{-αT} e^{(α-β)h} / (1 − e^{-β}) = T − k. Here T − k attains the maximum, because α > β. Thus for all 0 ≤ j ≤ T, σ_j = T − j, so the optimal policy queries nodes in order of reverse recency. In the bivariate model the probability that a node is infected decays relative to the incubation period of its parent. Modeling this requires keeping track of two parameters, a node's recency and the recency of its parent. We say that a node of recency h with a parent of recency h′ has span ∆ = h′ − h. The probability that a node of span ∆ is infected is p(∆) = p_T e^{-α∆}. Aside from the probability of infection, the bivariate model is identical to the basic model, so a node's recency h determines the distribution on its children and the benefit returned if it is found to be infected. A node of recency h and span ∆ is defined by its type (h, ∆). Our main result in this section is a monotonicity property that shows that it is optimal to query nodes of the same recency in order of increasing span. Theorem 7.1. In the bivariate model, there is an optimal policy that queries nodes with the same recency in order of increasing span. To prove this result, we analyze an optimal priority ordering using many of the same techniques developed in sections 5 and 6. The proof leverages the fact that nodes of the same recency have the same distribution on descendants in order to optimize over span.
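As a small illustration of Theorem 7.1 (our own, on a hypothetical frontier with made-up parameters), the within-recency tie-break can be implemented by sorting on span, which by p(∆) = p_T e^{-α∆} is the same as sorting by decreasing infection probability; the ordering across distinct recencies is left to the index policy, so the grouping order below is arbitrary.

```python
import math

P_T, ALPHA = 0.9, 0.5  # hypothetical parameters for illustration

def infection_prob(span):
    """Bivariate model: a node of span d is infected with probability p_T e^{-a*d}."""
    return P_T * math.exp(-ALPHA * span)

def group_and_order(frontier):
    """Group a frontier of (recency, span) types by recency and order each
    group by increasing span, per Theorem 7.1. Sorting recencies in
    increasing order here is an arbitrary choice; the theorem only fixes
    the order within each recency class."""
    return sorted(frontier, key=lambda t: (t[0], t[1]))

frontier = [(2, 3), (2, 0), (1, 1), (2, 1), (1, 4)]
ordered = group_and_order(frontier)
assert ordered == [(1, 1), (1, 4), (2, 0), (2, 1), (2, 3)]
# increasing span within a recency means decreasing infection probability
assert infection_prob(0) > infection_prob(1) > infection_prob(3)
```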
As a complement to the main result, we also examine monotonicity along the dimension of recency. Since nodes of different recencies have no children of the same type, comparing periods becomes tricky, and the inductive approaches used in sections 5 and 6 do not seem to apply here. Instead we construct the optimal ordering of types step-by-step. To make this approach tractable, our result is restricted to settings where the contact distribution D is a Bernoulli distribution and where all nodes have recencies in {0, 1, 2}. Theorem 7.2. In the bivariate model, for any Bernoulli distribution D, a large enough constant β > 0, and restricted to types with recencies in {0, 1, 2}, it is optimal to query nodes of the same span in order of recency. While the other restrictions are due to our particular approach, a lower bound on β is in fact necessary; there are settings of β for which querying nodes of the same span in order of recency is not optimal. Full proofs of the above theorems are in section 10. The general model provides a broad framework that allows for a variety of factors to affect how individuals interact and how the infection spreads. In the general model each node is assigned an arbitrary type, which is associated with an arbitrary probability of infection and an arbitrary distribution on descendants. An individual's type could be thought of as representing information like their profession or role within a community and the context in which they are exposed. Thus the general model has the flexibility to describe the categorizations of contacts we see in practice, where the priority assigned to a contact depends on multiple different factors. The general model also encompasses the basic, univariate, and bivariate models. We show in theorem 8.2 that the general model reduces to the branching bandit model. This has two main implications. First, it formally defines a connection between contact tracing and the branching bandit model.
Second, it implies that the optimal policy for any instance of the general model can be found by analyzing the optimal policy for a corresponding instance of the branching bandit model, as constructed by the dynamic program in section 5.1. We show in theorem 8.1 that, as a result, any instance of the general model has an optimal policy which is an index policy on the set of types with an efficient construction. Since the basic, univariate, and bivariate models are all instances of the general model, this result applies to these settings as well, and it defines the construction of optimal policies in each of the three models. The general model follows the same structure as the model from section 3; however, the changes to the contact and infection processes warrant a second overview. Step t = 0. On step t = 0 the contact and infection processes halt, and from then on no new infections occur. To understand the system on step t = 0, consider an individual v from category c(v) = c. An individual in category c has recency h(c), so v was exposed in step −h(c). If v's exposure results in infection, for each step −h(c) + 1 ≤ t ≤ 0 in the remainder of Phase 1, v exposes (Z_1(t), . . . , Z_C(t)) ∼ D_{c,1,t} individuals, where Z_j(t) indicates the number of individuals with category j exposed on step t. Then by the end of step t = 0, throughout the course of the first phase v has exposed a multiset of contacts Z(c) = (Z_1, . . . , Z_C). Let D_c be the distribution on Z(c). As in section 3, we can view v as a node in a tree of potential exposures, where the children of v are all the contacts v met after being exposed. Let u be the node which exposed v. Then with probability p(c(u), c(v)) node v is infected and has children with categories (Z_1, . . . , Z_C) ∼ D_c. Phase 2. Phase 2 begins on step t = 0 and continues indefinitely. During Phase 2 an individual's infection status is fixed but hidden.
A contact tracer seeks to identify infected individuals as efficiently as possible. As in section 3, contact tracing begins on step t = 0 when a set of index cases are identified. Initially the index cases are the only nodes available to query, and the tracer observes the category of each index case. From then on, at each step t ≥ 0 the tracer selects one node from the frontier to query. Querying a node reveals its infection status, and if it is infected, its contacts are included in the frontier. The tracer may only query a node that is an index case or the contact of an infected node already queried. Querying an infected node v with category c(v) = c at step t returns the benefit b(c, t) = b(c) e^{-βt}, for some arbitrary constant b(c) ∈ [0, 1]. Now consider the information available to the tracer at each step. Any node v in the frontier has an infected parent u, where u is either the super-root or another node in the tree that was already queried. Thus the tracer knows both v's category c(v) and u's category c(u). As described above, these two parameters define the probability that v is infected, and if v is infected, the benefit of querying v and the distribution on children added to the frontier. That is, c(u) and c(v) fully define the distribution on outcomes that result from querying v. We say that v has type (c(u), c(v)) ∈ [C]^2. As in section 7, we assume two nodes of the same type are indistinguishable until they are queried. Then the state S_t is the multiset of types present in the frontier at time t, and at each step the tracer selects a node to query based on its type. Defining the objective. On each step t ≥ 0, the tracer selects a node v_t from the frontier to query, where v_t has category c(v_t). Since v_t is in the frontier, it was exposed to the infection. Let 1(v_t) indicate whether v_t is infected (1) or uninfected (0). If v_t is infected, then benefit b(c(v_t)) e^{-βt} is returned.
Thus the total benefit the tracer accumulates over the course of Phase 2 is Σ_{t≥0} 1(v_t) b(c(v_t)) e^{-βt}. As in section 3, the objective is to develop a policy for querying nodes that maximizes the total expected benefit, where the expectation is taken over all realizations of the contact and infection processes. Defining the general model from types. The model can be equivalently described in terms of types instead of categories, which helps simplify the proof that follows. Enumerate the set of all types [C]^2 as {1, 2, . . . , C^2}. Let v be a node of type (c′, c) ∈ [C]^2, where (c′, c) is indexed as type j ∈ {1, 2, . . . , C^2} in the enumeration. We now define the probability of infection, benefit, and distribution on children associated with type j. If v is exposed, the probability that v is infected is p(j) = p(c′, c). If v is infected, the benefit of querying v is b(j) = b(c) and v's children have categories (Z_1, . . . , Z_C) ∼ D_c, where Z_k indicates the number of children from category k. Since c(v) = c, any child of v from category k has type (c, k), so Z_k equivalently indicates the number of children of type l = (c, k). Let D_j be the equivalent distribution on multisets of types in {1, 2, . . . , C^2} so that for (Y_1, . . . , Y_{C^2}) ∼ D_j, Y_l indicates the number of children of type l. Then the state S_t = (X_1, . . . , X_{C^2}) is the multiset of types present in the frontier at time t. Our primary result shows that the optimal policy in the general model is an index policy on the set of types {1, 2, . . . , C^2}. Theorem 8.1. For any instance of the general model, there is an optimal policy that is an index policy on the set of types. Moreover, this index policy has an efficient construction. Proof. Proof via reduction to the branching bandit model. Recall that the branching bandit model involves arms belonging to classes {1, . . . , L}. When an arm of class i is pulled it yields a reward R(i) and is replaced by a set of new arms N_{i1}, . . .
, N_{iL}, where each class i has an arbitrary, known, joint distribution on the random variables R(i) and N_{i1}, . . . , N_{iL}. Additionally, each class i is also associated with a random variable µ(i), where pulling an arm from class i occupies µ(i) steps; however, for the purposes of this reduction we consider only a restricted model where each pull occupies exactly 1 step. At step t the system is defined by a vector n(t) = (N_1, . . . , N_L), where N_i is the total number of arms of class i available, and a reward received at step t is discounted by e^{-ηt} for a fixed parameter η > 0. As described in section 5.1, the optimal policy in the branching bandit model is an index policy on the set of classes {1, . . . , L} with an efficient construction. The plan for the reduction is to map each node to an arm. The idea is to map the benefit and new children returned by querying a node to the reward and new arms returned by pulling an arm. To do this, for each type in the contact tracing instance we construct a class in the branching bandit instance with an appropriate distribution on reward and new arms. Theorem 8.2 proves that, for a specific mapping from types to classes, the general model reduces to the branching bandit model. Informally, proving the reduction involves showing that querying a sequence of nodes maps to pulling a corresponding sequence of arms. The claim follows from this reduction. Since types map to classes, at any step t the state of the frontier S_t is represented by a vector of arms n(t). The optimal policy in the branching bandit model selects the next arm to pull from n(t), which by the reduction dictates the next node to query from the frontier. Thus the optimal policy in the branching bandit model defines the optimal policy in the general model. In particular, since the former is an index policy on the set of classes, the latter is an index policy on the set of types.
Therefore the optimal policy in the general model is an index policy on the set of types with an efficient construction. Before continuing with the reduction, we first describe what proving such a reduction requires. We start by defining the general model and the branching bandit model each as games where an agent chooses actions in order to transition between different states. (Here "state" is used in the general sense of an agent moving through different states in a game, not as a multiset of types.) Proving a reduction involves mapping a sequence of states and actions in the general model to a corresponding sequence of states and actions in the branching bandit model. In the general model, the tracer (the agent) chooses types (actions) and in the branching bandit model the agent chooses classes (actions). Specifically, in the general model the state at step t is a pair (S_t, b_t) where b_t is the total benefit accumulated through the start of step t. The tracer chooses a type j_t ∈ S_t, and in response a random benefit B(j_t) e^{-βt} is returned and a multiset of new children (Y_1, . . . , Y_{C^2}) ∼ D_{j_t} is added to the frontier. As a result, the tracer transitions to the state (S_{t+1}, b_{t+1}), where S_{t+1} = (S_t \ j_t) ∪ (Y_1, . . . , Y_{C^2}) and b_{t+1} = b_t + B(j_t) e^{-βt}. In the branching bandit model, the state at step t is a pair (n(t), r_t) where r_t is the total reward earned through the start of step t. The agent chooses a class i_t ∈ n(t), and in response a random reward R(i_t) e^{-ηt} is received and a multiset of new arms (N_{i_t 1}, . . . , N_{i_t L}) are added to the system. As a result, the agent transitions to the state (n(t+1), r_{t+1}) where n(t+1) = (n(t) \ i_t) ∪ (N_{i_t 1}, . . . , N_{i_t L}) and r_{t+1} = r_t + R(i_t) e^{-ηt}. Let (S_0, b_0) be an arbitrary initial state, and let (S_0, b_0, j_0), . . . , (S_t, b_t, j_t) be a sequence of states and actions. The general model reduces to the branching bandit model if there is a mapping of states and actions such that the next-state distribution is preserved.
Specifically, we need a mapping from types to classes so that a corresponding sequence (n(0), r_0, i_0), . . . , (n(t′), r_{t′}, i_{t′}) exists which fulfills the following criteria for all 0 ≤ t ≤ t′: (1) S_t maps to n(t), b_t = r_t, and j_t maps to i_t. (2) (S_{t+1}, b_{t+1}) | (S_t, b_t, j_t) has the same distribution as (n(t+1), r_{t+1}) | (n(t), r_t, i_t). If we can establish both (1) and (2), then for any sequence of states and actions the distribution on outcomes after some number of steps k is the same in both models. The following theorem proves the reduction by constructing a mapping from types to classes that satisfies these criteria. Theorem 8.2. There is a mapping from types to classes under which the general model reduces to the branching bandit model. Proof. Let L = C² and let η = β. For each type j ∈ {1, 2, . . . , C²} we define a class i = j. (Even though i = j, we use separate names to distinguish between the class and type.) By this mapping, for a state S_t = (X_1, . . . , X_{C²}) and vector n(t) = (N_1, . . . , N_L), if (X_1, . . . , X_{C²}) = (N_1, . . . , N_L) then S_t maps to n(t). First we define the mapping, and then we show the reduction. The idea for the mapping is to construct a class i for each type j such that the joint distribution on reward and new arms associated with i maps to the joint distribution on benefit and new children associated with j. Recall that the joint distribution on the benefit and multiset of children associated with type j is that of (B(j), (Y_{j1}, . . . , Y_{jC²})). While so far we have referred to D_j as a distribution on multisets of types, D_j is simply a distribution on multisets of elements in {1, 2, . . . , C²}. Since L = C², it can also be viewed as a distribution on multisets of classes {1, 2, . . . , L}. We define class i by the joint distribution (R(i), (N_{i1}, . . . , N_{iL})) ∼ (B(j), (Y_{j1}, . . . , Y_{jC²})). By this construction, if i = j then (B(j), (Y_{j1}, . . . , Y_{jC²})) ∼ (R(i), (N_{i1}, . . . , N_{iL})). For any t ≥ 0, let (S_t, b_t) be an arbitrary state in the general model, and suppose the tracer chooses type j ∈ S_t. Then b_{t+1} = b_t + B(j)e^{−βt} and S_{t+1} = (S_t \ j) ∪ (Y_{j1}, . . . , Y_{jC²}).
Let (n(t), r_t) = (S_t, b_t) be the corresponding state in the branching bandit model, and let the agent choose the corresponding class i = j. Then r_{t+1} = r_t + R(i)e^{−ηt} and n(t+1) = (n(t) \ i) ∪ (N_{i1}, . . . , N_{iL}). By the construction of class i, (R(i), (N_{i1}, . . . , N_{iL})) ∼ (B(j), (Y_{j1}, . . . , Y_{jC²})), and combined with the fact that n(t) = S_t and η = β, the state (n(t+1), r_{t+1}) has the same distribution as (S_{t+1}, b_{t+1}). Therefore, given corresponding states (S_t, b_t) = (n(t), r_t) and actions j_t = i_t, the next-state distributions coincide. Since the mapping preserves this next-state transition, by induction there is a sequence of corresponding states and actions in the branching bandit model, so the reduction holds. Conclusion. We present contact tracing as an algorithm design problem where the objective is to develop a policy that prioritizes contacts to trace. Through a clean connection to the branching bandit model [45], we develop provably optimal policies for a variety of infection models. Analyzing the structure of these policies leads to qualitative insights about trade-offs in contact tracing applications. In a setting where the probability of transmission is constant, the policy reflects the prioritization of contacts by recency that we see in practice. In the more complex models of infection we study, the structure of the optimal policy depends on the specific parameters of the model yet still exhibits clear structure. Finally, we conclude with a general model, which can capture arbitrary interactions between individuals based on factors like an individual's profession, role within a community, or risk of infection. There are many compelling questions to consider going forward. One interesting question is how to analyze optimal policies in a dynamic setting, where contact tracing proceeds while the infection is spreading. We are currently exploring this setting in ongoing work [30]. A key question here is how to choose the objective function. On the one hand, the algorithm ought to be rewarded for identifying infected cases.
On the other hand, the algorithm ought to prevent the spread of new cases, which in a formal sense works against the goal of identifying infected cases. This makes defining the objective function somewhat subtle. Another interesting question to consider is backward contact tracing. In this paper we examined forward contact tracing, which traces any infections due to an individual, while in backward contact tracing the goal is to identify the source of the infection. Finally, it is an interesting question to ask how theoretical recommendations in general can be integrated into manual contact tracing. The optimal priority ordering σ is constructed via the dynamic program from section 5.2. The element σ_0 is assigned to the type with the highest expected immediate benefit. Given the first k elements σ_0, σ_1, . . . , σ_{k−1}, the element σ_k is selected by comparing ((h, ∆), σ_{k−1})-periods across all types not already in the prefix. Analyzing periods. Recall that a node with recency h has children Y(h) = (Y_{0,h}, Y_{1,h−1}, . . . , Y_{h−1,1}) ∼ D^h; the identities for the expected benefit and pre-multiplier of a period follow. Restating the optimal policy. We restate the optimal policy using the above identities. Given the first k elements σ_0, σ_1, . . . , σ_{k−1}, the element σ_k is selected from the elements not already in the prefix. Using techniques similar to those in section 6, we present our main result for the bivariate model. Theorem 7.1. In the bivariate model, there is an optimal policy that queries nodes with the same recency in order of increasing span. Proof. Fix T ∈ N, p_T ∈ (0, 1], and β > 0. Let σ be the optimal priority ordering constructed via the dynamic program described above. Fix h ∈ {0, 1, . . . , T }. It suffices to show that (h, 0) ≺ (h, 1) ≺ · · · ≺ (h, T ). By eq. (20), σ_0 is the type with the highest expected immediate benefit. Fix any (h, ∆_1) ≠ (0, 0) and (h, ∆_2) ≠ (0, 0) with ∆_1 < ∆_2. Assume to the contrary that (h, ∆_2) ≺ (h, ∆_1). Then (h, ∆_2) was selected prior to (h, ∆_1) in the construction of σ. That is, there is some pair k, k′ with k < k′ such that σ_k = (h, ∆_2) and σ_{k′} = (h, ∆_1). As a result, for the prefix σ_0, . . . , σ_{k−1}, the ((h, ∆_2), σ_{k−1})-period compares at least as favorably as the ((h, ∆_1), σ_{k−1})-period. Given the prefix σ_0, σ_1, . . .
, σ_{k−1}, define the expected benefit and pre-multiplier associated with an ((h, ∆), σ_{k−1})-period as in eqs. (18) and (19). Applying these identities to eqs. (18) and (19), for any ∆ ∈ {0, 1, . . . , T }, we obtain a function f_h; for strictly positive constants we can rewrite f_h in a form from which it follows that the ((h, ∆_2), σ_{k−1})-period cannot compare favorably to the ((h, ∆_1), σ_{k−1})-period. Thus no such prefix σ_0, σ_1, . . . , σ_{k−1} exists, so (h, ∆_1) ≺ (h, ∆_2), and as a result (h, 0) ≺ (h, 1) ≺ · · · ≺ (h, T ). In this section we examine ordering types with the same span. Recall that two nodes of different recencies have no children of the same type. This makes comparing periods and epochs tricky, and as a result, this section uses fairly different techniques. Figure 3: In comparing types with the same span ∆, we examine a version of the bivariate model where each individual meets at most one contact each day. A node of recency 0 has no potential children, a node of recency 1 has one potential child, and a node of recency 2 has two potential children, one with a potential child of its own. In order to determine whether (h, ∆) ≺ (h′, ∆′), we analyze the construction of optimal policies step by step and examine which type appears first in the ordering. Our techniques require case-by-case analysis, so we restrict our analysis to the setting in which D is a Bernoulli distribution and the recencies of all nodes are limited to {0, 1, 2}. Our main result shows that, under these restrictions, it is optimal to query nodes of the same span in order of recency. Overview of techniques. Recall that a node of type (h, ∆) has children defined by the multiset Y(h) = (Y_{0,h}, Y_{1,h−1}, . . . , Y_{h−1,1}) ∼ D^h, where Y_{j,i} indicates the number of children with recency j and span i. For the following proofs D is a Bernoulli distribution, so Y_{j,i} ∈ {0, 1}. We say that a node v with recency h(v) = h has potential children {(0, h), (1, h − 1), . . . , (h − 1, 1)}, since each type in the set could be a child of v. Let W(h, ∆) be the set of all potential descendants of a node of type (h, ∆).
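Since the potential children of a node depend only on its recency, the set W(h, ∆) can be computed by a short recursion. The following is an illustrative sketch, not code from the paper:

```python
# Potential children of a node with recency h: {(0, h), (1, h-1), ..., (h-1, 1)}.
# The span Δ plays no role here: the children depend only on the recency.
def potential_children(h):
    return [(j, h - j) for j in range(h)]

# W(h, Δ): the set of all potential descendants of a node of type (h, Δ),
# i.e. its potential children together with their potential descendants.
def W(h, delta=0):
    descendants = set()
    for (hc, sc) in potential_children(h):
        descendants.add((hc, sc))
        descendants |= W(hc, sc)
    return descendants

# Matching the small instances of this section: recency 0 has no potential
# descendants, recency 1 has one, and recency 2 has two potential children,
# one of which has a potential child of its own.
assert W(0) == set()
assert W(1) == {(0, 1)}
assert W(2) == {(0, 2), (1, 1), (0, 1)}
```

Restricting recencies to {0, 1, 2}, as the case analysis below does, keeps these sets small enough to enumerate by hand.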
Figure 3 shows the potential descendants of different types we examine in this section. Our strategy is to analyze the dynamic program from eqs. (16) and (17). By definition, an optimal priority ordering σ sequences all types in {0, 1, . . . , T }². However, determining whether (h, ∆) ≺ (h′, ∆′) does not necessarily require computing the entirety of σ. Instead, we only need to determine which of (h, ∆) and (h′, ∆′) falls earlier in the ordering σ. To do this, for any prefix σ_0, σ_1, . . . , σ_{k−1}, we need to be able to compare an ((h, ∆), σ_{k−1})-period to an ((h′, ∆′), σ_{k−1})-period. An ((h, ∆), σ_{k−1})-period only involves the type (h, ∆) and its potential descendants. Therefore, to decide whether (h, ∆) ≺ (h′, ∆′), it is sufficient to compute an optimal ordering on only the two types in question, (h, ∆) and (h′, ∆′), and their potential descendants, W(h, ∆) and W(h′, ∆′). Specifically, we construct an ordering π on this set of types such that if (h, ∆) ≺ (h′, ∆′) according to π, it follows that there is an optimal ordering σ on {0, 1, . . . , T }² such that π is a subsequence of σ and therefore (h, ∆) ≺ (h′, ∆′) according to σ. To simplify notation, for any type (h, ∆) and prefix π_0, π_1, . . . , π_{k−1}, we define the index function I((h, ∆), π_{k−1}). Overloading notation, for any recency h ∈ {0, 1, . . . , T } we define the benefit function b(h). The dynamic program is defined exactly as before, now with the index function notation. The highest priority element according to π is the type in S_π with the highest expected immediate benefit. Given a prefix π_0, π_1, . . . , π_{k−1}, the element π_k is chosen as the arg max of the index function over the remaining elements in S_π. Proof. Recall that our strategy is to construct an ordering π that is a subsequence of an optimal ordering σ. Here, since ∆ > 1, (0, 1) maximizes the index function, so π_0 = (0, 1). To select π_1 we compare I((0, ∆), π_0) and I((1, ∆), π_0). The type (0, ∆) has no potential children, so a ((0, ∆), π_0)-period consists only of querying the node itself. The type (1, ∆) has potential child (0, 1).
A ((1, ∆), π_0)-period begins by querying a node of type (1, ∆) on step t = 0. With probability p(∆), the node is infected, and with probability c it has child (0, 1). So with probability cp(∆) the child (0, 1) is queried on step t = 1; otherwise the period ends on step t = 0. Therefore the expected benefit and pre-multiplier of the period follow, giving eq. (23). Combining eqs. (22) and (23) gives the difference of the two indices, which is strictly positive iff β satisfies the corresponding bound. The following lemma analyzes an optimal priority ordering case by case to show that in all cases for the given regime, (1, ∆) ≺ (2, ∆). Proof. Recall that the proof idea is to construct an ordering π on a subset of types, where π is a subsequence of an optimal ordering σ on {0, 1, . . . , T }². Specifically, let S_π = {(0, 1), (1, 1), (0, 2), (1, ∆), (2, ∆)}. We will analyze a priority ordering π on S_π. The cases for ∆ = 0 and ∆ = 1 are covered by lemma 10.3 and lemma 10.4, respectively. We analyze the case when ∆ > 1 in multiple parts, as follows. Fix ∆ ∈ {2, . . . , T }. Then, as in lemmas 10.1, 10.3 and 10.4, π_0 = (0, 1). Next we examine π_1. By theorem 7.1, (1, 1) ≺ (1, ∆), and by lemma 10.4 and theorem 7.1, (1, 1) ≺ (2, 1) ≺ (2, ∆). Therefore neither (1, ∆) nor (2, ∆) can be assigned to π_1, leaving (1, 1) and (0, 2) as the only remaining candidates. We now examine both cases and show that regardless of whether π_1 = (1, 1) or π_1 = (0, 2), (1, ∆) ≺ (2, ∆). Suppose π_1 = (0, 2). Consider selecting π_2 from the set of remaining types {(1, 1), (1, ∆), (2, ∆)}. Following the same argument as described above, (1, 1) precedes both (1, ∆) and (2, ∆), so π_2 = (1, 1). For the given prefix, by lemma 10.6, I((1, ∆), π_2) > I((2, ∆), π_2), so π_3 = (1, ∆), and therefore (1, ∆) ≺ (2, ∆). Suppose π_1 = (1, 1). Consider selecting π_2 from the set of remaining types {(0, 2), (1, ∆), (2, ∆)}. By lemma 10.7, I((1, ∆), π_1) > I((2, ∆), π_1), so π_2 ≠ (2, ∆). If π_2 = (1, ∆), then (1, ∆) ≺ (2, ∆). Suppose instead that π_2 = (0, 2). For the given prefix, by lemma 10.8, I((1, ∆), π_2) > I((2, ∆), π_2), so π_3 = (1, ∆).
Thus with the given bound on β, in any optimal priority ordering (1, ∆) ≺ (2, ∆). Lemma 10.3. In any optimal priority ordering, (1, 0) ≺ (2, 0). Proof. Let S_π = {(0, 1), (1, 0), (1, 1), (0, 2), (2, 0)}. Since (1, 0) and (1, 1) both have recency 1, by theorem 7.1 it is optimal to query (1, 0) before (1, 1). Therefore, there is an optimal ordering σ where (1, 0) ≺ (1, 1). Then for some pair k, k′ with k′ > k, π_k = (1, 0) and π_{k′} = (1, 1). Since (1, 1) ∉ {π_0, . . . , π_k}, when computing I((2, 0), π_j) for any 0 ≤ j ≤ k the potential child (1, 1) is never queried. Therefore we can ignore the potential child (1, 1) when computing π, since it does not affect which of (1, 0) or (2, 0) appears first. Instead, for the purposes of this particular proof, we can think of (2, 0) as having a single potential child (0, 2). Thus we only need to choose an ordering on the types S_π = {(0, 1), (1, 0), (0, 2), (2, 0)}. Consider α ≥ β. Then the type (1, 0) (which recall has recency h = 1 and span ∆ = 0) is assigned to π_0. Thus if α ≥ β, (1, 0) ≺ (2, 0). Consider α < β. Then π_0 = (0, 1). Given the prefix π_0, we now compute π_1. The possible candidates for π_1 are (1, 0), (2, 0), and (0, 2). Since neither (2, 0) nor (0, 2) has π_0 as a child, I((2, 0), π_0) = I((2, 0), ∅) and I((0, 2), π_0) = I((0, 2), ∅). Since α < β, I((0, 2), π_0) = I((0, 2), ∅) > I((2, 0), π_0) = I((2, 0), ∅). Therefore π_1 ≠ (2, 0), leaving (1, 0) and (0, 2) as the remaining candidates. If π_1 = (1, 0), then (1, 0) ≺ (2, 0), and we are done. Suppose π_1 = (0, 2). Now we compute π_2 by comparing I((1, 0), π_1) and I((2, 0), π_1). The type (1, 0) has potential child (0, 1) in the prefix and the type (2, 0) has potential child (0, 2) in the prefix. Comparing these indices, the final inequality in the computation comes from the fact that α, β > 0. Thus I((1, 0), π_1) > I((2, 0), π_1), so π_2 = (1, 0), and therefore (1, 0) ≺ (2, 0). The next lemma follows a similar proof structure, this time for ∆ = 1. Lemma 10.4. In any optimal priority ordering, (1, 1) ≺ (2, 1). Proof. As before, π_0 = (0, 1). To compute π_1, we compare I((1, 1), π_0), I((2, 1), π_0), and I((0, 2), π_0).
The type (2, 1) has no potential children in the prefix, so a ((2, 1), π_0)-period consists only of querying the node itself. Comparing I((1, 1), π_0) and I((2, 1), π_0), we have I((1, 1), π_0) − I((2, 1), π_0) > 0, where the first inequality in the computation is due to the lower bound on β. Thus I((1, 1), π_0) > I((2, 1), π_0), so π_1 ≠ (2, 1), leaving (1, 1) and (0, 2) as the only remaining candidates for π_1. If π_1 = (1, 1), then (1, 1) ≺ (2, 1), and we are done. Consider the case that π_1 = (0, 2), and examine π_2. The two remaining candidates are (1, 1) and (2, 1). Since π_0 = (0, 1) and π_1 = (0, 2), the setup is identical to the conclusion of lemma 10.3, substituting ∆ = 1 for ∆ = 0. Applying this substitution to eq. (24), I((1, 1), π_1) − I((2, 1), π_1) > 0, where the final inequality comes from the fact that α, β > 0. Thus I((1, 1), π_1) > I((2, 1), π_1), so π_2 = (1, 1), which implies that (1, 1) ≺ (2, 1). In the remaining lemmas, we examine cases where ∆ > 1. The following lemma simplifies many of the proofs that follow, by showing that in some regimes, it suffices to compare expected benefits. Lemma 10.5. Let π be an ordering on S_π that is a subsequence of an optimal priority ordering. Fix a prefix π_0, π_1, . . . , π_k. If the type (2, ∆) has at least one potential child in the prefix, then γ((2, ∆), π_k) ≤ γ((1, ∆), π_k). Proof. By definition, for any type (h, ∆) and a prefix π_0, π_1, . . . , π_k, γ((h, ∆), π_k) = E[e^{−βτ((h,∆),π_k)}], where τ((h, ∆), π_k) is the duration of an ((h, ∆), π_k)-period. The duration τ((h, ∆), π_k) is strictly greater than 1 if and only if the node of type (h, ∆) is infected and has at least one realized child in the prefix. By definition, a node of type (h, ∆) is infected with probability p(∆). Since each potential child is realized with probability c, if the type (h, ∆) has l potential children in the prefix, a node of type (h, ∆) has at least one realized child in the prefix with probability 1 − (1 − c)^l.
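The probability 1 − (1 − c)^l just computed can be sanity-checked by simulation. This is an illustrative sketch with toy parameters, not code from the paper:

```python
import random

# Monte Carlo check that a node with l potential children in the prefix has
# at least one realized child with probability 1 - (1 - c)^l, when each
# potential child is realized independently with probability c.
def at_least_one_realized(l, c, trials=200_000, seed=0):
    rng = random.Random(seed)
    hits = sum(any(rng.random() < c for _ in range(l)) for _ in range(trials))
    return hits / trials

c, l = 0.3, 2                       # toy parameters
exact = 1 - (1 - c) ** l            # = 0.51
approx = at_least_one_realized(l, c)
assert abs(approx - exact) < 0.01
```

With l = 0 the node has no potential children in the prefix, and both the formula and the simulation give probability 0, consistent with a period of duration 1.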
Therefore Pr[τ((h, ∆), π_k) > 1] = p(∆)(1 − (1 − c)^l). (27) The type (1, ∆) has only one potential descendant, the child (0, 1), so a ((1, ∆), π_k)-period has duration at most 2. Consider the type (2, ∆). By eq. (26), γ((2, ∆), π_k) ≤ Pr[τ((2, ∆), π_k) = 1]e^{−β} + Pr[τ((2, ∆), π_k) > 1]e^{−2β}. (30) Since the type (2, ∆) has at least one potential child in the prefix, by eq. (27) the claimed bound γ((2, ∆), π_k) ≤ γ((1, ∆), π_k) follows. Lemma 10.6. Let π be an ordering on S_π that is a subsequence of an optimal priority ordering. Suppose π_0 = (0, 1), π_1 = (0, 2), and π_2 = (1, 1). Then π_3 = (1, ∆). Proof. To determine π_3, we compute the index function for the only remaining types, (1, ∆) and (2, ∆). Note that all the potential descendants of types (1, ∆) and (2, ∆) are in the prefix. By lemma 10.5, since the type (2, ∆) has at least one potential child in the prefix, γ((2, ∆), π_2) ≤ γ((1, ∆), π_2). Therefore it suffices to compare the expected benefits. To compute E[b((1, ∆), π_2)], we examine the ((1, ∆), π_2)-period. The period begins by querying the node of type (1, ∆) on step t = 0. With probability p(∆) the node is infected, and with probability c it has child (0, 1). So with probability p(∆)c the node of type (0, 1) is queried on step t = 1. We follow a similar process to compute E[b((2, ∆), π_2)]. The ((2, ∆), π_2)-period begins by querying the node of type (2, ∆) on step t = 0, which is infected with probability p(∆). Following the priority ordering defined by the prefix, if all potential descendants are realized, then the child of type (0, 2) is queried on step t = 1, followed by the child of type (1, 1) on step t = 2; if the child of type (1, 1) is infected, then its child of type (0, 1) is queried on step t = 3. Accounting for the fact that each potential descendant is realized independently with probability c, the expected benefit follows. The resulting difference in expected benefits is strictly positive iff β satisfies the stated bound; thus the inequality holds due to the bound on β, so E[b((1, ∆), π_2)] − E[b((2, ∆), π_2)] > 0. Therefore by eq. (32), I((1, ∆), π_2) > I((2, ∆), π_2), so π_3 = (1, ∆).

References.
- A survey of covid-19 contact tracing apps
- Delay in diagnosis of influenza a (h1n1) pdm09 virus infection in critically ill patients and impact on clinical outcome
- Contact tracing to control infectious disease: when enough is enough. Health care management science
- Who do you know? a simulation study of infectious disease control through contact tracing
- Guidelines for the investigation of contacts of persons with infectious tuberculosis. recommendations from the national tuberculosis controllers association and cdc
- Pandora's box problem with order constraints
- Contact tracing and disease control
- Contact tracing strategies in heterogeneous populations
- Prioritizing case investigations and contact tracing for covid-19 in high burden jurisdictions
- World Health Organization Regional Office for the Eastern Mediterranean. Diagnostic and treatment delay in tuberculosis
- Factors that make an infectious disease outbreak controllable
- Multi-armed bandit allocation indices
- J Mark FitzGerald, and Canadian Collaborative Group in Nosocomial Transmission of Tuberculosis. Delay in diagnosis among hospitalized patients with active tuberculosis-predictors and outcomes
- Effects of early versus delayed initiation of antiretroviral treatment on clinical outcomes of hiv-1 infection: results from the phase 3 hptn 052 randomised controlled trial. The Lancet infectious diseases
- Algorithms and adaptivity gaps for stochastic probing
- Feasibility of controlling covid-19 outbreaks by isolation of cases and contacts
- Gonorrhea modeling: a comparison of control methods
- Emergency response to a smallpox attack: the case for mass vaccination
- Analyzing bioterror response logistics: the case of smallpox
- Efficacy of contact tracing for the containment of the 2019 novel coronavirus (covid-19)
- The effectiveness of contact tracing in emerging epidemics
- Delayed access to hiv diagnosis and care: Special concerns for the southern united states
- Impact of delays on effectiveness of contact tracing strategies for covid-19: a modelling study
- Epidemic models of contact tracing: systematic review of transmission studies of severe acute respiratory syndrome and middle east respiratory syndrome
- Bandit algorithms
- To quarantine, or not to quarantine: A theoretical framework for disease control via contact tracing
- High rate of transmission of tuberculosis in an office: impact of delayed diagnosis
- Impact of late diagnosis and treatment on life expectancy in people with hiv-1: Uk collaborative hiv cohort (uk chic) study
- Contact tracing prioritization
- Contact tracing-old models and new challenges
- Contact tracing in stochastic and deterministic epidemic models
- Projected life expectancy of people with hiv according to timing of diagnosis
- Recommendations for partner services programs for hiv infection, syphilis, gonorrhea, and chlamydial infection
- Establishing best practices in a response to an hiv cluster: an example from a surge response in west virginia
- Infectivity of patients with pulmonary tuberculosis in inner city homes
- The minimum latency problem is np-hard for weighted trees
- Introduction to multi-armed bandits
- Delayed diagnosis of hiv infection in a multicenter cohort: prevalence, risk factors, response to haart and impact on mortality
- Covid-19 case investigation and contact tracing efforts from health departments-united states
- Evaluating the effectiveness of contact tracing on tuberculosis outcomes in saskatchewan using individual-based modeling. Health education & behavior
- Impact of contact tracing on covid-19 mortality: An impact evaluation using surveillance data from colombia
- Determining the need for a contact investigation and prioritizing public health response
- Branching bandit processes

Lemma 10.7. Let π be an ordering on S_π that is a subsequence of an optimal priority ordering. Suppose π_0 = (0, 1) and π_1 = (1, 1). Then I((1, ∆), π_1) > I((2, ∆), π_1). Proof. Since the type (2, ∆) has at least one potential child in the prefix, γ((2, ∆), π_1) ≤ γ((1, ∆), π_1). Therefore it suffices to compare the expected benefits. The type (1, ∆) has one potential child (0, 1) in the prefix and no other descendants, so E[b((1, ∆), π_1)] follows directly. To compute E[b((2, ∆), π_1)] we examine the ((2, ∆), π_1)-period. The period begins by querying the node of type (2, ∆) on step t = 0. With probability p(∆) the node is infected, and with probability c it has child (1, 1), which is then queried on step t = 1. With probability p(∆) the node (1, 1) is infected, and with probability c it has child (0, 1), which is then queried on step t = 2. Combining eq. (42) and eq. (43), the difference in expected benefits is strictly positive iff β satisfies the stated bound; the inequality holds due to the bound on β, so E[b((1, ∆), π_1)] − E[b((2, ∆), π_1)] > 0. Therefore by eq. (41), I((1, ∆), π_1) > I((2, ∆), π_1). The following lemma is nearly identical to lemma 10.6 and references the prior lemma for many computations. The key difference between the two settings is that in the prefix that lemma 10.8 considers, (1, 1) ≺ (0, 2), whereas the opposite holds in lemma 10.6.
This difference in prefix changes the computation of the index function for (2, ∆). Lemma 10.8. Let π be an ordering on S_π that is a subsequence of an optimal priority ordering. Suppose π_0 = (0, 1), π_1 = (1, 1), and π_2 = (0, 2). Then π_3 = (1, ∆). Proof. To select π_3, we compute the index function for the two remaining types, (1, ∆) and (2, ∆). Following the same argument as in lemma 10.6, this reduces to comparing expected benefits over an ((h, ∆), π_2)-period. Just as in lemma 10.6, the prefix includes (0, 1), so the computation of E[b((1, ∆), π_2)] is also the same. To compute E[b((2, ∆), π_2)] we examine the ((2, ∆), π_2)-period. The period begins by querying the node of type (2, ∆) on step t = 0, which is infected with probability p(∆). Following the priority ordering defined by the prefix, if all potential descendants are realized, then the child of type (1, 1) is queried on step t = 1; if it is infected, then the node of type (0, 1) is queried on step t = 2, followed by the child of type (0, 2) on step t = 3. If the child of type (1, 1) is not infected, then the node of type (0, 1) is skipped, and the child of type (0, 2) is queried on step t = 2. Accounting for the fact that each potential descendant is realized independently with probability c,