Dynamic Epidemic Control via Sequential Resource Allocation
Mathilde Fekom, Nicolas Vayatis, Argyris Kalogeratos (2020-06-11)

In the Dynamic Resource Allocation (DRA) problem, an administrator has to allocate a limited amount of resources to the nodes of a network in order to reduce a diffusion process (DP), e.g. an epidemic. In this paper we propose a multi-round dynamic control framework, which we realize through two derived models, the Restricted and the Sequential DRA (RDRA, SDRA), that assume only restricted information about, and restricted access to, the network, contrary to standard full-information and full-access DRA models. At each intervention round, the administrator has access only to a fraction of the network nodes -- simultaneous access for the former, sequential access for the latter. This sequential aspect in the decision process offers a completely new perspective to dynamic DP control, making this work the first to cast the dynamic control problem as a series of sequential selection problems. Through in-depth SIS epidemic simulations we compare the performance of our multi-round approach with other resource allocation strategies and several sequential selection algorithms on both generated and real-data networks. The results provide evidence about the efficiency and applicability of the proposed framework for real-life problems.

Compartmental models have gained particular attention in recent years due to their simple analytic formulations that can model modern problems related to information diffusion and social epidemics, e.g. rumor spreading [14], [16] and other social contagions [13]. Efficiently controlling undesired diffusion processes (DPs) is crucial for public security and health, as was also dramatically illustrated by the Covid-19 crisis [8]. Yet it is a difficult problem, one that becomes much more complicated the moment one starts including more realistic constraints or objectives. A first source of limitations is the theoretical interaction model to consider, along with its network-wise abstraction (e.g. macro- vs. microscopic modeling), which may be over-simplistic for the analyzed phenomenon. Another source of shortcomings is the level of required information regarding the system state, such as the infection state of nodes or the network connectivity. Finally, limitations come from the way a control model assumes it can intervene in the DP, e.g. in a fixed or dynamic fashion with respect to the evolution of the process. Dynamic models for allocating medical resources are subject to wide investigation [7], [17], for which [21] gives a convenient formalism with the introduction of Dynamic Resource Allocation (DRA), a model for network control, originally developed for SIS-like processes [20] (a node is either infected, or healthy without permanent immunity), that distributes a limited budget of available treatment resources on infected nodes in order to speed up their recovery. The score-based DRA formulation introduces an elegant way of assessing, through a score value, the criticality of each node individually for the containment of the DP [21], [22]. Then, the administrator only has to ensure that at each moment the resources are spent on the infected nodes with the highest scores.
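To fix ideas, the following minimal sketch shows what a score-based DRA step amounts to under full information: the b resources simply follow the highest-scoring infected nodes. It is our own illustration (function and variable names are ours), assuming the scores come from some criticality function in the spirit of [21], [22].

```python
import numpy as np

def allocate_full_info(X, scores, b):
    """Standard score-based DRA step: place the b resources on the
    currently infected nodes with the highest criticality scores.

    X      : infection state vector in {0, 1}^N
    scores : criticality score of every node
    b      : resource budget
    """
    infected = np.flatnonzero(X == 1)
    top = infected[np.argsort(-scores[infected])][:b]   # b best infected nodes
    R = np.zeros_like(X)
    R[top] = 1
    return R
```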
Among the scoring options proposed in the literature [24], [23], a simple yet efficient local score is the Largest Reduction in Infectious Edges (LRIE) [21], which depends on the infection state of the neighbors, hence it needs to be updated regularly during the process. A second option, called priority planning [22], computes offline a priority-order of the network nodes so as to have minimal max-cut. This order specifies a fixed global score for all nodes, which can then be used to perform DRA considering only the infected nodes each time. In [19], scores are continuous (called control signals) and are derived for each node with the purpose of minimizing a loss function representing the trade-off between the cost of a treatment and the cost incurred by an infected node. The motivation of our work is to bring the score-based DRA modeling closer to reality. In real-life scenarios, authorities have access to limited information regarding the network state, and can reach a limited part of the population to apply control actions (e.g. deliver treatments). Even more importantly, the decision making process is essentially a sequence of time-sensitive decisions over choices that appear and remain available to the administrator only for a short time, with little or no margin of revocation. An intuitive paradigm to consider is how a healthcare unit works: patients arrive one-by-one seeking care, and online decisions try to assign the limited available resources (e.g. medical experts, beds, treatments, intensive care units) to the most severe medical cases [5], [15], [12]. While dealing with the Covid-19 crisis, medical experts in many countries were in fact 'scoring' patients to prioritize hospitalization. Above all, the unprecedented stress induced by the pandemic on healthcare systems revealed several of their weaknesses, while also highlighting the lack of efficient support for sequential decision making. This work aims to fill methodological gaps exactly of this kind. We establish a link between the DRA problem and the sequential decision making literature, offering a completely new perspective to dynamic DP control. Among the existing Sequential Selection Problems (SSPs) that have been widely studied, the most well-known is the Secretary Problem [11]. Our aim, however, is to propose a concordant match to the discussed control setting. Concerning the technical contribution, we first present the Restricted DRA (RDRA) model, in which at each round the administrator can decide the reallocation of the resources only among a random sample of currently reachable nodes, which is treated as a batch. We next propose the special case of Sequential DRA (SDRA), where the latter sample of nodes is provided with a random arrival order, forcing the administrator to decide on the resource reallocation sequentially, according to the characteristics of the incoming nodes. The major achievement of our modeling is that it creates a new playground where SSP algorithms can be incorporated into DP control and make control strategies more applicable in real conditions. The implementation of existing online algorithms, such as hiring-above-the-mean [6] or the more effective Cutoff-based Cost Minimization [10], leads to SDRA strategies that manage to reduce a DP in a fashion comparable to the unrestricted DRA. The interactions among a population of N individuals are modeled by a fixed network represented by a graph G(V, E) of |V| = N nodes and |E| = E edges.
The graph structure is arbitrary; the reader might picture it as being a directed or undirected, weighted or unweighted graph, etc. Each entry A_ij of the graph's adjacency matrix A ∈ R^{N×N} expresses the possible influence of node i on node j, s.t. A_ij ≠ 0 if node i is linked with an edge to node j and may have an impact on it (and similarly A_ji for the reverse direction), while A_ij = 0 otherwise. The graph hosts an agent-based diffusion process (DP) that spreads from one node to another. Conceptually, the sequential epidemic control framework that we present can apply to arbitrary DPs (even non-compartmental diffusion models), provided that there is a means to assess the criticality of the graph elements (e.g. nodes) for reducing the epidemic process. To simplify the presentation, we use a setting for which such criticality assessment has been made possible in the literature. We suppose that the DP in place is a continuous-time Markov process [20], so that at each time instance t ∈ R+ there can be at most one event of node state change in the network. More specifically, we consider an SIS-like recurrent epidemic, where nodes are either healthy ('S': susceptible to infection) or infected ('I'). The infection spreads from any infected node to its reachable healthy neighbors. Nodes are equipped with self-recovery without ever achieving permanent immunity. The infection state of the network is denoted by X_t = (X_{1,t}, ..., X_{N,t})^T ∈ {0,1}^N, s.t. X_{i,t} = 1 if node i ∈ V is infected and 0 otherwise. In the rest of the paper, X̄_{i,t} = 1 − X_{i,t}, and N^I_t = Σ_{i=1}^{N} X_{i,t} stands for the number of infected nodes in the network at time t. An administrator has the mission to reduce the DP by managing a fixed budget of b ∈ N* resources that help the receiving nodes lean towards the healthy state. The resources are regarded as being reusable, non-storable through time, and non-cumulable at nodes (i.e. at most one on each node). Resources might serve as treatments, doctors, nurses, beds, etc. in an emergency service, in which allocation decisions have to be made on-the-go. At each time instance t ∈ R+, the Dynamic Resource Allocation (DRA) [21], [22] dynamically determines the resource allocation vector R_t = (R_{1,t}, ..., R_{N,t})^T ∈ {0,1}^N, where R_{i,t} = 1 if a treatment is allocated to node i at time t and 0 otherwise, subject to Σ_i R_{i,t} = b. The node state transitions of the homogeneous SIS model under control are given by:

  X_{i,t}: 0 → 1 with rate β Σ_j A_{ji} X_{j,t},   1 → 0 with rate δ + ρ R_{i,t},   (1)

where β is the contribution of any edge to the infection rate from an infected towards a healthy node, δ is the self-recovery rate of each infected node, and ρ is the contribution of a received treatment (if R_{i,t} = 1) to the node recovery rate. If we let λ_i be the transition rate for node i in Eq. 1, then the mechanism generating DP events (infection or recovery) is a Poisson process and the probability distribution of the time intervals between events is p(t, λ) = λ e^{−λt}, ∀t ∈ R+, where λ = Σ_{i=1}^{N} λ_i is the sum of all node transition rates. Then, each node i ∈ V can be the transitioning node with probability equal to λ_i/λ. Using the formalism of [2], we write p_{vu}(Δt) := P(N^I_{t+Δt} = v | N^I_t = u) for the probability of going from u to v infected nodes in a time interval Δt, and we get:

  p_{u+1,u}(Δt) = b(u) Δt + o(Δt),   p_{u−1,u}(Δt) = d(u) Δt + o(Δt),   p_{u,u}(Δt) = 1 − [b(u) + d(u)] Δt + o(Δt),

where b(u) and d(u) denote, respectively, the total infection ('birth') and recovery ('death') rates of the process when u nodes are infected. For continuous time, Δt → 0, and by using the Markov property P(N^I_{t+Δt} | N^I_0, ..., N^I_t) = P(N^I_{t+Δt} | N^I_t), the forward Kolmogorov differential equations are found for the probability of having n infected nodes at time t, denoted by p_n(t):

  dp_n(t)/dt = b(n−1) p_{n−1}(t) − [b(n) + d(n)] p_n(t) + d(n+1) p_{n+1}(t),   (5)

for n = 1, 2, ..., N, and dp_0(t)/dt = d(1) p_1(t).
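The event-driven mechanism just described (exponential waiting times, one node transition per event) is straightforward to simulate. Below is a minimal Gillespie-style sketch of the controlled SIS process; it is our own illustration under the stated rates, with function and variable names that are assumptions rather than the paper's code.

```python
import numpy as np

def simulate_sis(A, X0, R, beta, delta, rho, T, rng=None):
    """Minimal event-driven simulation of the controlled SIS process.

    A     : (N, N) adjacency matrix
    X0    : initial infection state, shape (N,), entries in {0, 1}
    R     : resource allocation vector, shape (N,), entries in {0, 1}
    beta, delta, rho : infection, self-recovery and treatment rates
    T     : time horizon
    Returns the list of (time, number of infected) pairs.
    """
    rng = np.random.default_rng() if rng is None else rng
    X = X0.astype(float).copy()
    t, history = 0.0, [(0.0, int(X.sum()))]
    while t < T and X.sum() > 0:
        pressure = A.T @ X                        # infectious pressure on each node
        rates = np.where(X == 1,
                         delta + rho * R,         # infected nodes: recovery rate
                         beta * pressure)         # healthy nodes: infection rate
        total = rates.sum()
        if total == 0:
            break
        t += rng.exponential(1.0 / total)         # waiting time ~ Exp(lambda)
        i = rng.choice(len(X), p=rates / total)   # transitioning node, prob. lambda_i / lambda
        X[i] = 1.0 - X[i]                         # flip its state
        history.append((t, int(X.sum())))
    return history
```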
The forward Kolmogorov equation (5) enables us to get the evolution of the stochastic variables of the problem. In particular, by multiplying by n and summing over n, we find the equation for the evolution of the epidemic:

  dE[N^I_t]/dt = β E[X_t^T A X̄_t] − δ E[N^I_t] − ρ E[X_t^T R_t],

where X_t^T A X̄_t counts the infectious edges and, since the resources are placed on infected nodes, E[X_t^T R_t] = min(b, N^I_t). The score-based DRA assumes that there exists a scoring function s : V → R that returns a score S_{i,t} ∈ R for each node i at time t, according to the mission, and the nodes with the highest scores are those to receive the resources. This class of strategies depends on the size of the available budget of resources and the efficiency of each of them, as well as on the ability of the scoring function to correctly assess the criticality of nodes. The evaluation of the performance of a score-based strategy at time horizon T ∈ R+ is the expected area under the curve (AUC) of the percentage of infected nodes w.r.t. time:

  E[A_N(T)] = E[ ∫_0^T (N^I_t / N) dt ].   (7)

By making this choice we acknowledge that E[A_N(T)] is more characteristic of the effect of a strategy than, for example, the expected extinction time E[t_exct.]. The standard DRA strategies are built on a strong assumption whereby the administrator always has full information about the process and full access to the network, which is apparently far from realistic in most practical cases. To reduce this distance, we introduce two models in the following sections. Their generality stems from the fact that they assume the scoring function s to be a 'black-box' (hence out of our main research focus here) that is appropriate for the studied network process. In this context, this means that it is efficient in determining the criticality of an infected node when asked. It then remains to the algorithmic part of a strategy to take the decisions of resource reallocation towards nodes that would be valued as much as possible by the function s. In the Restricted DRA (RDRA) model, only a fraction of nodes is reachable at each moment. Let I be the set of nodes for which we have information and C the set of accessible nodes, with I, C ⊆ V. We put forward two reasonable assumptions: A1) the accessible nodes are always included in the set of nodes for which we have information, i.e. C ⊆ I; and A2) at any time t ∈ R+, the treated nodes C^R_t = {i ∈ V : R_{i,t} = 1} are accessible. The first notion to define for specifying the RDRA model is that of a round. Definition 1. Round k is a discrete event of reviewing and revising the allocation of resources on the network. The series of rounds is defined by the sequence of time instances (t_k) ∈ R^K_+, where t_0 = 0, t_K = T, and K is the total number of rounds. A round-triggering process is considered to invoke the revision of the resource allocation. In this paper we restrict ourselves to a passive process, i.e. the (t_k)'s are not decided or known to the administrator in advance. However, an active process can be an interesting addition to the RDRA strategy. Generally, two subsequent rounds can be arbitrarily distant in time. Here, more specifically, a new round is triggered whenever there is a change in the infection state of the network, and thus at most one node can recover or get infected between two subsequent rounds; formally, t_{k+1} = inf{t > t_k : X_t ≠ X_{t_k}}. Since rounds are essentially a measure of time, each variable can be defined round-wise; for instance we write X_k for the infection state at round k, i.e. at the time instance t_k. A Restricted DRA (RDRA) strategy is a DRA strategy parametrized by the number of resources b ∈ N* and a scoring function s : V → R.
At any moment during a round k, the administrator has information about the nodes of the set I and simultaneous access to the nodes of the set C, in addition to those currently treated, C^R_k. The strategy outputs a resource allocation vector R_k ∈ {0,1}^N with Σ_i R_{i,k} = b. We consider as the default RDRA setting the case where information and access are both restricted to the sampled nodes together with the currently treated ones, i.e. I_k = C_k ∪ C^R_k, where C_k is called the sample and is defined below. Choosing to define C or I otherwise results in special RDRA cases, while choosing I = C = V degenerates to standard DRA. The probability of observing a sample c ⊂ V, given its size n ∈ N* and the network state x, is denoted by Λ(c; n, x) ∈ [0,1]. Typically, the number of accessible nodes is proportional to the number of infected nodes in the graph, i.e. n_k = α Σ_i X_{i,k}, α ∈ [0,1]. Take as examples these two simple sampling functions, whereas more complicated ones can be considered: (i) uniform sampling, where the sample is drawn uniformly among the population of infected nodes; (ii) score-proportional sampling, where infected nodes with higher scores have a higher probability to be sampled.

Algorithm 1: DP control with Restricted and Sequential DRA
Input: N: population size; b: budget of resources; X_0: initial infection state; p(x): transition probability from state x to every other state; f: function that gives the number of accessible nodes; Λ: p.d.f. of the sample; Π: Restricted DRA strategy; isSequential: specifies if the strategy is RDRA (false) or SDRA (true).
Output: X: final network state; R: final allocation of the resources.
while the process is running, at each round:
  n ← f(X)                 // compute the number of accessible nodes
  C^R ← find(R = 1)        // currently treated nodes
  C ∼ Λ(n, X)              // generate a node sample
  if isSequential == true then
    for j = 1...n do       // loop of a selection round
      R ← Π(C^R, C_j)      // update resource allocation sequentially
    end for
  else
    R ← Π(C^R, C)          // update resource allocation altogether
  end if
  X ← p(X)                 // update infection state
end while
return X, R

The RDRA assumption of simultaneous access to all the nodes of a sample in a round remains far from realistic. We next refine the access constraints and present a model that processes the sample sequentially. The Sequential DRA (SDRA) model does not reassign the b treatments altogether, as DRA and RDRA strategies do. Here, the round is further divided into n_k time intervals and the reallocation is performed sequentially at the time-scale of the round duration. In each time interval one candidate of the sample is examined for getting a treatment. Let the discrete index j ∈ {1, ..., n_k} characterize the sequential arrival order of candidates, e.g. j = 1 and j = n_k are respectively the first and last candidates of round k. Since the administrator gains access to the candidates sequentially, the problem variables can depend on the index j ∈ {1, ..., n_k}: at step j the strategy may use only the candidates observed so far, i.e. it acts as Π_j(C^R_k, C_1, ..., C_j), and the arrival order of the nodes of the sample is uniformly random. The way the RDRA and SDRA models operate is described in Alg. 1, and a deployment example is depicted in Fig. 2. To the best of our knowledge, this work is the first to cast dynamic DP control as a problem where decisions are taken as in a Sequential Selection Problem (SSP). Our goal is to establish this link in the best and most comprehensible way, hence it is beyond our aims to propose a new SSP setting. Features to consider before choosing an SSP setting are, for instance, single or multiple resources, finite or infinite horizon, score-based or rank-based objective function, etc. Most SSPs consider a cold-starting selection where the administrator begins with an empty selection set.
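To make the batch (RDRA) branch of Alg. 1 concrete, here is a minimal Python sketch of one round: sample a fraction α of the infected nodes, pool them with the currently treated ones, and greedily reallocate the b resources by score. The function name, the uniform sampling and the greedy top-b rule are our own illustrative choices, not the authors' implementation.

```python
import numpy as np

def rdra_round(X, R, scores, b, alpha, rng):
    """One RDRA round (batch branch of Alg. 1): resample the accessible nodes
    and greedily reallocate the b resources among sample + treated nodes.

    X      : infection state vector in {0, 1}^N
    R      : current allocation vector in {0, 1}^N
    scores : criticality score of every node (the 'black-box' s)
    b      : resource budget
    alpha  : fraction of infected nodes that becomes accessible
    Returns the new allocation vector.
    """
    infected = np.flatnonzero(X == 1)
    if len(infected) == 0:                         # nothing left to treat
        return np.zeros_like(R)
    n = int(np.ceil(alpha * len(infected)))        # n_k = alpha * N^I
    sample = rng.choice(infected, size=n, replace=False)   # uniform sampling Lambda
    treated = np.flatnonzero(R == 1)               # preselection C^R
    candidates = np.union1d(sample, treated)       # nodes the strategy may treat
    reachable_infected = candidates[X[candidates] == 1]
    order = reachable_infected[np.argsort(-scores[reachable_infected])]
    new_R = np.zeros_like(R)
    new_R[order[:b]] = 1                           # keep the b highest-scoring ones
    return new_R
```

In the sequential (SDRA) branch, the same reallocation is instead decided one candidate at a time, which is where the selection rules of Sec. IV come in.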
Contrary to such cold-starting SSPs, in the SDRA setting and in epidemic control the resources are supposed to be constantly in effect in the network; hence, when a selection round starts, there is a warm-start in which a set of currently treated nodes is in fact already 'selected'. It turns out that this is the biggest difficulty in finding methods in the existing SSP literature that match the SDRA setting. Nevertheless, a warm-starting SSP variant that fits well the sequential epidemic control has been presented in [10]. We map the problem of DP control with SDRA to a succession of separate Warm-starting Sequential Selection Problems (WSSPs) [10]. Specifically, one selection round of the former corresponds to one instance of the latter. For convenience, in our notations we drop the round subscript k within each WSSP, e.g. C_k becomes C. Definition 5. The Warm-starting Sequential Selection Problem (WSSP) is an SSP variant described by elements of different categories: a. Input: S ∈ R^N, the score vector of the entire population; n ∈ N*, the number of candidates to come. b. Initialization: C^R ⊂ V, the subset of the population, called preselection, to which the resources are initially allocated when a round begins, i.e. R_i = 1, ∀i ∈ C^R, with |C^R| = b. c. Sequential process: (C_1, ..., C_n) ∈ P_n(V\C^R), the sequence of randomly incoming candidates for receiving a resource, where P_l(E) denotes the set of l-combinations of some finite set E; (R_{C_1}, ..., R_{C_n}) ∈ {0,1}^n, the sequence of resource allocation decisions taken, where giving a resource to a candidate immediately withdraws it from a preselected node (recovered or not), so that exactly b resources remain in use at all times. The cost function is defined as:

  φ_B = S^T (R^off − R) = Σ_i S_i (R^off_i − R_i),   (9)

where R is the final allocation produced by the sequential decisions and R^off the allocation that an offline strategy with full access to the round's nodes would have produced. The first term of the cost function in Eq. 9 defines the highest achievable score, while the second one gives the score achieved by the taken sequential decisions. As the sequence of incoming candidates is a random variable, E[φ_B/b] is the objective function to minimize. (Figure legend: an infected node with score S is examined and gets rejected at the j-th step of the sequence; currently unreachable nodes have not been recently examined; ranked preselected nodes with known scores lie out of our view and may or may not be receiving a resource; edges may lead to out-of-view nodes.) Some observations have to be made concerning our specific SDRA-to-WSSP mapping: 1) It translates the objective of the DP control (i.e. to minimize the percentage of infected nodes through time) into an SSP objective (i.e. to minimize the expected cost function of the selected items). 2) Within each WSSP instance, the administrator does not need to know anything about the infection state of the network, and merely selects online. 3) The final selection of a round may not exactly constitute the preselection of the following one, specifically in case of recoveries of nodes that will no longer be part of the preselection; their resources are returned and become available for the candidates of the round. 4) If the administrator still has unassigned resources when reaching the end of the sequence of a round (for the triggering function we considered, this can be at most one -- see Remark 1 and below it), then they are by default given to the very last candidates to appear. In Sec. IV-B we argued about plugging online algorithms from the SSP literature into the SDRA model to sequentially control DPs.
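Before turning to specific online strategies, here is a small, self-contained illustration of the round cost of Eq. 9; the helper name and the toy numbers are our own, chosen only to show how φ_B/b compares an online selection against the offline oracle.

```python
import numpy as np

def wssp_round_cost(scores, online_alloc, b):
    """Cost of one WSSP round, phi_B = S^T (R_off - R), normalized by b.

    scores       : score S_i of every node reachable in the round
    online_alloc : 0/1 vector, final allocation chosen by the online strategy
    b            : number of resources in play
    """
    offline_alloc = np.zeros_like(online_alloc)
    offline_alloc[np.argsort(-scores)[:b]] = 1      # offline oracle: b highest scores
    phi = scores @ (offline_alloc - online_alloc)   # Eq. 9
    return phi / b

# Toy round with b = 2: the oracle would keep scores {3, 2},
# while the online decisions ended up keeping {3, 1}.
scores = np.array([3.0, 2.0, 1.0, 0.0])
online = np.array([1, 0, 1, 0])
print(wssp_round_cost(scores, online, b=2))   # -> 0.5
```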
Here we focus on the algorithmic part, in particular on two classes of online strategies: • Cutoff-based: such a strategy takes as input a cutoff value c ∈ N; it first rejects by default the first c incoming candidates, in a learning phase, and then selects candidates according to information collected during that phase. • Threshold-based: a particular case of cutoff-based strategies with c = 0, where a candidate is accepted if its score beats a specified acceptance threshold. We chose an indicative algorithm from each class: the Cutoff-based Cost Minimization (CCM) [10] and the Hiring-Above-the-Mean (MEAN) [6], whose objectives are to minimize the expected sum of the ranks (respectively, the expected sum of scores) of the selected nodes at the end of a round. Cutoff-based Cost Minimization. CCM also takes as input a measure of quality q ∈ ]0,1[ of the preselected nodes w.r.t. the sample. This measure indicates how worthy the current resource allocation is: a high quality suggests that the currently treated nodes are among the network's most critical nodes for the epidemic spreading. For this strategy, the administrator needs to value the selection in terms of ranks instead of scores. Let S_R ⊂ S be the scores of the preselected nodes and S_C = (s(C_1), ..., s(C_n)) ⊂ S the scores of the candidates. Using Definition 6, the rank-based cost function is defined as φ^σ_{B,k} = Σ_{S ∈ S_sel} σ_{n+b}(S, (S_R, S_C)), where S_sel = {S_j ∈ S : R_j = 1}_{1≤j≤n} is the set of scores of the selected nodes. In this paper we set q_k = φ^σ_{B,k−1}/b, ∀k ≤ K, with φ_{B,0} = 0.5. This way, the quality of the preselected nodes in the k-th round is simply the sum of their ranks w.r.t. the sample of the previous round, when they were selected. This implies our assumption that the item ranks are rather constant between two subsequent rounds. Then, the table c*(b, n, q) ∈ N^{b×n} is computed by tracking the lowest point of the expected rank-based cost provided in [10]. The algorithm proceeds as follows. The b best scores recorded during the learning phase of size c* are stored, ordered, in a reference set. Then, the acceptance threshold starts at the worst score of the set and moves up each time a candidate is accepted, pointing at the next higher score of a non-resigned employee. The process terminates when the end of the sequence is reached. For simplicity, we refer to the CCM strategy with c = c*(b, n, q) as CCM*. Note that the rank-based evaluation is particularly suited for DP control, as nodes' criticality scores are most likely changing through time. Hiring-above-the-mean. The MEAN strategy [6] considers the average score of the preselection as acceptance threshold, which is updated after each new selection. In the original setting, each incoming candidate j ≤ n has a quality score Q_j ∼ U(0,1), and the goal is to keep a good trade-off between the quality of the selection and the speed of hires. Let the average quality of b + 1 employees be denoted as A_b (it starts with A_0 ∈ [0,1]). To quantify the rate of convergence of the latter and the rate at which candidates are hired, the authors define a gap G_b = 1 − A_b, which converges to 0 almost surely when b goes to infinity. Among other results, they proved that after b^{3/2} candidate interviews, the expected value of the mean gap for the best b candidates is O(1/√b), which makes the strategy close to optimal given this particular evaluation criterion.
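As an illustration of the threshold-based family, the sketch below runs one warm-started round with a hiring-above-the-mean rule: accept a candidate when its score beats the current mean of the selection, and let it take the resource of the worst currently selected node. This is our own simplified rendering of the idea (names, the replace-the-worst rule and the toy values are assumptions), not the authors' implementation of [6].

```python
import numpy as np

def mean_threshold_round(preselected_scores, candidate_scores):
    """One warm-started selection round with a hiring-above-the-mean rule.

    preselected_scores : scores of the b nodes holding a resource at round start
    candidate_scores   : scores of the incoming candidates, in arrival order
    Returns (final selection scores, list of accept/reject decisions).
    """
    selection = sorted(preselected_scores)     # ascending, so selection[0] is the worst
    decisions = []
    for s in candidate_scores:
        threshold = np.mean(selection)         # acceptance threshold = current mean
        accept = s > threshold
        if accept:
            selection[0] = s                   # the resource leaves the worst selected node
            selection.sort()
        decisions.append(accept)
    return selection, decisions

# b = 2 resources, preselection scores {0, -1} as in the walkthrough of Fig. 3:
final, dec = mean_threshold_round([0.0, -1.0], [-1.0, 1.0, 0.5, -2.0])
print(final, dec)   # the first candidate (-1.0) is rejected since -1.0 <= -0.5
```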
Despite being more intuitive and easier to implement than CCM, the MEAN strategy reaches its limit when the preselection is of poor quality with respect to the sample (and probably also with respect to the whole population of care-seekers). We can also consider the MEDIAN variant of [6], where the acceptance threshold used is the median score of the preselection.

V. OFFLINE VS. ONLINE

In our DP context, a strategy is called offline when it deterministically selects the b best reachable nodes and immediately assigns resources to them. As explained earlier, the notion of 'best' is given by the 'black-box' scoring function s : V → R, which prioritizes nodes independently, based on their criticality for the spread. On the other side, an online strategy can only examine the candidate nodes one-by-one (see Definition 4). Before going further, let us clarify that for simplicity we add the superscript 'off' to refer to the offline strategy associated to the online strategy Π_k(C) that is implicitly used. Also, the offline strategy defined in this way is only an indicator of which nodes would have been selected by the oracle. The main issue that arises from the link with SSP concerns the relationship between the selection performance of an online strategy and the expected number of infected nodes. This can be measured as a difference in the effect on the epidemic compared to the corresponding offline strategy, namely ΔN^I_t, the difference between the expected number of infected nodes using the online strategy and using the corresponding offline one. The performance of an online selection strategy is quantified by its expected sum of false negatives (FN) and false positives (FP) in the sequence; hence we define the online selection error (or just error) of round k as e_k = (FN_k + FP_k)/2, which is evaluated at every round, right after the round's selection is finalized. An online strategy with guarantee can therefore be defined by providing a bound on either: i) the expected number of errors over all rounds, s.t. (1/b) Σ_{k≤K} E[e_k] ≤ M_K (Eq. 10); ii) or the expected cost at any k-th round, φ_{B,k} (see Eq. 9), written as φ_k = S_k^T (R^off_k − R_k), being bounded round-wise (Eq. 11). This second bound is stronger as it is given round-wise; however, it requires a certain knowledge of the score distribution. Going back to the DP, a thorough numerical analysis led us to the formulation of the following linear regression assumption. Assumption 1. The expected AUC of the difference in the percentage of infected nodes between an online and an offline selection strategy is an affine function of the AUC of the online selection error, A_e(T), which is formulated as:

  E[A_ΔN(T)] = c_1 A_e(T) + c_2,   (12)

where c_1 ≥ 0 ≥ c_2 are constants that depend on the sampling size α (see Sec. III-A) and on the epidemic parameters of the problem. Suppose that an online strategy provides a bound on its expected number of errors over the rounds and validates Eq. 10. Under Assumption 1, one can deduce the bound E[A_ΔN(T)] ≤ c_1 M_K + c_2 on the AUC of the percentage of infected nodes. Suppose now that an online strategy provides a bound on the cost at round k and validates Eq. 11. Observe that the round cost can be related to the number of selection errors through the scores of the round's candidates, in particular through S_{min,k}, the minimum score over all candidates of round k. Thus, Eq. 12 in Assumption 1 can be rewritten as a bound expressed in terms of the round-wise costs. A short investigation of the role of the constants c_1 and c_2 can be found in the simulations. Example: online vs. offline selection. Fig. 3 displays an example of an online selection round where two resources are initially assigned to the nodes of the preselection (top-left in each subfigure). Suppose that the online strategy gives a resource to incoming nodes with score higher than the average score of the preselection, here with scores {0, −1} (the higher the score, the more critical the node).
Scores of other infected nodes appear when their turn in the sequence arrives and each of them gets accessed. The first candidate (j = 1) is not selected since its score of −1 does not beat the threshold of (−1 + 0)/2 = −0.5. The second candidate (j = 2), though, gets the resource unit from the worst preselected node. The new score threshold to beat is set to 0. The process continues, up to the last candidate (j = n). The final resource allocation of an offline selection strategy is also shown (rightmost subfigure). Here, the cost function for the online case is φ_B/b = (1 + 1) − (1 + 0) = 1, where the first term is the highest achievable average score of the selection (i.e. the offline score), and the second term is what the online strategy achieved. Regardless of the scoring function, an efficient SDRA strategy (online) should be as close as possible to the associated RDRA strategy (offline) in terms of A_N(T) (see Eq. 7). Network. The interactions among a population of N individuals are modeled by a fixed, symmetric (undirected), and unweighted network with adjacency matrix A ∈ {0,1}^{N×N}, where A_ij = 1 only if nodes i and j are linked with an edge. The connectivity structure is generated according to a scale-free (SF), a small-world (SW), or a community-structured (CS) network model, or taken from a real network of Facebook user-user friendships. • In the SF type, the node degree distribution follows a power law, hence few nodes are hubs and have many more edges than the rest. We use the Barabási-Albert preferential attachment model [3] that starts with two connected nodes and, thereafter, connects each new node to m ∈ N* existing nodes, randomly chosen with probability equal to their normalized degree at that moment. • In the SW type, nodes are reachable from each other through short paths. We use the Watts-Strogatz model [25]: it starts by arranging the N nodes on a ring lattice, each connected to m ∈ N* neighbors, m/2 on each side; then, with a fixed probability p ∈ [0,1] for each edge, it decides whether to rewire it to a uniformly chosen node of the network. • In the CS type, nodes are grouped into sets that are densely connected internally and sparsely connected between groups. We use a hierarchical Erdös-Rényi model, where the probability p_l ∈ [0,1] for creating each edge reduces at each level l as we move up in the hierarchy. At the lowest level there are 12 groups of 100 nodes, N = 1200 nodes overall (see Fig. 14). For the first two types of generated networks, we use a small population size of N = 100 individuals, which however is sufficient for our demonstration. Furthermore, by rescaling the epidemic parameters, the same phenomena can be reproduced in larger networks. The model parameters used to generate each network are mentioned explicitly in the associated figures. Scoring function. In this section, we briefly expose a series of known scoring functions that we used in our experiments: • Random (RAND): selects nodes uniformly at random among the infected nodes. • Largest Reduction in Spectral Radius (LRSR): selects each time the infected nodes that lead to the largest drop in the first eigenvalue of the network's adjacency matrix [24]. • Largest Reduction in Infectious Edges (LRIE) [21]: selects the infected nodes that minimize the number of infectious edges, i.e. edges that connect an infected and a healthy node. The associated score is the difference between the number of healthy and infected neighbors of each node, S_{i,t} = Σ_j A_ij X̄_{j,t} − Σ_j A_ij X_{j,t} (recall that X̄_{j,t} = 1 − X_{j,t}, ∀j, t).
This scoring function is derived from the minimization of the second-order derivative of the expected number of infected nodes w.r.t. time, d²E[N^I_t]/dt². LRIE is greedy and dynamic, since node scores change each time the infection state and/or the network structure changes. • Max-Cut Minimization (MCM) [22]: a priority planning strategy computes a healing plan, which is an ordering of the nodes that accounts for their criticality w.r.t. the epidemic spread, and that is given prior to the beginning of the diffusion. It is thereby a static scoring function. The resources are given to the first b nodes of the plan, and once those nodes get cured, the resources are reallocated to the next nodes of the plan. MCM finds a node priority-order ℓ with as low as possible max-cut (see Eq. 16). For a network with adjacency matrix A = (A_ij) ∈ {0,1}^{N×N}, the max-cut of a given priority-order ℓ is defined as:

  CUT(ℓ) = max_{z ∈ {1,...,N}} Σ_{i,j : ℓ(i) ≤ z < ℓ(j)} A_ij,   (16)

where ℓ(i) is the order of node i in the plan. Finally, the score of an infected node is determined by its position in the plan, i.e. priority is given to nodes at the beginning of the ordering. Using such a strategy, and under some specific assumptions, a bound over the expected extinction time can be retrieved, the extinction time being defined as t_exct. = min{t : X_t = 0} ∈ R+. Example. Fig. 4 displays the early evolution of the diffusion over the graph G starting with full infection, i.e. X_0 = (1, ..., 1)^T. The administrator manages dynamically b = 2 resources (blue nodes). Following the LRIE strategy, node scores have negative initial values due to the full infection, which however increase as more nodes recover. We can see that the two highest LRIE scores are spread across the network, as each score computation only takes local information into account. On the other side, following the MCM strategy implies first computing the priority-order ℓ = {E, D, B, C, A} that minimizes the max-cut, which is CUT(ℓ) = 3 (between nodes B and C) in this case, and which provides fixed node scores. Diffusion process and score-based DRA. The diffusion process we simulate in our experiments is as described in Sec. II-B, with a fixed budget of b = 5 resources. In this SIS formulation we have dropped the self-recovery (i.e. δ = 0 in Eq. 1) in order to emphasize the role of the decisions taken by the compared strategies. Concerning the scoring function, in the simulations we use both a static and a dynamic one, namely MCM and LRIE. In the empirical study, our aim is to compare the performance of several DRA strategies that follow Alg. 1. The offline selection strategy, which picks the reachable candidates with the highest LRIE (resp. MCM) scores, is always plotted with a dark blue (resp. light blue) curve as reference (see Figs. 5, 7, 8). Other colors imply the sequential allocation of resources. At each round, a fraction α ∈ [0,1] of the infected nodes, n_t = α Σ_i X_{i,t}, is uniformly sampled and becomes accessible to the administrator. Fig. 5 displays the evolution of the average percentage of infected nodes with time, using the compared DRA strategies on two network types: for the SW type (top row), the left subfigure shows the cutoff-based CCM strategy with various cutoffs, and the right one the MEAN and MEDIAN variations of the threshold-based strategy. In both subfigures, CCM* is clearly the best performing approach. Here, MEAN is a lot better than MEDIAN; however, on an SF network (bottom row), the curves appear to be closer together and the two do not differ in performance.
The CCM is still better, but CCM * shows no improvement over the use of the simpler cutoff c = √ n − 1. To further investigate the behavior of the strategies, we plot in Fig. 6 the score distribution f S (here from LRIE) for the infected nodes of the network at three different rounds in the course of a multi-round process. The top and bottom rows refer respectively to SW and SF networks. Fig. 6(a) and Fig. 6(c) show the f S obtained using a RDRA strategy (blue curves in Fig. 5) , while for Fig. 6(b) and Fig. 6 (d) the sequential strategy used is the CCM * (red curve in Fig. 5 ). Starting from nearly identical f S per row at k = 1 (same initial infection level), we observe that throughout the rounds the difference between the distributions of the RDRA and SDRA strategies is larger for SW networks. This is as expected since in that example the strategies have larger difference in performance. Moreover, for SF networks, the f S leans towards a Gaussian-like shape, which explains why MEAN and MEDIAN behave similarly, contrary to the more skewed shape obtained for a SW network. It is easy to see in these simulations that the network structure plays a crucial role in the epidemic spread and sets the difficulty level to any strategy that tries to contain it. Also, the highly evolving shape of the score distribution throughout the process (rounds) illustrates the challenges that SDRA strategies need to address in order to be sufficiently effective. C. RDRA with different scoring functions CS network. So far we have used the LRIE scoring function for the prioritization of infected nodes. Here, we compare the behavior of the RDRA strategy when using either one of the dynamic LRIE or the static MCM functions. In Fig. 7 , the percentage of infected nodes w.r.t. time is displayed for a CS network that exhibits a hierarchical community structure, see [1] . Due to the large variation of edge density that such networks exhibit, they usually require less resources than a graph with the same number of edges but without community structure; in this case b = 17 resources are enough for N = 1200 nodes. We notice that RDRA performs better with the MCM scoring function than with the LRIE, as it is more efficient at targeting the critical nodes. Fig. 7 shows the impact of the sampling size (left: α = 1; right: α = 0.05). Despite the significant difference between the two sampling sizes, the efficiency is only slightly reduced from one to the other. Overall, in practice our framework seems to be able to translate the quality of a scoring function to better performance in the constrained RDRA setting, and its applicability remains high even when the sampling size is quite small. SW and SF networks. In Fig. 8 , the simulations of Sec. VI-B are repeated, changing only the scoring function which now is MCM (RDRA appears in both sides for reference). The two scoring functions, LRIE and MCM, give similar results on SW and on SF networks. The noticeable difference concerns the SDRA strategy for which the sequential CCM * (red curve) is almost identical to the CCM strategy with c = √ n − 1 (green curve). In [4] , the latter strategy is shown to be optimal for node scores drawn from a uniform distribution (but unobserved by the administrator who only makes pairwise comparisons). In order to verify this assertion, we displayed in Fig. 9 the score distribution f S of the nodes w.r.t. the different rounds k. 
This clearly seems to be more uniform-like than the Gaussian-like shape obtained with LRIE, which explains well why the red and green curves are very close. Another observation is the fact that nodes with the highest scores are treated before nodes with the lowest scores (see the orange bar plot), although some of them are still infected (the tail of the orange bar plot), since re-infection might occur at the beginning of the priority-order, especially in a graph without community structure. The results on an SF network are very similar to Fig. 5(c) and Fig. 5(d) and are therefore omitted due to space constraints. Real networks. In the last simulations, in Fig. 10, we use a real network that contains Facebook user-user friendships (available at: http://konect.uni-koblenz.de/networks/), which is larger than the synthetic datasets used. A node is a user and an edge indicates a friendship between two users. Note that, according to [10], using CCM with c = n/e is a decent alternative to CCM* when few or no nodes have recovered; it is therefore used in this simulation (red curve). We deduce, from the positions of the nodes on the circle of the layout, that any node is easily reachable through a small number of jumps from another node, which is characteristic of an SW network. For the simulations, we use the MCM scoring function, which is slightly better than LRIE here (see Appendix). Despite an initial fraction of infected nodes of 20%, and with only b = 16 resources for N = 2888 nodes, the epidemic explodes when the allocation strategy is not proper, e.g. when using the MEDIAN strategy (purple curve). In this section we investigate the linear regression hypothesis stipulated in Eq. 12 of Sec. V by simulating the epidemic spread for different sequential selection strategies. We then compare, for a fixed time horizon T, the AUC of the expected difference in the percentage of infected nodes, A_ΔN(T), against the AUC of the online selection error, when the administrator has access to only a fraction of the infected nodes, i.e. α = 0.5 (see Fig. 11(a)). As expected, the result of each strategy, i.e. a 2-d point, lies on a line with slope coefficient c_1 = 0.714 and intercept c_2 = −52.14; therefore we get A_ΔN(T) ≤ 0.714 · M_K − 52.14, which empirically verifies Assumption 1 (see Tab. II in the Appendix for more examples). After only 5-6 rounds, the CCM* algorithm makes no more than 1.5 errors on average for a fixed number of candidates n_max = αN = 50 and b = 5 resources, which gives M_K ≤ (1.5/5) · K = 0.3 · 301 = 90.3; hence we get A_ΔN(T) ≤ 0.714 · 90.3 − 52.14 = 12.33, which should be compared with the maximum of the AUC over 301 rounds, that is A^MAX_ΔN(T) = 301. Therefore, the error ratio is less than A_ΔN(T)/A^MAX_ΔN(T) = 4.1% of the worst-case scenario. This study aimed at bringing Dynamic Resource Allocation (DRA) strategies closer to meeting real-life constraints. We revisited their strong assumption that the administrator has full information and access to all network nodes at any moment a decision takes place: anytime needed, she can instantaneously reallocate resources to any nodes indicated by a criticality scoring function. We significantly relaxed this assumption by first introducing the Restricted DRA model, where only a sample of nodes becomes accessible at each round of decisions. Inspired by the way decisions are taken while care-seekers arrive at a healthcare unit, we next proposed the Sequential DRA model that further limits the control strategy so as to have only sequential access to the sample of nodes at each round.
This setting offers a completely new perspective for dynamic control: the administrator examines the nodes one-by-one and decides immediately and irrevocably whether to reallocate resources or not. This online problem is put in relation with recent works in the literature where efficient algorithms have been presented for the selection of items from a sequence for which little or no information is available in advance. Special mention should be made to the Multi-round Sequential Selection Process (MSSP) that has been found to be particularly fitting for handling this new setting. Finally, according to our simulations on SIS epidemics, where we compared the performance of several variants of the above DP control models, we conclude that the cutoff-based CCM * is a very promising approach for the setting of sequential DP control. As described, the sampling is performed on the infected nodes and so far we used an arbitrary fixed ratio α. To analyze the impact of the sampling size on the efficiency of the CCM * strategy, we plot in Fig. 12 the average percentage of infected nodes w.r.t. time for various sampling ratios. We observe that the SDRA is less sensitive to the sampling size on SF networks (right) than on SW networks (left). Furthermore, regardless the network structure, increasing the sampling size does not improve linearly the efficiency of the algorithm. In Fig. 13 is displayed the AUC of Eq. 7 for two different number of average neighborsk = {2, 10}, and on each figure, for three different types of networks. Clearly, the difference between the network types is more evident when the edge density of the network increases (right-hand side), and the AUC is smaller for the SW type. This can be explained by the fact that in this type of graphs, increasing the edge density merely reinforces the connectivity within the hubs, i.e. aggregate of nodes, and hence assigning resources to those critical nodes allows to reduce efficiently the spread. Note that, in this work we consider a passive sampling that is purely random and does not follow any strategy, however we might envision an adversarial sampling made by a malicious agent that adapts to the administrator's strategy. In this case, the worst case scenario plays an important role in the strategy's performance, using the CCM strategy for instance, it occurs when the b-best candidates are the first incoming, since they are very likely to be rejected by default. Community-structured network. The community graph used for simulations in Fig. 7 is displayed in Fig. 14, showing a clear hierarchical structure with three levels of point density. Real network. The simulations of Fig. 15 are similar to those of Fig. 10 but focusing on the LRIE instead of the MCM scoring function. On the left, RDRA strategies using both scoring functions are compared, while on the right a comparison of different sequential strategies using LRIE is presented. Contrary to Fig. 10 with MCM, the sequential strategies here fail to reduce the epidemic spread when using the LRIE scores. 
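Since the sampling ratio α is the main knob varied throughout this appendix, here is a small sketch of the two sampling schemes mentioned in Sec. III (uniform and score-proportional); the function name and the score-shifting trick are our own illustrative choices, not the paper's code.

```python
import numpy as np

def sample_accessible(X, scores, alpha, scheme="uniform", rng=None):
    """Draw the accessible sample C of size n = ceil(alpha * N_I) among infected nodes.

    scheme = "uniform"    : every infected node is equally likely (first Lambda of Sec. III)
    scheme = "score_prop" : probability proportional to the (shifted) criticality score
    """
    rng = np.random.default_rng() if rng is None else rng
    infected = np.flatnonzero(X == 1)
    n = int(np.ceil(alpha * len(infected)))
    if n == 0:
        return np.array([], dtype=int)
    if scheme == "uniform":
        p = np.full(len(infected), 1.0 / len(infected))
    else:  # score-proportional: shift scores to be positive before normalizing
        s = scores[infected] - scores[infected].min() + 1e-9
        p = s / s.sum()
    return rng.choice(infected, size=n, replace=False, p=p)
```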
References
[1] Hierarchical random graphs for networks with weighted edges and multiple edge attributes.
[2] An introduction to stochastic epidemic models.
[3] Emergence of scaling in random networks.
[4] A new secretary problem with rank-based selection and cardinal payoffs.
[5] Flexible bed allocations for hospital wards.
[6] The hiring problem and Lake Wobegon strategies.
[7] A dynamic vaccination strategy to suppress the recurrent epidemic outbreaks.
[8] Fair allocation of scarce medical resources in the time of Covid-19.
[9] Sequential dynamic resource allocation for epidemic control.
[10] The warm-starting sequential selection problem and its multi-round extension.
[11] Who solved the secretary problem? Statistical Science.
[12] Sequential and simultaneous decision making for optimizing health care resource flexibilities.
[13] Infectious disease modeling of social contagion in networks.
[14] Epidemiological modeling of news and rumors on Twitter.
[15] The importance of human resources management in health care: a global context.
[16] Information diffusion and rumor spreading.
[17] Dynamic optimization model for allocating medical resources in epidemic controlling.
[18] Estimating variability in models for recurrent epidemics: assessing the use of moment closure techniques.
[19] Stochastic optimal control of epidemic processes in networks.
[20] Virus spread in networks.
[21] A greedy approach for dynamic control of diffusion processes in networks.
[22] Suppressing epidemics in networks using priority planning.
[23] Suppressing epidemics with a limited amount of immunization units.
[24] Gelling, and melting, large graphs by edge manipulation.
[25] Emergence of scaling in random networks.

Notation
A_N(T): area under the curve of the percentage of infected nodes from t = 0 to t = T.
t_exct.: extinction time, i.e. smallest DP time for which X(t_exct.) = 0.
C^R_t ⊂ V: the treated nodes at time t, called preselection.
k ≤ K: round index, s.t. t_k is the time at which the k-th round occurs, and K is the total number of rounds.
Π(I, C) ∈ {0,1}^n: control strategy for which the administrator has information on the nodes of the set I and access to the nodes of the set C.
n ≤ N, C ⊂ V: node sample of size n at time t, s.t. C_1 is the first incoming candidate of a round and C_n the last.
Λ(c; n, x) ∈ [0,1]: probability of observing sample c of size n for a given infection state x.
φ ∈ R+: cost function to minimize.
R^off ∈ {0,1}^N: allocation vector that minimizes the cost given full access to, and full information on, the sample.
S_R, S_C: scores of the treated nodes and of the candidate nodes, respectively.
ΔN^I_t ∈ R+: difference between the expected number of infected nodes using an online strategy and the corresponding offline strategy.
e ∈ R: online error, i.e. the expected sum of half the false negatives (FN) and false positives (FP) compared to an offline strategy.
t_exct.: first time at which every node is healthy.
α ∈ [0,1]: fraction of the infected nodes that composes the sample.
q ∈ ]0,1[: quality of the preselection compared to the candidates, used by the CCM strategy.
c ∈ N: integer that specifies when to stop rejecting candidates, called cutoff, used by the CCM strategy.

As presented in the main text, the random variable of interest in the stochastic SIS epidemic model is the number of infected nodes at time t, N^I_t, and more precisely its expectation E[N^I_t], for which we attempt to derive a closed-form equation. Let us start by considering the following assumptions. 1) The graph is a random Erdös-Rényi (ER) graph where every node has approximately the same degree, i.e. k̄ ≈ k_i for every node i. 2) The allocation of resources is coarse-grained, i.e.
we apply the RAND strategy, for which the b nodes to receive a treatment are chosen at random among the population of infected nodes. 3) When b > N^I_t, the few remaining infected nodes can 'accumulate' more than one resource to increase their recovery rate until N^I_t = 0. Note that these assumptions are usually not verified in real cases, but they are taken as a simple starting point for the analysis. Under the above constraints, we multiply Eq. 5 by n, sum over n, and obtain the evolution equation of the first-order moment E[N^I_t]. We then multiply Eq. 5 by n², sum over n, and similarly obtain the dynamic equation for the second-order moment E[(N^I_t)²]. It is easy to check that every moment equation depends on a higher-order moment, and thus a closed-form equation for the evolution of the expected number of infected nodes cannot be computed, even in the simple coarse-grained allocation case. In order to get some results, a first option is to use a moment closure technique to approximate the highest-order term, as in [18]; the most common approximations assume a particular distributional form for N^I_t (e.g. normal or lognormal) in order to express the highest-order moment through lower-order ones. A second option is to bound the expected number of infected nodes by using the fact that E[(N^I_t)²] ≥ E[N^I_t]², and thus that the mean of the stochastic SIS epidemic model is less than that of the deterministic solution.
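To illustrate this second option numerically, the sketch below integrates the deterministic mean-field counterpart of the controlled SIS process under the ER and coarse-grained-allocation assumptions above; the ODE form dI/dt = β k̄ I (N − I)/N − δ I − ρ min(b, I) and all parameter values are our own illustrative choices, serving as an upper-envelope intuition rather than the paper's derivation.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative parameters (our own choice, not taken from the paper)
N, k_bar, b = 100, 10, 5
beta, delta, rho = 0.02, 0.0, 2.0

def mean_field(t, y):
    """Deterministic SIS mean under the ER / coarse-grained assumptions:
    dI/dt = beta * k_bar * I * (N - I) / N - delta * I - rho * min(b, I)."""
    I = y[0]
    return [beta * k_bar * I * (N - I) / N - delta * I - rho * min(b, I)]

sol = solve_ivp(mean_field, t_span=(0.0, 50.0), y0=[0.2 * N], max_step=0.1)
auc = np.trapz(sol.y[0] / N, sol.t)   # deterministic counterpart of A_N(T)
print(f"final infected fraction: {sol.y[0][-1] / N:.3f}, AUC: {auc:.2f}")
```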