key: cord-0444124-okwjykf9 authors: Fotakis, Dimitris; Piliouras, Georgios; Skoulakis, Stratis title: Efficient Online Learning for Dynamic k-Clustering date: 2021-06-08 journal: nan DOI: nan sha: 07e5c6faa5e13938dc6742c52c4fb11a2418ed58 doc_id: 444124 cord_uid: okwjykf9

We study dynamic clustering problems from the perspective of online learning. We consider an online learning problem, called Dynamic k-Clustering, in which k centers are maintained in a metric space over time (centers may change positions) such that a dynamically changing set of r clients is served in the best possible way. The connection cost at round t is given by the p-norm of the vector consisting of the distance of each client to its closest center at round t, for some p ≥ 1 or p = ∞. We present a Θ(min(k, r))-regret polynomial-time online learning algorithm and show that, under some well-established computational complexity conjectures, constant regret cannot be achieved in polynomial time. In addition to the efficient solution of Dynamic k-Clustering, our work contributes to the long line of research on combinatorial online learning.

Clustering problems are widely studied in the Combinatorial Optimization literature due to their vast applications in Operations Research, Machine Learning, Data Science and Engineering [49, 39, 10, 6, 9, 31, 36, 51, 38, 11, 37, 3]. Typically, a fixed number of centers must be placed in a metric space so that a set of clients is served in the best possible way. The quality of a clustering solution is captured by the p-norm of the vector consisting of the distance of each client to its closest center, for some p ≥ 1 or p = ∞. For example, k-median and k-means assume p = 1 and p = 2 respectively, while k-center assumes p = ∞ [39, 37, 3].

Today's access to vast data (that may be frequently updated over time) has motivated the study of clustering problems with time-evolving clients, which dynamically change positions over time [14, 19, 18, 4]. In time-evolving clustering problems, centers may also change position over time so as to better track the clients' trajectories. For example, a city may want to reallocate the units performing rapid tests for Covid-19 so as to better serve neighborhoods with more cases, the distribution of which may substantially change from day to day. Other interesting applications of dynamic clustering include viral marketing, epidemiology, facility location (e.g. schools, hospitals), conference planning etc. [43, 18, 40, 41, 48].

Our work is motivated by the fact that in most settings of interest, clients can move in fairly complicated and unpredictable ways, and thus any a priori knowledge of such trajectories is highly questionable (most of the previous work assumes perfect knowledge of the clients' positions over time [18, 4, 14, 19]). To capture this lack of information, we cast clustering problems in the framework of online learning [23]. We study an online learning problem called Dynamic k-Clustering, in which a learner selects at each round t the positions of k centers, trying to minimize the connection cost of some clients whose positions are unknown to the learner prior to the selection of the centers.

Online Learning Problem 1 (Dynamic k-Clustering). Given a metric space d : V × V → R≥0, at each round t:
1. The learner selects a set F_t ⊆ V, with |F_t| = k, at which centers are placed.
2. The adversary selects the positions of the clients, denoted by R_t (after the selection of the positions of the centers by the learner).
3. The learner suffers the connection cost of the clients, C_{R_t}(F_t) = ‖(d(j, F_t))_{j∈R_t}‖_p, where d(j, F_t) = min_{i∈F_t} d_ij is the distance of client j to its closest center.

Based on the past positions of the clients R_1, R_2, ..., R_{t−1}, an online learning algorithm must select at each round t a set of k centers F_t ⊆ V such that the connection cost of the clients over time is close to the connection cost of the optimal (static) solution F*. If the cost of the online learning algorithm is at most α times the cost of F* (up to an additive term sublinear in the time horizon), the algorithm is called α-regret; in case α = 1, the algorithm is called no-regret [23]. Intuitively, a low-regret online learning algorithm converges to the optimal positions of the centers (with respect to the overall trajectories of the clients) by just observing the clients' dynamics.

Example 1. The clients are randomly generated according to a time-varying uniform distribution with radius 0.3 and center following the periodic trajectory (sin(2πt/T), cos(2πt/T)) for t = 1, ..., T. The centers placed by a (sufficiently) low-regret algorithm would converge to positions similar in structure to the ones illustrated in Figure 1 (for k = 1, 2, 4 and k = 8), which are clearly close to the optimal (static) solution for the different values of k.

Efficient Online Learning for Dynamic k-Clustering. The existence of no-regret online learning algorithms for Dynamic k-Clustering immediately follows from standard results in the online learning literature [23]. Dynamic k-Clustering is a special case of the Learning from Expert Advice problem, for which the famous Multiplicative Weights Update (MWU) algorithm achieves no-regret [23]. Unfortunately, using MWU for Dynamic k-Clustering is not really an option, due to the huge time and space complexity that MWU requires. In particular, MWU keeps a different weight (probability) for each of the (|V| choose k) possible placements of the k centers, rendering it inapplicable even for small values of |V| and k. Our work aims to shed light on the following question.

Question 1. Is there an online learning algorithm for Dynamic k-Clustering that runs in polynomial time and achieves α-regret?
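To make the objective of Problem 1 concrete, here is a minimal Python sketch (function and variable names are our own) computing the connection cost C_R(F) as the p-norm of the client-to-nearest-center distances.

```python
import numpy as np

def connection_cost(dist, centers, clients, p=2.0):
    """Connection cost of Problem 1: p-norm of the vector of distances
    of each client to its closest center. `dist` is the |V| x |V| metric,
    `centers` and `clients` are lists of point indices."""
    # d(j, F) = min_{i in F} d_ij for every client j in R
    nearest = dist[np.ix_(centers, clients)].min(axis=0)
    if np.isinf(p):                                   # k-center objective
        return float(nearest.max())
    return float((nearest ** p).sum() ** (1.0 / p))   # p = 1 gives k-median
```

For p = 1 this is the k-median objective, for p = 2 the k-means-style objective, and for p = ∞ the k-center objective, matching the discussion above.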
Our Contribution and Techniques. We first show that constant regret cannot be achieved in polynomial time for Dynamic k-Clustering. In particular, we prove that any O(1)-regret polynomial-time online learning algorithm for Dynamic k-Clustering implies the existence of an O(1)-approximation algorithm for the Minimum-p-Union problem [12]. Recent works in computational complexity theory establish that, unless well-established cryptographic conjectures fail, there is no O(1)-approximation algorithm for Min-p-Union [12, 5, 13]. This result narrows the plausible regret bounds achievable in polynomial time, and reveals an interesting gap between Dynamic k-Clustering and its offline counterparts, which admit polynomial-time O(1)-approximation algorithms.

Our main technical contribution consists of polynomial-time online learning algorithms for Dynamic k-Clustering with non-trivial regret bounds. We present a Θ(k)-regret polynomial-time deterministic online learning algorithm and a Θ(r)-regret polynomial-time randomized online learning algorithm, where r is the maximum number of clients appearing in a single round (r = max_{1≤t≤T} |R_t|). Combining these algorithms, one can achieve Θ(min(k, r))-regret for Dynamic k-Clustering, which (to the best of our knowledge) is the first guarantee on the regret achievable in polynomial time. The regret bounds above are independent of the selected p-norm, and hold for any p ≥ 1 and for p = ∞.

At a technical level, our approach consists of two major steps. In the first step, we consider an online learning problem that can be regarded as the fractional relaxation of Dynamic k-Clustering (see Section 3), where the fractional connection cost is given by the optimal value of an appropriate convex program and the action space of the learner is the |V|-dimensional simplex. For this intermediate problem, we design a no-regret polynomial-time online learning algorithm through the use of the subgradients of the fractional connection cost. We show that such subgradient vectors can be computed in polynomial time via the solution of the dual program of the fractional connection cost. In the second step of our approach (see Section 4 and Section 5), we provide computationally efficient online (deterministic and randomized) rounding schemes converting a vector lying in the |V|-dimensional simplex (the action space of Fractional Dynamic k-Clustering) into k locations for the centers on the metric space V (the action space of Dynamic k-Clustering).

In Section 4, we present a deterministic rounding scheme that, combined with the no-regret algorithm for Fractional Dynamic k-Clustering, leads to a Θ(k)-regret polynomial-time deterministic online learning algorithm for the original Dynamic k-Clustering. Interestingly, this regret bound is approximately optimal for all deterministic algorithms. In Section 5, we show that combining the no-regret algorithm for Fractional Dynamic k-Clustering with a randomized rounding scheme proposed in [11] leads to a Θ(r)-regret randomized algorithm running in polynomial time. Combining these two online learning algorithms, we obtain a Θ(min(k, r))-regret polynomial-time online learning algorithm for Dynamic k-Clustering, which is the main technical contribution of this work. Finally, in Section 6, we present the results of an experimental evaluation, indicating that for client locations generated in a variety of natural and practically relevant ways, the realized regret of the proposed algorithms is far smaller than Θ(min(k, r)).

Our two-step approach provides a structured framework for designing polynomial-time low-regret algorithms in various combinatorial domains, as sketched below. The first step extends far beyond the context of Dynamic k-Clustering and provides a systematic approach to the design of polynomial-time no-regret online learning algorithms for the fractional relaxation of the combinatorial online learning problem of interest. Combining such no-regret algorithms with online rounding schemes, which convert fractional solutions into integral solutions of the original online learning problem, may lead to polynomial-time low-regret algorithms for various combinatorial settings. Obviously, designing such rounding schemes is usually far from trivial, since the specific combinatorial structure of each problem must be taken into account.
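The framework just described can be summarized by the following skeleton (our sketch; the learner, rounding and cost interfaces are hypothetical placeholders, not APIs defined in the paper).

```python
def online_two_step(fractional_learner, rounding_scheme, cost_fn, adversary, T):
    """Generic template of the two-step approach: a no-regret learner over
    the fractional relaxation, composed with an online rounding scheme that
    maps each fractional point to at most k integral center locations."""
    total_cost = 0.0
    for t in range(T):
        y_t = fractional_learner.propose()   # point of the simplex Delta_n^k
        F_t = rounding_scheme(y_t)           # at most k integral center locations
        R_t = adversary(t)                   # clients revealed after commitment
        total_cost += cost_fn(F_t, R_t)      # connection cost C_{R_t}(F_t)
        fractional_learner.update(R_t)       # e.g., a projected subgradient step
    return total_cost
```

The regret of the combined scheme is then the regret of the fractional learner multiplied by the approximation factor of the rounding scheme, which is exactly how Theorems 2 and 4 below are obtained.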
Related Work. Our work relates to the research line of Combinatorial Online Learning. There exists a long line of research studying low-regret online learning algorithms for various combinatorial domains, such as online routing [28, 7], selection of permutations [46, 50, 20, 2, 29], selection of binary search trees [47], submodular optimization [25, 32, 44], matrix completion [26], contextual bandits [1, 17] and many more. Finally, in combinatorial games, agents need to learn to play optimally against each other over complex domains [30, 15]. As in the case of Dynamic k-Clustering, in all the above online learning problems MWU is not an option, due to the exponential number of possible actions.

Another research direction of Combinatorial Online Learning studies black-box reductions converting polynomial-time offline algorithms (with full information on the data) into polynomial-time online learning algorithms. [34] showed that any offline algorithm that optimally solves, in polynomial time, the objective suggested by the Follow the Leader framework can be converted into a no-regret online learning algorithm. [33] extended the previous result to a specific class of online learning problems, called linear optimization problems, for which they showed that any α-approximation offline algorithm can be converted into an α-regret online learning algorithm. They also provided a surprising counterexample showing that such black-box reductions do not hold for general combinatorial online learning problems. Both the time efficiency and the regret bounds of the reductions of [34] and [33] were subsequently improved by [42, 45, 35, 8, 16, 27, 21, 22, 24]. We remark that the above results do not apply in our setting, since Dynamic k-Clustering can neither be optimally solved in polynomial time nor is a linear optimization problem.

Our work also relates to the more recent line of research studying clustering problems with time-evolving clients. [18] and [4] respectively provide Θ(log(nT))- and O(1)-approximation algorithms for a generalization of the facility location problem in which clients change their positions over time. The first difference between Dynamic k-Clustering and this setting is that in the latter there is no constraint on the number of centers that can be opened; furthermore, and crucially, perfect knowledge of the positions of the clients over time is presumed. More closely related to our work are [14, 19], where the special case of Dynamic k-Clustering on a line is studied (the clients move on a line over time). Despite the fact that both works study online algorithms, which do not require knowledge of the clients' future positions, they only provide positive results for k = 1 and 2.

In this section we introduce notation and several key notions, as well as present the formal statements of our results. We denote by D the diameter of the metric space, D = max_{i∈V, j∈V} d_ij. We denote by n the cardinality of the metric space (|V| = n) and by r the maximum number of clients appearing in a single round, r = max_{1≤t≤T} |R_t|. Finally, we denote by ∆_n^k the scaled n-dimensional simplex, ∆_n^k = {y ∈ R^n : Σ_{i∈V} y_i = k and y_i ≥ 0}; a projection routine for this set is sketched below. Following the standard notion of regret in online learning [23], we provide the formal definition of an α-regret online learning algorithm for Dynamic k-Clustering.

Definition 1. An online learning algorithm for Dynamic k-Clustering is α-regret if and only if, for any sequence of clients' positions R_1, ..., R_T,
Σ_{t=1}^T C_{R_t}(F_t) ≤ α · min_{F⊆V, |F|=k} Σ_{t=1}^T C_{R_t}(F) + O(T^β),
where F_1, ..., F_T are the positions of the centers produced by the algorithm for the sequence R_1, ..., R_T and β < 1.
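Since the action space ∆_n^k appears throughout what follows, we include the standard sort-and-threshold routine for the Euclidean projection onto ∆_n^k (a minimal sketch of ours; the paper does not spell out a projection routine, and any exact method would do).

```python
import numpy as np

def project_simplex(v, k=1.0):
    """Euclidean projection of v onto Delta_n^k = {y >= 0, sum(y) = k},
    via the classic sort-and-threshold method for simplex projection."""
    n = v.shape[0]
    u = np.sort(v)[::-1]                  # coordinates in decreasing order
    css = np.cumsum(u)
    # largest index rho with u_rho > (cumulative sum - k) / rho
    rho = np.nonzero(u * np.arange(1, n + 1) > (css - k))[0][-1]
    theta = (css[rho] - k) / (rho + 1.0)  # shift that enforces sum(y) = k
    return np.maximum(v - theta, 0.0)
```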
Next, we introduce the Minimum-p-Union problem, whose inapproximability allows us to establish that constant regret cannot be achieved in polynomial time for Dynamic k-Clustering.

Problem 1 (Min-p-Union). Given a universe of elements E and a collection of sets U = {S_1, ..., S_m} with S_i ⊆ E, select p sets S_{i_1}, ..., S_{i_p} ∈ U such that the cardinality of their union |S_{i_1} ∪ ... ∪ S_{i_p}| is minimized.

As already mentioned, the existence of an O(1)-approximation algorithm for Min-p-Union violates several widely believed conjectures in computational complexity theory [12, 5, 13]. In Theorem 1 we establish that the exact same conjectures are violated in case there exists an online learning algorithm for Dynamic k-Clustering that runs in polynomial time and achieves O(1)-regret.

In Section 4, we present a polynomial-time deterministic online learning algorithm achieving Θ(k)-regret.

Theorem 2. There exists a 6k-regret deterministic online learning algorithm for Dynamic k-Clustering that runs in polynomial time (Algorithm 4). More precisely,
Σ_{t=1}^T C_{R_t}(F_t) ≤ 6k · min_{F⊆V, |F|=k} Σ_{t=1}^T C_{R_t}(F) + o(T),
where F_1, ..., F_T are the positions in which Algorithm 4 places the centers for the sequence of clients' positions R_1, ..., R_T.

In Theorem 3 we prove that the Θ(k) bound on the regret of Algorithm 4 cannot be significantly improved by any deterministic online learning algorithm, even if the algorithm uses exponential time and space.

Theorem 3. For any deterministic online learning algorithm for the Dynamic k-Clustering problem, there exists a sequence of clients R_1, ..., R_T such that the regret is at least k + 1.

In Section 5 we present a randomized online learning algorithm whose regret depends on the parameter r.

Theorem 4. There exists a 4r-regret randomized online learning algorithm for Dynamic k-Clustering that runs in polynomial time (Algorithm 5). More precisely,
Σ_{t=1}^T E[C_{R_t}(F_t)] ≤ 4r · min_{F⊆V, |F|=k} Σ_{t=1}^T C_{R_t}(F) + o(T),
where F_t is the random variable denoting the k positions at which Algorithm 5 places the centers at round t.

By combining Algorithm 4 and Algorithm 5 we can achieve Θ(min(k, r))-regret in polynomial time.

Theorem 5. There exists an online learning algorithm for Dynamic k-Clustering that runs in polynomial time and achieves min(6k, 4r)-regret.

Remark 2. In case the value r = max_{1≤t≤T} |R_t| is initially known to the learner, Theorem 5 follows directly from Theorems 2 and 4. However, even if r is not initially known, the learner can run a Multiplicative Weights Update algorithm that at each round follows either Algorithm 4 or Algorithm 5, with a probability distribution depending on the cost of each algorithm so far (see the sketch below). By standard results for MWU [23], this meta-algorithm admits time-average cost at most that of the better of Algorithms 4 and 5.
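A minimal sketch of the meta-algorithm of Remark 2, assuming the per-round costs of both candidate algorithms are observable and rescaled to [0, 1]; the step size eta and the seeding are arbitrary choices of ours.

```python
import numpy as np

def mwu_meta(costs_A, costs_B, eta=0.1, rng=None):
    """Multiplicative weights over two experts (Algorithm 4 and Algorithm 5):
    at each round, follow one of them with probability proportional to its
    weight, then downweight each expert by exp(-eta * its cost)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    w = np.ones(2)
    choices, total = [], 0.0
    for c_a, c_b in zip(costs_A, costs_B):
        p = w / w.sum()
        pick = rng.choice(2, p=p)          # which algorithm to follow this round
        choices.append(pick)
        total += (c_a, c_b)[pick]
        w *= np.exp(-eta * np.array([c_a, c_b]))   # exponential-weights update
    return total, choices
```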
In this section we present the Fractional Dynamic k-Clustering problem, for which we provide a polynomial-time no-regret online learning algorithm. This online learning algorithm serves as a primitive for both Algorithm 4 and Algorithm 5 of the subsequent sections concerning the original Dynamic k-Clustering. The basic difference between Dynamic k-Clustering and Fractional Dynamic k-Clustering is that in the latter the learner can fractionally place a center at some point of the metric space V. Such a fractional opening is described by a vector y ∈ ∆_n^k.

Online Learning Problem 2 (Fractional Dynamic k-Clustering). At each round t ≥ 1:
1. The learner selects a vector y^t ∈ ∆_n^k. The value y^t_i stands for the fractional amount of center that the learner opens at position i ∈ V.
2. The adversary selects the positions of the clients, denoted by R_t ⊆ V (after the selection of the vector y^t).
3. The learner incurs the fractional connection cost FC_{R_t}(y^t) described in Definition 2.

Definition 2 (Fractional Connection Cost). Given the positions of the clients R ⊆ V, the fractional connection cost FC_R(·) of a vector y ∈ ∆_n^k is the optimal value of the following convex program:
minimize ‖β‖_p subject to β_j = Σ_{i∈V} d_ij · x_ij for all j ∈ R; Σ_{i∈V} x_ij = 1 for all j ∈ R; 0 ≤ x_ij ≤ y_i for all i ∈ V, j ∈ R.

It is not hard to see that once the convex program of Definition 2 is formulated with respect to an integral vector y ∈ ∆_n^k (each y_i is either 0 or 1), the fractional connection cost FC_R(y) equals the original connection cost C_R(F) of the set F = {i ∈ V : y_i = 1}. As a result, the cost of the optimal solution y* ∈ ∆_n^k of Fractional Dynamic k-Clustering is upper bounded by the cost of the optimal positioning of the centers F* in the original Dynamic k-Clustering.

Lemma 1. For any sequence of clients' positions R_1, ..., R_T, the cost of the optimal fractional solution y* for Fractional Dynamic k-Clustering is smaller than the cost of the optimal positioning F* for Dynamic k-Clustering,
Σ_{t=1}^T FC_{R_t}(y*) ≤ Σ_{t=1}^T C_{R_t}(F*).

Lemma 1 will be used in the next sections, where the online learning algorithms for the original Dynamic k-Clustering are presented. We dedicate the rest of this section to the design of a polynomial-time no-regret algorithm for Fractional Dynamic k-Clustering. A key step towards this direction is the use of the subgradient vectors of FC_{R_t}(·). Given a function f : R^n → R, a vector g ∈ R^n belongs to the subdifferential of f at point x ∈ R^n, g ∈ ∂f(x), if and only if f(y) ≥ f(x) + g^T(y − x) for all y ∈ R^n. Computing the subgradient vectors of functions as complicated as FC_{R_t}(·) is in general a computationally hard task. One of our main technical contributions consists in showing that the latter can be done through the solution of an adequate convex program, corresponding to the dual of the convex program of Definition 2.

Lemma 2. Consider the convex program of Definition 2 formulated with respect to a vector y ∈ ∆_n^k and the clients' positions R. Then the following convex program is its dual:
maximize Σ_{j∈R} A_j − Σ_{i∈V} Σ_{j∈R} k_ij · y_i subject to d_ij · λ_j + k_ij ≥ A_j for all i ∈ V, j ∈ R; ‖λ‖*_p ≤ 1; k_ij ≥ 0,
where ‖·‖*_p is the dual norm of ‖·‖_p.

In the following lemma, we establish that a subgradient vector of FC_R(·) can be computed through the optimal solution of the convex program in Lemma 2.

Lemma 3. Let k*_ij denote the values of the variables k_ij in the optimal solution of the convex program in Lemma 2, formulated with respect to the vector y ∈ ∆_n^k and the clients' positions R. Then for any vector y' ∈ ∆_n^k,
FC_R(y') ≥ FC_R(y) − Σ_{i∈V} Σ_{j∈R} k*_ij · (y'_i − y_i).
Moreover, there exists a Θ(r · |V|)-time algorithm for solving the dual program (Algorithm 1) and, additionally, |k*_ij| ≤ D.

Algorithm 1 A time-efficient algorithm for solving the dual program of Lemma 2
1: Input: A vector y ∈ ∆_n^k and a set of clients R ⊆ V.
2: Output: An optimal solution for the convex program of Lemma 2.
3: for each client j ∈ R do
4:   Sort the nodes i ∈ V in increasing order according to d_ij.
5:   Rem ← 1
6:   for each i ∈ V (in increasing order of d_ij) do
7:     x_ij ← min(y_i, Rem)
8:     Rem ← Rem − x_ij
9:   end for
10: end for
11: for each client j ∈ R do
12:   Set D*_j ← max{d_ij : x_ij > 0}, β_j ← Σ_{i∈V} d_ij · x_ij, λ_j ← β_j^{p−1}/‖β‖_p^{p−1}, A_j ← λ_j · D*_j, and k_ij ← λ_j · max(D*_j − d_ij, 0) for all i ∈ V.
13: end for

Remark 3. Algorithm 1 is not only a computationally efficient way to solve the convex program of Lemma 2; most importantly, it guarantees that the values k*_ij are bounded by D (this is formally stated and proven in Lemma 3). The latter property is crucial for developing the no-regret algorithm for Fractional Dynamic k-Clustering. A sketch of Algorithm 1 for the special case p = 1 is given below.
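The following Python sketch specializes Algorithm 1 to p = 1 (so that λ_j = 1 for every client) and returns both the fractional connection cost and the subgradient vector of Lemma 3; the variable names mirror the listing above, but the specialization itself is ours.

```python
import numpy as np

def fractional_assign_and_subgradient(dist, y, clients):
    """Algorithm 1 specialised to p = 1. For each client, greedily fill one
    unit of demand from the nearest fractionally opened locations; D_j is the
    distance at which the demand is completed, and k_ij = max(D_j - d_ij, 0)
    yields the subgradient g_i = -sum_j k_ij of FC at y (Lemma 3)."""
    n = len(y)
    g = np.zeros(n)
    cost = 0.0
    for j in clients:
        order = np.argsort(dist[:, j])        # nearest locations first
        rem, D_j = 1.0, 0.0
        for i in order:
            if rem <= 0:
                break
            x_ij = min(y[i], rem)             # water-filling step of Algorithm 1
            rem -= x_ij
            cost += dist[i, j] * x_ij
            D_j = dist[i, j]                  # distance of the last used location
        g -= np.maximum(D_j - dist[:, j], 0.0)   # accumulate -k_ij
    return cost, g
```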
Up next, we present the no-regret algorithm for Fractional Dynamic k-Clustering.

Algorithm 2 A no-regret algorithm for Fractional Dynamic k-Clustering
1: Initially, the learner selects y^1_i = k/n for all i ∈ V.
2: for rounds t = 1 ... T do
3:   The learner selects y^t ∈ ∆_n^k.
4:   The adversary selects the positions of the clients R_t ⊆ V.
5:   The learner receives cost FC_{R_t}(y^t).
6:   The learner runs Algorithm 1 with input (y^t, R_t) and sets g^t_i = −Σ_{j∈R_t} k^t_ij.
7:   The learner updates y^{t+1} ← Π_{∆_n^k}(y^t − η · g^t), where Π_{∆_n^k} denotes the Euclidean projection onto ∆_n^k and η is an appropriately chosen step size.
8: end for

We conclude the section with Theorem 6, which establishes the no-regret property of Algorithm 2 and whose proof is deferred to Appendix C.

Theorem 6. Let y^1, ..., y^T be the sequence of vectors in ∆_n^k produced by Algorithm 2 for the clients' positions R_1, ..., R_T. Then
Σ_{t=1}^T FC_{R_t}(y^t) ≤ min_{y∈∆_n^k} Σ_{t=1}^T FC_{R_t}(y) + O(kDr√(nT)).

In this section we show how one can use Algorithm 2, described in Section 3, to derive Θ(k)-regret for Dynamic k-Clustering in polynomial time. The basic idea is to use a rounding scheme that, given a vector y ∈ ∆_n^k, produces a placement of the centers F_y ⊆ V (with |F_y| ≤ k) such that for any set of clients' positions R, the connection cost C_R(F_y) is approximately bounded by the fractional connection cost FC_R(y). This rounding scheme is described in Algorithm 3 and its guarantees are summarized in Lemma 4.

Lemma 4. Given a vector y ∈ ∆_n^k, Algorithm 3 produces a set F_y ⊆ V such that:
• For any set of clients R, C_R(F_y) ≤ 6k · FC_R(y).
• The cardinality of F_y is at most k, |F_y| ≤ k.

Up next, we show how the deterministic rounding scheme described in Algorithm 3 can be combined with Algorithm 2 to produce a Θ(k)-regret deterministic online learning algorithm that runs in polynomial time. The overall online learning algorithm is described in Algorithm 4 and its regret bound is formally stated and proven in Theorem 2.

Algorithm 4 A Θ(k)-regret deterministic online learning algorithm
1: for rounds t = 1 ... T do
2:   The learner computes the vector y^t ∈ ∆_n^k by running Algorithm 2 for the sequence of clients' positions (R_1, ..., R_{t−1}).
3:   The learner places centers at the positions F_{y^t} produced by Algorithm 3 given input y^t.
4:   The adversary selects the clients' positions R_t ⊆ V.
5:   The learner suffers connection cost C_{R_t}(F_{y^t}).
6: end for

We conclude the section with the proof of Theorem 2, in which the regret bounds of Algorithm 4 are established.

Proof of Theorem 2. The second case of Lemma 4 ensures that |F_t| ≤ k, and thus Algorithm 4 opens at most k facilities at each round. Applying the first case of Lemma 4 for R = R_t, we get that C_{R_t}(F_t) ≤ 6k · FC_{R_t}(y^t). As a result,
Σ_{t=1}^T C_{R_t}(F_t) ≤ 6k · Σ_{t=1}^T FC_{R_t}(y^t) ≤ 6k · min_{y∈∆_n^k} Σ_{t=1}^T FC_{R_t}(y) + O(k²Dr√(nT)),
where the last inequality follows by Theorem 6. Lemma 1 then ensures that min_{y∈∆_n^k} Σ_{t=1}^T FC_{R_t}(y) ≤ Σ_{t=1}^T C_{R_t}(F*), which concludes the proof.

In this section we present a Θ(r)-regret randomized online learning algorithm. This algorithm is described in Algorithm 5 and is based on the randomized rounding developed by Charikar and Li for the k-median problem [11].

Lemma 5 ([11]). There exists a polynomial-time randomized rounding scheme that, given a vector y ∈ ∆_n^k, produces a probability distribution, denoted as CL(y), over the subsets of V such that:
1. with probability 1, exactly k facilities are opened, P_{F∼CL(y)}[|F| = k] = 1;
2. for any position j ∈ V, E_{F∼CL(y)}[d(j, F)] ≤ 4 · FC_{{j}}(y), i.e., the expected distance of j to the opened facilities is at most 4 times the fractional connection cost of a single client located at j.

Similarly to the previous section, combining the randomized rounding of Charikar-Li with Algorithm 2 produces a Θ(r)-regret randomized online learning algorithm that runs in polynomial time.

Algorithm 5 A Θ(r)-regret randomized online learning algorithm
1: for rounds t = 1 ... T do
2:   The learner computes the vector y^t ∈ ∆_n^k by running Algorithm 2 for the sequence of clients' positions (R_1, ..., R_{t−1}).
3:   The learner places centers at the positions F_t ⊆ V produced by the Charikar-Li randomized rounding with input y^t, F_t ∼ CL(y^t).
4:   The adversary selects the clients' positions R_t ⊆ V.
5:   The learner suffers connection cost C_{R_t}(F_t).
6: end for

The proof of Theorem 4, which establishes the regret bound of Algorithm 5, follows by Lemma 5 and Theorem 6 and is deferred to Appendix E.
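Before moving to the experiments, the following sketch ties the pieces together for p = 1, reusing project_simplex and fractional_assign_and_subgradient from the earlier sketches. The top-k rounding used here is only a naive placeholder of ours for Algorithm 3 or the Charikar-Li scheme, so none of the regret guarantees of Theorems 2 and 4 are claimed for it.

```python
import numpy as np

def online_k_clustering(dist, k, client_seq, eta=0.05):
    """End-to-end sketch (p = 1): Algorithm 2's projected subgradient step on
    Delta_n^k, with a naive "k largest coordinates" rounding standing in for
    the rounding schemes of Sections 4 and 5."""
    n = dist.shape[0]
    y = np.full(n, k / n)                    # uniform initial fractional opening
    total = 0.0
    for R_t in client_seq:                   # R_t: list of client indices
        F_t = np.argsort(-y)[:k]             # placeholder rounding: k largest y_i
        d_near = dist[np.ix_(F_t, R_t)].min(axis=0)
        total += d_near.sum()                # integral connection cost, p = 1
        _, g = fractional_assign_and_subgradient(dist, y, R_t)
        y = project_simplex(y - eta * g, k=float(k))   # OGD update of Algorithm 2
    return total
```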
In this section we evaluate the performance of our online learning algorithms against adversaries that select the positions of the clients according to time-evolving probability distributions. We remark that the regret bounds established in Theorem 2 and Theorem 4 hold even if the adversary maliciously selects the positions of the clients at each round so as to maximize the connection cost. As a result, in case clients arrive according to some (unknown and possibly time-varying) probability distribution that does not depend on the algorithm's actions, we expect the realized regret to be much smaller.

We empirically evaluate the regret of Algorithm 4 for Dynamic k-Clustering in case p = ∞. We assume that at each round t, 20 clients arrive according to several static or time-varying two-dimensional probability distributions with support on the [−1, 1] × [−1, 1] square, the possible positions for the centers being the discretized grid with step 0.1. In order to monitor the quality of the solutions produced by Algorithm 4, we compare the time-average connection cost of Algorithm 4 with the time-average fractional connection cost of Algorithm 2. Theorem 6 ensures that for T = Θ(k²D²/ε²) (treating r and n as constants), the time-average fractional connection cost of Algorithm 2 is at most ε greater than the time-average connection cost of the optimal static solution for Dynamic k-Clustering. In the following simulations we select ε = 0.1 and track the ratio between the time-average cost of Algorithm 4 and that of Algorithm 2, which acts as an upper bound on the regret.

Uniform Square. In this case the 20 clients arrive uniformly at random in the [−1, 1] × [−1, 1] square. Figure 2 illustrates the solutions to which Algorithm 4 converges for k = 2, 3 and 8, as well as the achieved regret.

Uniform Distribution with Time-Evolving Center. In this case the 20 clients arrive according to the uniform distribution with radius 0.3 and a time-varying center that periodically follows the trajectory described in Example 1; a generator for this process is sketched below. Figure 1 depicts the centers to which Algorithm 4 converges after 100k² rounds, which are clearly close to the optimal ones.
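For reproducibility, here is a sketch of the client generator used in this experiment (our reading of Example 1: "uniform with radius 0.3" is interpreted as uniform over a disk of that radius; the seeding is an arbitrary choice of ours).

```python
import numpy as np

def clients_at_round(t, T, n_clients=20, radius=0.3, rng=None):
    """Clients of Example 1 / Section 6: points drawn uniformly from a disk
    whose center traverses the circle (sin(2*pi*t/T), cos(2*pi*t/T))."""
    rng = rng if rng is not None else np.random.default_rng(t)
    center = np.array([np.sin(2 * np.pi * t / T), np.cos(2 * np.pi * t / T)])
    angles = rng.uniform(0.0, 2 * np.pi, n_clients)
    radii = radius * np.sqrt(rng.uniform(0.0, 1.0, n_clients))  # uniform over the disk
    return center + np.c_[radii * np.cos(angles), radii * np.sin(angles)]
```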
This work studies polynomial-time low-regret online learning algorithms for Dynamic k-Clustering, an online learning problem capturing clustering settings with time-evolving clients for which no information on their locations over time is available. We show that, under some well-established conjectures, O(1)-regret cannot be achieved in polynomial time, and we provide a Θ(min(k, r))-regret polynomial-time algorithm, with r being the maximum number of clients appearing in a single round. At a technical level, we present a two-step approach where, in the first step, we provide a no-regret algorithm for Fractional Dynamic k-Clustering while, in the second step, we provide online rounding schemes converting the sequence of fractional solutions produced by the no-regret algorithm into solutions of Dynamic k-Clustering. Applying the same approach to other combinatorial online learning problems is an interesting research direction.

Proof of Theorem 3. Let the metric space be composed of k + 1 points, with the distance between any pair of distinct points being 1. At each round t, there exists a position at which the learner has not placed a facility (there are k + 1 positions and k facilities). If the adversary places one client at the empty position of the metric space, then the deterministic online learning algorithm admits overall connection cost equal to T. However, the optimal static solution, which leaves empty the position with the fewest requests, pays at most T/(k + 1).

Proof of Lemma 2. Consider the Lagrangian L(β, y, x, A, k, λ) of the convex program of Definition 2. Rearranging the terms, and requiring that the function g(A, k, λ) = min_{β,y,x} L(β, y, x, A, k, λ) attains a finite value, yields the constraints that the dual variables must satisfy. Using the fact that the Lagrangian multipliers µ_ij ≥ 0, we get the constraints of the convex program of Lemma 2. The objective comes from the fact that, once g(A, k, λ) admits a finite value, g(A, k, λ) = Σ_{j∈R} A_j − Σ_{i∈V} Σ_{j∈R} k_ij · y_i.

Proof of Lemma 3. Let λ*_j, A*_j, k*_ij denote the values of the respective variables in the optimal solution of the convex program of Lemma 2 formulated with respect to the vector y = (y_1, ..., y_n). Respectively, let λ'_j, A'_j, k'_ij denote the values of the respective variables in the optimal solution of the convex program of Lemma 2 formulated with respect to the vector y' = (y'_1, ..., y'_n). Then
FC_R(y') = Σ_{j∈R} A'_j − Σ_{i∈V} Σ_{j∈R} k'_ij · y'_i ≥ Σ_{j∈R} A*_j − Σ_{i∈V} Σ_{j∈R} k*_ij · y'_i = FC_R(y) − Σ_{i∈V} Σ_{j∈R} k*_ij · (y'_i − y_i).
The two equalities follow by strong duality; more precisely, FC_R(y) = Σ_{j∈R} A*_j − Σ_{i∈V} Σ_{j∈R} k*_ij · y_i, since the convex program of Lemma 2 is the dual of the convex program whose solution defines FC_R(y) (and respectively for FC_R(y')). The inequality is implied by the fact that the solution (λ', k', A') is optimal when the objective function is Σ_{j∈R} A_j − Σ_{i∈V} Σ_{j∈R} k_ij · y'_i. Notice that the constraints of the convex program in Lemma 2 do not depend on the y-values. As a result, the solution (λ*, k*, A*) (that is optimal for the dual convex program formulated for y) is feasible for the dual program formulated for the values y'. Thus the inequality follows by the optimality of (λ', k', A').

Up next, we prove the correctness of Algorithm 1. Notice that the solution (β, x) that Algorithm 1 constructs is feasible for the primal convex program of Definition 2. We prove that the dual solution that Algorithm 1 constructs is feasible for the dual program of Lemma 2, while attaining the exact same objective value.
• ‖λ‖*_p = 1: this directly follows by the fact that λ_j = β_j^{p−1}/‖β‖_p^{p−1}.
• d_ij · λ_j + k_ij ≥ A_j: in case d_ij < D*_j, Algorithm 1 implies that x_ij = y_i and the inequality directly follows. In case d_ij ≥ D*_j, the inequality holds trivially since k_ij = 0.
Now consider the objective function:
Σ_{j∈R} A_j − Σ_{i∈V} Σ_{j∈R} k_ij · y_i = Σ_{j∈R} λ_j · β_j = (Σ_{j∈R} β_j^p)^{1/p},
where the first equality follows by the fact that x_ij = 0 for all i ∉ V_j^+ (the set of nodes used to serve client j), and thus Σ_{i∈V_j^+} x_ij = 1. Finally, notice that |λ_j| ≤ 1 and thus k_ij ≤ D, where D is the diameter of the metric space. A numerical sanity check of this duality argument is given below.

Proof of Theorem 6. By Lemma 3, |g^t_i| = |−Σ_{j∈R_t} k^{t*}_ij| ≤ Dr, since |R_t| ≤ r and |k^{t*}_ij| ≤ D. Applying Theorem 1.5 of [23] (the standard regret bound for online projected gradient descent), we get the bound of Theorem 6.
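Under the p = 1 specialization sketched earlier, the duality argument above can be sanity-checked numerically: the subgradient inequality of Lemma 3 must hold for any pair of points of ∆_n^k. This is a test harness of ours, not from the paper.

```python
import numpy as np

def check_subgradient(n=30, k=3, trials=100, seed=1):
    """Numeric check (p = 1): the vector g returned by the greedy routine
    satisfies Lemma 3's inequality FC(y') >= FC(y) + g . (y' - y)."""
    rng = np.random.default_rng(seed)
    pts = rng.uniform(-1, 1, (n, 2))
    dist = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)
    clients = list(rng.choice(n, 5, replace=False))
    for _ in range(trials):
        y = rng.dirichlet(np.ones(n)) * k            # random point of Delta_n^k
        yp = rng.dirichlet(np.ones(n)) * k
        c, g = fractional_assign_and_subgradient(dist, y, clients)
        cp, _ = fractional_assign_and_subgradient(dist, yp, clients)
        assert cp >= c + g @ (yp - y) - 1e-9          # Lemma 3's inequality
    print("subgradient inequality verified on", trials, "random pairs")
```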
Proof of Lemma 4. The following claim trivially follows by Step 10 of Algorithm 3.

Claim 1. For any node j ∈ V, d(j, F_y) ≤ 6k · β*_j.

We are now ready to prove the first item of Lemma 4. By Claim 1, d(j, F_y) ≤ 6k · β*_j for every client j ∈ R, and thus, by the monotonicity of the p-norm, C_R(F_y) = ‖(d(j, F_y))_{j∈R}‖_p ≤ 6k · ‖β*‖_p = 6k · FC_R(y).

We proceed with the second item of Lemma 4. For a given node j ∈ F_y, let B_j = {i ∈ V : d_ij ≤ 3k · β*_j}. It is not hard to see that for any j ∈ F_y,
Σ_{i∈B_j} y_i ≥ 1 − 1/(3k).
Observe that in case the latter were not true, then Σ_{i∉B_j} x*_ij ≥ 1/(3k), which would imply that β*_j > β*_j, a contradiction. The second important step of the proof is that for any distinct j, j' ∈ F_y, the sets B_j and B_{j'} are disjoint. Observe that the existence of some m ∈ B_j ∩ B_{j'} would imply d(j, m) ≤ 3k · β*_j and d(j', m) ≤ 3k · β*_{j'}. By the triangle inequality, we would get d(j, j') ≤ 6k · β*_{j'} (without loss of generality, β*_j ≤ β*_{j'}). The latter contradicts the fact that both j and j' belong to the set F_y. Now assume that |F_y| ≥ k + 1. Then, by the disjointness of the sets B_j,
Σ_{i∈V} y_i ≥ Σ_{j∈F_y} Σ_{i∈B_j} y_i ≥ |F_y| · (1 − 1/(3k)) ≥ (k + 1) · (1 − 1/(3k)) > k.
But the latter contradicts the fact that Σ_{i∈V} y_i = k. As a result, |F_y| ≤ k.

Proof of Theorem 4. To simplify notation, the quantity E_{F∼CL(y^t)}[C_{R_t}(F)] is denoted as E[C_{R_t}(F_t)]. At first, notice that by the first case of Lemma 5, Algorithm 5 ensures that exactly k facilities are opened at each round t. Concerning its overall expected connection cost, we get
E[C_{R_t}(F_t)] ≤ Σ_{j∈R_t} E[d(j, F_t)] ≤ 4 · Σ_{j∈R_t} FC_{{j}}(y^t) ≤ 4r · FC_{R_t}(y^t),
where the first inequality is due to the fact that ‖(d(j, F))_{j∈R_t}‖_p ≤ Σ_{j∈R_t} d(j, F), the second is derived by applying the second case of Lemma 5, and the last holds since FC_{{j}}(y^t) ≤ FC_{R_t}(y^t) for every j ∈ R_t. Summing over all rounds and applying Theorem 6 and Lemma 1, we overall get
Σ_{t=1}^T E[C_{R_t}(F_t)] ≤ 4r · Σ_{t=1}^T C_{R_t}(F*) + O(kDr²√(nT)).
References
[1] Taming the monster: A fast and simple algorithm for contextual bandits.
[2] Improved bounds for online learning over the permutahedron and other ranking polytopes.
[3] A Bicriteria Approximation Algorithm for the k-Center and k-Median Problems.
[4] Dynamic facility location via exponential clocks.
[5] Pseudorandom generators with long stretch and low locality from random local one-way functions.
[6] Local search heuristic for k-median and facility location problems.
[7] Online linear optimization and adaptive routing.
[8] Approximation algorithms and online mechanisms for item pricing.
[9] Improved combinatorial algorithms for the facility location and k-median problems.
[10] A constant-factor approximation algorithm for the k-median problem (extended abstract).
[11] A dependent LP-rounding approach for the k-median problem.
[12] The densest k-subhypergraph problem.
[13] Minimizing the union: Tight approximations for small set bipartite vertex expansion.
[14] Facility reallocation on the line.
[15] Price of competition and dueling games.
[16] Oracle-efficient online learning and auction design.
[17] Efficient optimal learning for contextual bandits.
[18] Facility location in evolving metrics.
[19] Reallocating multiple facilities on the line.
[20] Efficient online learning of optimal rankings: Dimensionality reduction via gradient descent.
[21] Combinatorial online prediction via metarounding.
[22] Efficient online linear optimization with approximation algorithms.
[23] Introduction to Online Convex Optimization. Foundations and Trends in Optimization.
[24] Online improper learning with an approximation oracle.
[25] Online submodular minimization.
[26] Near-optimal algorithms for online matrix prediction.
[27] The computational power of optimization in online learning.
[28] Predicting nearly as well as the best pruning of a decision tree.
[29] Learning permutations with exponential weights.
[30] Ankur Moitra, Andrew Postlewaite, and Moshe Tennenholtz. Dueling algorithms.
[31] Approximation algorithms for metric facility location and k-median problems using the primal-dual schema and Lagrangian relaxation.
[32] Online submodular minimization for combinatorial structures.
[33] Playing games with approximation algorithms.
[34] Efficient algorithms for online decision problems.
[35] Hedging structured concepts.
[36] Constant factor approximation algorithm for the knapsack median problem.
[37] Linear-time approximation schemes for clustering problems in any dimensions.
[38] Approximating k-median via pseudo-approximation.
[39] Approximation algorithms for geometric median problems.
[40] The structure and function of complex networks.
[41] Epidemic spreading in scale-free networks.
[42] Online dynamic programming.
[43] Wouter Van den Broeck, Corinne Régis, Bruno Lina, and Philippe Vanhems. High-resolution measurements of face-to-face contact patterns in a primary school.
[44] An online algorithm for maximizing submodular functions.
[45] Online prediction under submodular constraints.
[46] Predicting nearly as well as the best pruning of a planar decision graph.
[47] Path kernels and multiplicative updates.
[48] A framework for community identification in dynamic social networks.
[49] The Design of Approximation Algorithms.
[50] Online linear optimization over permutations.
[51] K-medians, facility location, and the Chernoff-Wald bound.

A Proof of Theorem 1

Problem 2 (k-CenterUniform). Given a uniform metric space d : V × V → R≥0 (d(u, v) = 1 in case u ≠ v) and a set of requests R_1, ..., R_m ⊆ V, select F ⊆ V such that |F| = k and Σ_{s=1}^m C_{R_s}(F) is minimized, where p = ∞.

Lemma 6. Any c-approximation algorithm for k-CenterUniform implies a c-approximation algorithm for Min-p-Union.

Proof. Given the collection U = {S_1, ..., S_m} of the Min-p-Union instance, we construct a uniform metric space V of size m, where each node of V corresponds to a set S_i. For each element e ∈ E of the Min-p-Union instance, we construct a request R_e ⊆ V for k-CenterUniform that is composed of the nodes corresponding to the sets S_i that contain e. Observe that, due to the uniform metric and the fact that p = ∞, for any F ⊆ V we have C_{R_e}(F) = 0 if R_e ⊆ F and C_{R_e}(F) = 1 otherwise. Hence Σ_{e∈E} C_{R_e}(F) equals the cardinality of the union of the sets not selected in F, and the two problems correspond exactly; a sketch of this construction is given below.

Moreover, any polynomial-time c-regret algorithm for the online k-Center problem implies a (c + 1)-approximation algorithm (offline) for k-CenterUniform.

Proof. Assume that there exists a polynomial-time online learning algorithm such that, for any request sequence R_1, ..., R_T,
Σ_{t=1}^T E[C_{R_t}(F_t)] ≤ c · min_{F⊆V, |F|=k} Σ_{t=1}^T C_{R_t}(F) + O(T^α)
for some α < 1. Now let the requests lie on the uniform metric, let p = ∞, and let the adversary at each round t select uniformly at random one of the requests R_1, ..., R_m that are given by the instance of k-CenterUniform. In this case, the above equation takes the following form,
(1/T) · Σ_{t=1}^T E[C_{R_t}(F_t)] ≤ (c/m) · Σ_{s=1}^m C_{R_s}(OPT*) + O(T^{α−1}),
where OPT* is the optimal solution for the instance of k-CenterUniform and F_t is the random set that the online algorithm selects at round t. Now consider the following randomized algorithm for k-CenterUniform:
1. Select uniformly at random a t from {1, ..., T}.
2. Select a set F ⊆ V according to the probability distribution of F_t.
The expected cost of the above algorithm, denoted by E[ALG], is
E[ALG] = (m/T) · Σ_{t=1}^T E[C_{R_t}(F_t)] ≤ c · Σ_{s=1}^m C_{R_s}(OPT*) + m · O(T^{α−1}).
Since α < 1, selecting T sufficiently large renders the additive term at most Σ_{s=1}^m C_{R_s}(OPT*), which yields a (c + 1)-approximation.
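A small sketch of the construction in the proof of Lemma 6 (our reading of the truncated text: choosing k = m − p centers leaves p sets unselected, and the total k-CenterUniform cost counts the elements in their union).

```python
def min_p_union_to_k_center(sets, universe, p):
    """Lemma 6's reduction: one metric point per set S_i (uniform metric),
    one request R_e per element e consisting of the sets containing e."""
    m = len(sets)
    requests = {e: [i for i, S in enumerate(sets) if e in S] for e in universe}
    k = m - p            # opening m - p centers leaves p sets unselected
    return k, requests

def uniform_cost(F, requests):
    # C_{R_e}(F) = 0 iff every set containing e is opened, else 1; summing
    # over elements gives the cardinality of the union of the unselected sets.
    F = set(F)
    return sum(0 if set(R).issubset(F) else 1 for R in requests.values())
```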