authors: Ren, Wenbo; Liu, Jia; Shroff, Ness B. title: The Sample Complexity of Best-$k$ Items Selection from Pairwise Comparisons date: 2020-07-06

This paper studies the sample complexity (aka number of comparisons) bounds for the active best-$k$ items selection from pairwise comparisons. From a given set of items, the learner can make pairwise comparisons on every pair of items, and each comparison returns an independent noisy result about the preferred item. At any time, the learner can adaptively choose a pair of items to compare according to past observations (i.e., active learning). The learner's goal is to find the (approximately) best-$k$ items with a given confidence, while trying to use as few comparisons as possible. In this paper, we study two problems: (i) finding the probably approximately correct (PAC) best-$k$ items and (ii) finding the exact best-$k$ items, both under strong stochastic transitivity and stochastic triangle inequality. For PAC best-$k$ items selection, we first show a lower bound and then propose an algorithm whose sample complexity upper bound matches the lower bound up to a constant factor. For the exact best-$k$ items selection, we first prove a worst-instance lower bound. We then propose two algorithms based on our PAC best items selection algorithms: one works for $k=1$ and is sample complexity optimal up to a loglog factor, and the other works for all values of $k$ and is sample complexity optimal up to a log factor.

Ranking from pairwise comparisons (or pairwise ranking) is a fundamental problem that has been widely applied to various areas, such as recommender systems, search, crowd-sourcing, and social choice. In a pairwise ranking system, the learner wants to learn the full or partial ranking (e.g., the best-k items) of a set of items from noisy pairwise comparisons, where items can refer to various things such as products, posts, choices, and pages; and comparisons refer to processes or queries that indicate qualities or users' preferences over the items. In this paper, for simplicity, we use the terms "items", "comparisons", and "users' preferences". A noisy pairwise comparison is a query over two items that returns a noisy result about the preferred one. Here, "noisy" simply means that the comparison could return the less preferred one, which may be the result of the uncertain nature of physics, machines, or humans. Since the comparisons can reveal some information about the users' preferences, by repeatedly comparing these items, the learner may find a reasonable global ranking (e.g., (Hunter, 2004)) or local ranking (e.g., (Park et al., 2015)) of these items. Based on when the comparisons are generated, ranking problems can be divided into two classes: passive ranking (e.g., (Park et al., 2015; Shah et al., 2017)) and active ranking (e.g., (Pfeiffer et al., 2012; Chen et al., 2013; Falahatgar et al., 2017a; Ren et al., 2019)). In passive ranking, the learner first collects all the comparison data and then computes a ranking from it. In active ranking, the learner does not have all the comparison data at the beginning, and can adaptively choose items to compare during the learning process. This paper studies fully active ranking (or active learning), where for each comparison, the learner can adaptively choose two items to compare according to past observations. Chen et al.
(2013) showed that in a crowdsourcing dataset, their active ranking algorithm uses only 3% comparisons and achieves almost the same performance as passive ranking. This paper focuses on the best-k-items-selection problem. For many applications, ranking all the items may be neither efficient nor necessary. For instance, in a video sharing website, filters may generate hundreds of candidate videos, but the website may only want to present 30 videos to the user. Thus, it is not necessary to rank all these videos, and a more efficient way can be to first select the best 30 videos and then rank them. The best-k items selection can be of interest to many different applications. In previous works (e.g., (Yue and Joachims, 2011; Busa-Fekete et al., 2014; Szörényi et al., 2015; Falahatgar et al., 2017b; 2018; Saha and Gopalan, 2019a; b) ), the problem of best item selection has been studied in different settings. However, the problem of best-k items selection has been less investigated. We note that best-k items selection is not a naive extension to the best item selection. For instance, in the deterministic case, finding the max number is easy by sequentially doing n − 1 comparisons and eliminating the smaller ones, while finding the largest k numbers in O(n log k) time needs more complex algorithms (e.g., quick select (Hoare, 1961) ). The same is true in non-deterministic settings. We do not find a method to extend best item selection algorithms to an efficient best-k one. This paper studies both the exact and probably approximately correct (PAC) best-k items selection. Exact selection simply means finding the exact best-k items. PAC selection is to find k items that are approximately best or good enough (see Section 1.2 for details), which can avoid the cases where the preferences over two items are extremely close, making exactly ranking them too costly. In summary, this paper studies the problem of using fully active ranking (active learning) to find the exact or PAC best-k items from noisy pairwise comparisons with a certain confidence and use as few comparisons as possible. Assume that there are n items, indexed by 1, 2, 3, ..., n, and we use [n] = {1, 2, 3, ..., n} 1 to denote the set of these items. For these items, we make the following assumptions: A1) Time-invariance. For any items i and j in [n], we assume that the distributions of the comparison outcomes over items i and j are time-invariant, i.e., there is a number p i,j in [0, 1] independent of time such that for any comparison over items i and j, item i wins the comparison with probability p i,j , where "item i wins the comparison" means that the comparison returns item i as the preferred one. A2) Tie Breaking. We assume that for every comparison, exactly one item wins. If a tie does happen, we randomly assign one item as the winner. Thus, for any items i and j in [n], p i,j + p j,i = 1. A3) Independence. We assume that the comparison results are independent across time, items, and sets. We note that assumptions A1) to A3) are common in the literature (e.g., (Szörényi et al., 2015; Shah and Wainwright, 2017; Falahatgar et al., 2017a; b; 2018; Heckel et al., 2018; Katariya et al., 2018; Heckel et al., 2019; Saha and Gopalan, 2019a; b; Ren et al., 2019) ). In this paper, we make two more assumptions to restrict our problems to specific conditions. 1 For any positive integer m, we define [m] := {1, 2, 3, ..., m}. Before making these two assumptions, we introduce some notations. 
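As a concrete illustration of assumptions A1)-A3), the following minimal sketch (Python; the class and variable names are ours, purely for illustration) simulates a time-invariant, independent comparison oracle driven by a fixed preference matrix:

import random

class ComparisonOracle:
    """Noisy pairwise-comparison oracle satisfying A1)-A3).

    P is an n-by-n matrix where P[i][j] is the probability that item i wins
    a comparison against item j, with P[i][j] + P[j][i] = 1 and P[i][i] = 1/2.
    """
    def __init__(self, P, seed=None):
        self.P = P
        self.rng = random.Random(seed)
        self.num_comparisons = 0  # total number of queries issued so far

    def compare(self, i, j):
        """Return the winner of one independent comparison between items i and j."""
        self.num_comparisons += 1
        return i if self.rng.random() < self.P[i][j] else j

# Example: three items ranked 0 > 1 > 2 with p_{i,j} = 0.6 whenever i is preferred
# to j; this instance satisfies both SST and STI.
P = [[0.5, 0.6, 0.6],
     [0.4, 0.5, 0.6],
     [0.4, 0.4, 0.5]]
oracle = ComparisonOracle(P, seed=1)
wins = sum(oracle.compare(0, 1) == 0 for _ in range(1000))
print(wins)  # roughly 600

All algorithms discussed below interact with the items only through such compare queries. With this picture in mind, we now introduce the remaining notation.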
For two items i and j in [n], we define ∆_{i,j} := |p_{i,j} − 1/2| as the gap between p_{i,j} and 1/2, which measures how difficult it is to order items i and j by comparing them. Also, we define p_{i,i} := 1/2 for all items i. For real numbers a and b, we define a ∨ b := max{a, b} and a ∧ b := min{a, b}.

A4) Strong stochastic transitivity (SST) (Shah et al., 2017; Falahatgar et al., 2018). In this paper, the items are said to satisfy SST if and only if (i) there is a strict order ≻ over these n items, (ii) if i ≻ j, then p_{i,j} > 1/2, and (iii) for any three items i, j, and l with i ≻ j ≻ l, p_{i,l} ≥ p_{i,j} ∨ p_{j,l}.

A5) Stochastic triangle inequality (STI) (Falahatgar et al., 2018). The items are said to satisfy STI if for any three items i, j, and l, ∆_{i,l} ≤ ∆_{i,j} + ∆_{j,l}.

We note that many widely used parametric models such as the Bradley-Terry-Luce (BTL) model (Bradley and Terry, 1952; Luce, 2012) and Thurstone's model (Thurstone, 1927) satisfy SST and STI, and thus, the algorithms in this paper can be directly used under these models. In this paper, we do not restrict our results to specific parametric models. Without loss of generality, we use r_1 ≻ r_2 ≻ ··· ≻ r_n to denote the unknown true ranking.

The first problem is the PAC best-k items selection. We follow the definition of the PAC best item of Falahatgar et al. (2017a; b; 2018) to define the PAC best-k items. We note that when k = 1, our definition of PAC best items is the same as that of Falahatgar et al. (2017a; b; 2018).

Definition 1 ((ε, k)-optimal subsets). For a set S, given k ≤ |S| and ε ∈ [0, 1], a set U ⊂ S is said to be an (ε, k)-optimal subset of S if |U| = k and p_{i,j} ≥ 1/2 − ε for any item i in U and any item j not in U.

If ε < min_{i∈[n]: r_k ≻ i} ∆_{i,r_k}, an (ε, k)-optimal subset of S is exactly the set of the best-k items of S. However, if we do not have a priori knowledge about the gaps, we cannot use the PAC algorithms to find the exact best items. The number ε is called the error tolerance. We note that in an (ε, k)-optimal subset, every item i has p_{i,r_k} ≥ 1/2 − ε.

Problem 1 (PAC best-k items selection (PAC k-selection)). Given n items [n], k ≤ n/2, and δ, ε ∈ (0, 1/2), we want to find an (ε, k)-optimal subset of [n] with probability at least 1 − δ, and use as few comparisons as possible.

The second problem is the exact best-k items selection. Under SST, since there is a strict order over these n items, the best-k items are unique. The best-k items are r_1, r_2, ..., r_k, and finding the best-k items means finding the set {r_1, r_2, ..., r_k}. We do not need to order these best-k items; we only need to find a k-sized set that contains all of them.

Problem 2 (Exact best-k items selection (exact k-selection)). Given n items, k ≤ n/2, and δ ∈ (0, 1/2), we want to find the best-k items with probability at least 1 − δ, and use as few comparisons as possible.

We define the gap of item i as ∆_i := ∆_{i,r_{k+1}} if item i is one of the best-k items and ∆_i := ∆_{i,r_k} otherwise, and our sample complexity (aka number of comparisons) bounds for exact k-selection depend on these gaps.

For the PAC k-selection problem, we first prove an Ω(nε^{-2} log(k/δ)) lower bound on the expected number of comparisons, and then propose an algorithm with sample complexity O(nε^{-2} log(k/δ)), which implies that our upper bound matches the lower bound up to a constant factor. For the exact k-selection problem, we first prove a worst-instance sample complexity lower bound of Ω(Σ_{i∈[n]} ∆_i^{-2} log δ^{-1}) + Ω̃(∆_{r_k}^{-2} log log ∆_{r_k}^{-1}) (Theorem 7). We then propose an algorithm for k = 1 based on our PAC k-selection algorithm, which is optimal up to a loglog factor.
Finally, we propose another algorithm for general values of k with sample complexity O( i∈[n] ∆ −2 i (log(n/δ) + log log ∆ −1 i )), which is optimal up to a log factor. An early work that has studied the exact k-selection was done by Feige et al. (1994) . Feige et al. (1994) have shown that if ∆ i,j ≥ ∆ > 0 for all items i and j where ∆ > 0 is a priori known, then to find the best-k items of [n] with probability at least 1 − δ, Θ(∆ −2 log(k/δ)) comparisons are sufficient and necessary for worst instances. However, the work of Feige et al. (1994) requires a priori knowledge of a lower bound of the values of ∆ i,j 's to run, which may not be possible in practice. This paper does not assume this knowledge. Further, the sample complexity in Feige et al. (1994) depends on the minimal gaps, i.e., min i =j ∆ i,j , while the sample complexity in this paper depends on ∆ ri,r k or ∆ ri,r k+1 , which exploits unequal gaps better. (Plackett, 1975; Luce, 2012 ) (PL) model 4 , which is a parametric model that satisfies SST and STI. They proposed algorithms with adaptivity 5 one, which can find the best-k items of [n] with high probability 6 by O(n∆ −2 r k ,r k+1 log n) comparisons. In contrast, this paper focuses on fully active algorithms (i.e., the number of adaptivity is unlimited) and the algorithms are not restricted to parametric models. Another work that has focused on the exact k-selection problem under the MNL model is Chen et al. (2018) . Chen et al. (2018) proposed an exact k-selection algorithm from pairwise comparisons with sample complexity O(n log 14 (n)). They also studied ranking from multi-wise comparisons, which is beyond the scope of this paper. Busa-Fekete et al. (2014) studied the best item selection problem under Mallows model, and proposed an algorithm with samples complexity O(n log(n/δ)). Saha and Gopalan (2019b) studied the exact best item selection problem under the PL model with subset-wise feedbacks, and proposed an algorithm with O( i∈[n] [∆ −2 i (log δ −1 +log log ∆ −1 i )]) sample complexity for confidence 1 − δ, which is of the same order as the algorithm in this paper. Compared to the work of Saha and Gopalan (2019b), our algorithms work for all instances satisfying SST and STI, while the PL model is a special case in our setting. Another focus of this paper is the PAC k-selection problem. To the best of our knowledge, we are the first to propose PAC k-selection algorithms. Prior to this paper, there are works that focused on the PAC best item selection problem. Falahatgar et al. (2017a; b) proved that under SST, to find an item i from [n] with p i,r1 ≥ 1/2 − with probability at least 1 − δ, Θ(n −2 log δ −1 ) comparisons are sufficient and necessary. Earlier to this, Yue and Joachims (2011) proved the same result for cases under the SST and the STI. The works of Saha and Gopalan (2019a) also proved the same sample complexity bounds under the PL model. When k = 1, our upper bound and lower bound for the PAC k-selection problem is the same as that of Falahatgar et al. There are also many works that studied the ranking problems under other models, which are beyond the scope of this paper. Shah and Wainwright (2017) can be viewed as a superset of SST and STI in some sense. However, we note that, for instances satisfying SST and STI, BS ranking algorithms may not be as efficient as their performance on BS problems 7 . Agarwal et al. (2017) ; Braverman 5 See Agarwal et al. (2017); Braverman et al. (2019) for details about learning with limited adaptivity. 
6 In this paper, "with high probability" means that with probability at least 1 − n −p , where p > 0 is a sufficiently large constant. 7 The BS of an item i is 1 n−1 j =i pi,j. When pi,j = 2/3 for all i j, the gap of the BSs between the best two items is Θ(n −1 ), and thus, the sample complexity to order them by BS algorithms (e.g., Active Ranking (Heckel et al., 2019) is Ω(n 2 ). et al. (2019) studied the problem of ranking (or finding) the best-k items with limited adaptivity. Feige et al. (1994); Szörényi et al. (2015); Falahatgar et al. (2017a; b; 2018) ; Ren et al. (2019) studied the (PAC) full ranking problems in various settings, which is less related to this paper. This section studies the sample complexity lower bound and upper bound for PAC k-selection. We first prove that for the worst instances, to find an ( , k)-optimal subset of [n] needs Ω(n −2 log(k/δ)) number of comparisons in expectation. Then, we design an algorithm that solves all instances with at most O(n −2 log(k/δ)) number of comparisons in expectation, which shows that both our lower bound and upper bounds are tight (up to a constant factor). We first analyze the lower bound for PAC k-selection, which is stated in Theorem 2. We prove this bound by reducing the pure exploration multi-armed bandit (PEMAB) problem (e.g., (Mannor and Tsitsiklis, 2004; Kalyanakrishnan et al., 2012) ) to the PAC k-selection problem under the MNL model and using the lower bounds for the PEMAB problem of Mannor and Tsitsiklis (2004); Kalyanakrishnan et al. (2012) to get the desired lower bound for PAC k-selection. We note that Ren et al. (2018) used a similar method and proved a similar lower bound. However, its definition of PAC k-selection is different from that in this paper. Thus, we need to independently find a lower bound in this paper. 8 Later in subsection 3.2, we show that the lower bound stated in Theorem 2 is tight up to a constant factor. Theorem 2 (Lower bound for PAC k-selection). Given ∈ (0, 1/128), δ ∈ (0, e −4 /4), n ≥ 2, and 1 ≤ k ≤ n/2, there is an n-sized instance satisfying SST and STI such that to find an ( , k)-optimal subset of [n] with probability 1 − δ, any algorithm needs to conduct Ω(n −2 log(k/δ)) number of comparisons in expectation. We develop an optimal algorithm in two steps. Step one is to design a PAC k-selection algorithm with O(n −2 log(n/δ)) sample complexity. Step two is to develop another algorithm with O(n −2 log(k/δ)) sample complexity through the above algorithm. We note that Falahatgar et al. (2018) proposed an algorithm for finding the PAC full ranking with high probability, and has sample complexity O(n −2 log n). In a PAC ranking, the top-k items form an ( , k)-optimal subset of [n], and thus, this PAC full ranking algorithm can be used as a PAC k-selection algorithm. However, the al-gorithm of Falahatgar et al. (2018) can only guarantee to return correct results with confidence 1 − 1/n, while in the construction of the k-selection algorithm with sample complexity O(n −2 log(k/δ)), we need the confidence to be larger than 1 − 1/n. Thus, this algorithm is not sufficient for us to obtain the O(n −2 log(k/δ)) sample complexity. In this paper, we propose a k-selection algorithm with sample complexity O(n −2 log(n/δ)) to achieve this purpose. 3.2.1. STEP ONE: EPSILON-QUICK-SELECT Our first PAC k-selection algorithm is similar to a classical deterministic k-selection algorithm, Quick Select (Hoare, 1961) . 
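For reference, here is a minimal sketch (Python, illustrative names) of the classical noiseless Quick Select idea that the following discussion builds on; it returns the set of the k largest numbers:

import random

def quick_select_top_k(items, k):
    """Return the k largest elements of `items`, assuming exact (noiseless) comparisons."""
    if k <= 0:
        return []
    if k >= len(items):
        return list(items)
    pivot = random.choice(items)
    greater = [x for x in items if x > pivot]   # pile better than the pivot
    equal = [x for x in items if x == pivot]
    less = [x for x in items if x < pivot]      # pile worse than the pivot
    if len(greater) >= k:
        return quick_select_top_k(greater, k)   # recurse on the better pile
    if len(greater) + len(equal) >= k:
        return greater + equal[:k - len(greater)]
    return greater + equal + quick_select_top_k(less, k - len(greater) - len(equal))

print(sorted(quick_select_top_k([5, 1, 9, 3, 7, 2, 8], 3)))  # [7, 8, 9]

The three cases after the split mirror the three cases that EQS handles below once the piles become S_up, S_mid, and S_down.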
In each round, Quick Select randomly picks (some versions use different picking strategies) an item as a pivot and splits the other items into two piles: one contains the items no less than the pivot and the other contains the items less than the pivot. After the splitting, according to the sizes of these two piles, we run Quick Select again on one pile. This is repeated until we find the k-th best item. The expected time complexity of Quick Select is O(n). When the comparisons are noisy, we need more effort to find the (PAC) best-k items, but the basic idea is similar to Quick Select. In each round t, we randomly pick an item v_t as the pivot, and compare every other item with the pivot a certain number of times. According to these comparisons, we distribute each item i into one of the following three piles: (i) S_up := {items i that are "sure" to be better than v_t, i.e., p_{i,v_t} > 1/2 with a large probability}; (ii) S_mid := {items i that are "close to" v_t, i.e., 1/2 − ε ≤ p_{i,v_t} ≤ 1/2 + ε with a large probability}; and (iii) S_down := {items i that are "sure" to be worse than v_t, i.e., p_{i,v_t} < 1/2 with a large probability}. After the splitting, there are three cases. If S_up contains at least k items, then we run our algorithm again on S_up. If S_up contains fewer than k items and S_up ∪ S_mid contains at least k items, then the items in S_up together with (k − |S_up|) arbitrary items of S_mid form an (ε, k)-optimal subset. If S_up ∪ S_mid contains fewer than k items in total (say the number is k′), then we run the algorithm on S_down to find the PAC best (k − k′) items, and the returned items together with S_up and S_mid form an (ε, k)-optimal subset. The properties of SST and STI guarantee the correctness, and the choice of the input confidence for each round guarantees the sample complexity. The "Quick-Select-like" algorithm is described in Algorithm 2, Epsilon-Quick-Select (EQS). Subroutine 1, Distribute-Item (DI), is the subroutine that splits the items into the three piles. DI is called by EQS with the two shifts s_u and s_d set to zero, and later in Section 4, the algorithms for exact k-selection also call DI as a subroutine. (In Subroutine 1, DI(i, v, ε, s_u, s_d, δ, S_up, S_mid, S_down) sets t_max := (2/ε^2) log(4/δ) and b_t := sqrt(log(π^2 t^2/(3δ))/(2t)) for all t ∈ Z_+, initializes t ← 0 and w_0 ← 0, and repeatedly compares i and v, adding i to S_up, S_mid, or S_down based on confidence tests on the empirical win rate of i. In Algorithm 2, EQS calls DI(i, v, ε/2, 0, 0, δ_1, S_up, S_mid, S_down) on every non-pivot item i, where δ_1 is a per-item confidence chosen so that all DI calls jointly succeed with probability at least 1 − δ/n; it then returns EQS(S_up, k, ε, (n−1)δ/n) if |S_up| > k, fills up from S_mid if |S_up| ≤ k ≤ |S_up ∪ S_mid|, and otherwise returns S_up ∪ S_mid ∪ EQS(S_down, k − |S_up ∪ S_mid|, ε, (n−1)δ/n).) Lemma 3 states the theoretical performance of DI, and Theorem 4 states the theoretical performance of EQS.

Lemma 3 (Theoretical Performance of DI). DI terminates after at most O(ε^{-2} log δ^{-1}) comparisons, and with probability at least 1 − δ, one of the following five events happens: (i) p_{i,v} ≥ 1/2 + ε + s_u and item i is added to S_up; (ii) p_{i,v} ∈ (1/2 + s_u, 1/2 + ε + s_u) and item i is not added to S_down; (iii) p_{i,v} ∈ [1/2 − s_d, 1/2 + s_u] and item i is added to S_mid; (iv) p_{i,v} ∈ (1/2 − ε − s_d, 1/2 − s_d) and item i is not added to S_up; and (v) p_{i,v} ≤ 1/2 − ε − s_d and item i is added to S_down.

Theorem 4 (Theoretical Performance of EQS). Given an input set S with |S| = n, 1 ≤ k ≤ n/2, and ε, δ ∈ (0, 1/2), EQS(S, k, ε, δ) terminates after O(nε^{-2} log(n/δ)) comparisons in expectation, and with probability at least 1 − δ, returns an (ε, k)-optimal subset of S.

In this section, we use EQS to develop a PAC k-selection algorithm with sample complexity O(nε^{-2} log(k/δ)).
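Before moving to the tournament construction, the following sketch (Python; a simplification with illustrative names, consistent with the description above and with the thresholds used in the proof of Lemma 3) shows the adaptive test DI applies to place a single item i relative to the pivot v; here `oracle` is any object exposing a compare(i, j) method that returns the winner:

import math

def distribute_item(oracle, i, v, eps, s_u, s_d, delta):
    """Classify item i against pivot v as 'up', 'mid', or 'down'."""
    t_max = math.ceil(2.0 / eps**2 * math.log(4.0 / delta))
    wins = 0
    for t in range(1, t_max + 1):
        wins += (oracle.compare(i, v) == i)
        # Anytime confidence radius b_t; shrinks roughly like sqrt(log(t)/t).
        b_t = math.sqrt(math.log(math.pi**2 * t**2 / (3.0 * delta)) / (2.0 * t))
        if wins / t >= 0.5 + b_t + s_u:    # confidently better than the pivot
            return 'up'
        if wins / t <= 0.5 - b_t - s_d:    # confidently worse than the pivot
            return 'down'
    # Comparison budget exhausted: decide with the coarser margin eps/2.
    if wins / t_max >= 0.5 + eps / 2.0 + s_u:
        return 'up'
    if wins / t_max <= 0.5 - eps / 2.0 - s_d:
        return 'down'
    return 'mid'

EQS calls this test with s_u = s_d = 0 on every non-pivot item and then recurses on the 'up' or 'down' pile as in the three cases above. We now turn to the tournament-based algorithm.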
The algorithm runs like a tournament and consists of rounds. At each round t, we split the remaining items (use R t to denote the set of the remaining items at the beginning of round t) into subsets with size around 2k, and for each subset we use EQS to find an ( t , k)-optimal subset with confidence 1 − δ t /k. We then keep the items in these ( t , k)optimal subsets, and remove all the other items. We can show that with probability at least 1 − δ t , the items kept in round t (i.e., R t+1 ) contain an ( t , k)-optimal subset of R t , which implies that for any t, R t+1 contains a subset U t+1 such that for any item i in U t+1 and item j in R t − U t+1 , p i,j ≥ 1/2 − t . We can also show that with probability at least 1 − δ t − δ t−1 , for any item i in U t+1 and j in R t−1 −U t+1 , p i,j ≥ 1/2− t − t−1 . Repeating this, we can show that with probability at least 1− t r=1 δ r , for any item i in U t+1 and item j in [n] − U t+1 , p i,j ≥ 1/2 − t r=1 r . Thus, by repeating the rounds until only k items remain, we have that with probability at least 1 − ∞ t=1 δ t , for any item i in the returned set and j not in the returned set, p i,j ≥ 1/2 − ∞ t=1 t , which implies that the returned set is a ( ∞ t=1 t , k)-optimal subset of [n]. Choosing ∞ t=1 t ≤ and ∞ t=1 δ t ≤ δ, we can get that with probability at least 1 − δ, the returned set is an ( , k)-optimal subset of [n]. The algorithm is described in Algorithm 3, and its theoretical performance is stated in Theorem 5. Algorithm 3 Tournament-k-Selection([n], k, , δ) (TKS) 1: For any t ∈ Z + , set t := 1 4 ( 4 5 ) t and δ t := 6δ π 2 t 2 ; 2: Initialize t ← 0, R 1 ← [n]; 3: repeat 4: Theorem 5 (Theoretical Performance of TKS). Given input 1 ≤ k ≤ n/2, and , δ ∈ (0, 1/2), TKS terminates after O(n −2 log(k/δ)) number of comparisons in expectation, and with probability at least 1 − δ, returns an ( , k)-optimal subset of [n]. Remark. i) The sample complexity upper bound of TKS matches the lower bound stated in Theorem 2 up to a constant factor. Thus, in order sense, our upper and lower bounds for PAC k-selection are tight. ii) When k = 1, our upper bound is the same as that of Falahatgar et al. (2017a; b) . We note that the algorithms given by Falahatgar et al. (2017a; b) only work for k = 1, and it is not obvious how to generalize them to cases with general k-values. In this subsection, we prove a lower bound for the exact k-selection problem. We note that the sample complexity lower bound not only depends on the gaps between items i and items r k or r k+1 as in PEMAB problems (e.g., (Jamieson et al., 2014; Chen et al., 2017) ), but also depends on other comparisons probabilities. In fact, even if the values of ∆ i 's are the same, different instances may have different lower bounds on the sample complexity for finding the best-k items. For some instances, even the Ω(∆ −2 i ) lower bound for ordering two items stated in Theorem 7 and Ren et al. (2019) may not hold if there are more than two items. For instance, Example 13 in Ren et al. (2019) states an instance with three items such that O(∆ −1 r1,r2 log(∆ −1 r1,r2 δ −1 )) comparisons are sufficient to find the best item with probability 1 − δ, which indicates the difficulty in finding an instance-wise lower bound for all instances. Thus, in this chapter, we prove a lower bound for a specific model: Thurstone's model. In Thurstone's model, each item i holds a real number θ i representing the users' preference for this item. We name these numbers as scores. 
The higher the score, the more preferred the item, and thus, the scores imply a true order over these items. Under Thurstone's model with variance σ^2, for any two items i and j, we have p_{i,j} = P{θ_i + Z_1 > θ_j + Z_2}, where Z_1 and Z_2 are two independent Gaussian(0, σ^2) random variables. The definitions of the gaps ∆_{i,j}'s and ∆_i's remain the same as in Section 1.2. It can be verified that Thurstone's model satisfies SST and STI. Under Thurstone's model, we prove the following lower bound for exact k-selection, which can be viewed as a worst-instance lower bound. Here, "worst-instance lower bound" means that under the same values of the gaps ∆_i's, the lower bound for Thurstone's model is no higher than the actual worst-instance lower bound. In the proof, we invoke the lower bounds for the PEMAB problem shown by Jamieson et al. (2014) and Chen et al. (2017). For stating our lower bound, we define a notation Ω̃(·) in Definition 6, which can be viewed as a slightly weaker version of Ω(·). This definition is inspired by Theorem D.1 in (Chen and Li, 2015).

Definition 6 (Defining Ω̃(·)). Define E_i := [e^i, e^{i+1}) for any positive integer i. Two functions f(x) and g(x) are said to satisfy f(x) = Ω̃(g(x)) if, roughly speaking, f(x) ≥ c_0 g(x) for some constant c_0 > 0 for all but a negligible proportion of the points x in each interval E_i.

Theorem 7 (Lower bound for exact k-selection under Thurstone's model). Under Thurstone's model with variance one, given δ ∈ (0, 1/100), n items with scores θ_1, θ_2, ..., θ_n ∈ [0, 1], and 1 ≤ k ≤ n/2, to find the best-k items with probability at least 1 − δ, any algorithm must conduct at least Ω(Σ_{i∈[n]} ∆_i^{-2} log δ^{-1}) + Ω̃(∆_{r_k}^{-2} log log ∆_{r_k}^{-1}) comparisons in expectation.

We first use the PAC algorithm TKS to establish a best item selection algorithm called Sequential-Elimination-Exact-Best-Selection (SEEBS). SEEBS runs in rounds. In each round t, it chooses a threshold α_t, uses TKS to choose a PAC best item v_t with error tolerance α_t/3, and uses DI to identify items i with p_{i,r_1} ≤ 1/2 − α_t and removes them. By choosing a proper confidence δ_t for each round t, the properties of DI and TKS stated in Lemma 3 and Theorem 5 guarantee that with probability at least 1 − δ, the best item r_1 will not be removed. If α_t is diminishing so that lim_{t→∞} α_t = 0 and the confidences satisfy Σ_{t=1}^∞ δ_t ≤ δ, the algorithm will, with probability at least 1 − δ, discard all items other than r_1 and keep the best item r_1. SEEBS is described in Algorithm 4, and its theoretical performance is stated in Theorem 8. (In Algorithm 4, Sequential-Elimination-Exact-Best-Selection([n], δ) (SEEBS), for all t ∈ Z_+ we set α_t := 2^{-t} and δ_t := 6δ/(π^2 t^2); the elimination rounds described above are repeated, incrementing t, until |R_t| = 1, and the only item in R_t is returned.)

Theorem 8 (Theoretical Performance of SEEBS). With probability at least 1 − δ, SEEBS terminates after O(Σ_{i≠r_1} ∆_i^{-2}(log δ^{-1} + log log ∆_i^{-1})) comparisons in expectation and returns the best item in [n].

Remark. i) According to the lower bound stated in Theorem 7, SEEBS is worst-instance optimal up to a loglog factor. If the ∆_i's are not too small, the term log log ∆_i^{-1} is dominated by log δ^{-1}; i.e., if ∆_i^{-1} ≤ e^{1/δ}, then our upper bound is worst-instance optimal up to a constant factor. ii) The phrase "in expectation" in Theorem 8 does not only come from the sample complexity of TKS, but also comes from the choice of the input confidences of DI. At each round t, by inputting δ_t/3 to DI, one cannot guarantee that the executions of DI correctly assign all non-best items i with p_{i,r_1} ≤ 1/2 − α_t to S_down with probability 1 − δ_t, and thus, more rounds may be needed to remove these non-best items. Therefore, in expectation, the number of comparisons over item i is upper bounded by the corresponding O(∆_i^{-2}(log δ^{-1} + log log ∆_i^{-1})) term in Theorem 8.
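To make the round structure of SEEBS concrete, here is a minimal sketch (Python, illustrative names). It is not the exact Algorithm 4: `pac_best` stands for TKS with k = 1, `distribute_item` is the DI sketch given earlier, and the shift arguments passed to it are simplified choices of ours rather than the ones used in the paper's analysis:

import math

def seebs(oracle, items, delta, pac_best):
    """Sequential-elimination sketch for exact best-item selection (k = 1)."""
    remaining = list(items)
    t = 0
    while len(remaining) > 1:
        t += 1
        alpha_t = 2.0 ** (-t)                        # threshold alpha_t = 2^{-t}
        delta_t = 6.0 * delta / (math.pi**2 * t**2)  # per-round confidence, sums to delta
        # Pivot: a PAC best item of the remaining set with tolerance alpha_t / 3.
        pivot = pac_best(oracle, remaining, alpha_t / 3.0, delta_t / 3.0)
        survivors = [pivot]
        for i in remaining:
            if i == pivot:
                continue
            # Discard i only when it is confidently worse than the pivot.
            label = distribute_item(oracle, i, pivot, alpha_t / 3.0,
                                    s_u=0.0, s_d=alpha_t / 3.0, delta=delta_t / 3.0)
            if label != 'down':
                survivors.append(i)
        remaining = survivors
    return remaining[0]

Because alpha_t shrinks geometrically, items far below the best are discarded after a few cheap rounds, while an item whose gap ∆_i is tiny survives until alpha_t falls below roughly ∆_i, which is where the ∆_i^{-2} terms in Theorem 8 come from.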
Therefore, in expectation, the number of comparisons over item i is upper bounded by In this subsection, we develop an exact best-k items selection algorithm called Sequential-Elimination-Exact-k-Selection (SEEKS). The basic idea of SEEKS is similar to SEEBS. SEEKS runs in rounds. At each round t, it calls TKS and TKS2 (where TKS2 is almost the same as TKS except that it finds the PAC worst items) to find a pivot v t such that ∆ vt,r k ≤ α t /3. Then it uses DI to distribute the items such that with probability at least , not added to S t+1 or R t+1 ); (iii) none of the items with p i,r k ≥ 1/2 is discarded; and (iv) all items added to S t+1 are of the best-k items. By choosing proper confidence δ t for each round t, we guarantee that with probability at least 1 − δ, none of the best-k items is discarded, and all items added to S t+1 are of the best-k items. Thus, with probability at least 1 − ∞ t=1 δ t = 1 − δ, in all rounds, none of the best items is discarded, and S t only contains the best-k items. When |S t | ≤ k or |S t ∪ R t | ≤ k, the algorithm terminates, and thus, if the algorithm returns, with probability at least 1 − δ, it returns the set of the best-k items. Since lim t→∞ α t = 0, there is a large enough t such that either all of the best-k items have been added to some S t , or all items except the best-k are discarded. Therefore, the algorithm terminates in finite time. The sample complexity follows from the choice of α t 's and δ t 's. SEEKS is described in Algorithm 5. Its theoretical performance is stated in Theorem 9. Theorem 9 (Theoretical Performance of SEEKS). With probability at least 1 − δ, SEEKS terminates after number of comparisons in expectation, and returns the best-k items. Remark. i) According to the lower bound stated in Theorem 7, SEEKS is worst-instance optimal up to a log factor. We conjecture that the true lower bound and upper bound of the exact k-selection depend on log(k/δ), just as end for 10: 13: that of the PAC k-selection, but it remains an open problem for future studies. ii) Different from Theorem 4, the phrase "in expectation" in Theorem 9 comes from the sample complexity of TKS (stated in Theorem 5). If one can find a PAC k-selection algorithm that uses no more than O(n −2 log(n/δ)) comparisons with probability 1 − δ, then by replacing TKS and TKS2 with this algorithm, we can remove "in expectation" in Theorem 9. In this section, we perform experiments on the synthetic dataset with equal noise-levels (i.e., ∆ i,j is a constant) and public election datasets provided by PrefLib (Mattei and Walsh, 2013) . In the supplementary material, we present the results of the synthetic dataset with unequal noise-levels and the numerical illustrations of the growth rates of the exact best-k items selection bounds. The codes can be found in our GitHub page. 9 In this subsection, we provide numerical simulations for our algorithms and those in related works under equal noise levels, i.e., we set p i,j = 0.6 for all items i and j with i j. This dataset has also been used in previous works (Yue and Joachims, 2011; Busa-Fekete et al., 2014; Falahatgar et al., 2017a; b; 2018) . The results are presented in Figure 1 , and every data point of it is averaged over 100 independent trials. , which is of the same order as TKS. Seq-Eliminate and Beat-the-Mean are also PAC best item selection algorithms, but their sample complexities are O(n −2 log(n/δ)), higher than that of TKS by a log factor. 
Active Ranking (Heckel et al., 2019) and MallowsMPI are exact selection algorithms with sample complexity O(n log(n/δ)). The numerical results are summarized in Figure 1 (a) (b). We set δ = 0.01, and examine how the number of comparisons conducted increases with n. In Figure 1 (a), we set = 0.08, and in Figure 1 (b), we set = 0.001. According to the illustrated results, we can see that when is small (i.e., = 0.001), the performance of our algorithm TKS is almost the same as those of Knockout and MallowsMPI, the best of previous works. We note that Knockout and MallowsMPI are only designed for best item selection and it is not obvious how to extend them to cases with k > 1. Thus, although our TKS works for all values of k, its performance is close to the best of the state-of-the-art when k = 1. For the PAC k-selection, we provide the simulation results for EQS, TKS, and Active Ranking. The results are summarized in Figure 1 As presented in Figure 1 (c)-(e), we can see that when k is small (i.e., k ≤ 2), TKS outperforms EQS, but when k is not too small, EQS uses fewer comparisons. The sample complexity upper bound of TKS is O(n −2 log(k/δ)), which is lower than the O(n −2 log(n/δ)) complexity of EQS. However, in practice, for most values of k, EQS consumes fewer comparisons. One explanation is that the constant factor of TKS is larger than that of EQS. There may be two reasons: First, in each call of EQS on S, the sub-call of EQS is executed on S up or S down , whose expected sizes are less than |S|/2, while in TKS, each iteration removes no more than a half of the items. Second, in TKS, the value t input to DI is less than , which is used in EQS. For the exact k-selection algorithm, we only provide numerical results for the algorithms proposed in this paper: SEEBS, SEEKS, and SEEKS-v2, a variation of SEEKS. Here, SEEKS-v2 is almost the same as SEEKS. But in Line 4, TKS is replaced with EQS, since EQS has a better empirical performance than TKS when k is not too small. We note that the sample complexity upper bound of SEEKS-v2 is of the the same order as SEEKS (ignoring constant factors). We do not compare the algorithm proposed by Chen et al. (2018) because it is unclear how to choose the parameters to let the confidence be 1−δ. We do not compare the algorithm given by Saha and Gopalan (2019b) since it requires the system to be able to conduct comparisons over more than two items, which is not assumed in this paper. In Figure 1 (f), we compare SEEBS, SEEKS, and SEEKS-v2 with k = 1 and δ = 0.01. In Figure 1 (g), we fix k = 50 and δ = 0.01, vary n, and compare the two versions of SEEKS. In Figure 1 (h), we fix n = 1000 and δ = 0.01, vary k, and compare the two versions of SEEKS. From Figure 1 (f), we can see that SEEBS is slightly better than SEEKS, which is due to the choices of confidences input to the calls of DI in these two algorithms. Also, we can see that SEEBS and SEEKS are better than SEEKS-v2, especially when n is large. This is because the empirical performance of EQS is worse than TKS when k = 1. According to Figure 1 (g) and (h), SEEKS-v2 consumes fewer comparisons when k is not too small. An explanation is that in practice, EQS uses fewer comparisons than TKS when k is not too small. In this subsection, we perform numerical experiments on public election datasets provided in PrefLib (Mattei and Walsh, 2013) . 
To be specific, we use the Irish election dataset "ED-00001-00000001.pwg" (Lu and Boutilier, 2011) and the clean web search dataset "ED-00015-00000047.pwg" (Betzler et al., 2014) . Both datasets can be found in PrefLib.org. The Irish Election dataset contains n = 12 candidates and 43,942 votes on them. The web search dataset contains n = 28 pages and 1134 samples of pairwise preferences on them. For every pair of items i and j in each dataset, the dataset records the number of votes or samples N i,j that show preference on item i to item j. From these records, we extract p i,j := N i,j /(N i,j + N j,i ) for any two items i and j. We note that these two dataset do not satisfy the SST or the STI and do not imply a strict order. Thus, we use the Borda-Scores for them to get the true rankings. In the experiments, we set = 0.001, δ = 0.01, and k = {1, 4}. Surprisingly, although these two datasets do not satisfy SST or STI, our algorithms EQS, TKS, SEEBS, and SEEKS can still return correct results with correct probability at least 1 − δ (in the experiments, all runs of them return correct results). In fact, we have done experiments on more datasets and find that if there is a small number γ > 1 (e.g., γ < 5) such that for any i j k, p i,k ≥ γ −1 max{p i,j , p j,k } and ∆ i,k ≤ γ(∆ i,j + ∆ j,k ), then our algorithms can guarantee at least 1 − δ correct probability. From the results presented in Figure 2 , we can see that for the Irish election dataset, the performances of our algorithms EQS and TKS are close to the best of the previous works, which indicates that even if they are not designed for k = 1 and these types of datasets, they still have promising performances on some real-world datasets. The results also show positive evidence on our theoretical results, i.e., TKS (SEEKS) performs better than EQS (SEEKS-v2) when k is small (k = 1) and performs worse when k is large (k = 4). This paper studied the sample complexity bounds for selecting the PAC or exact best-k items from pairwise comparisons. For PAC k-selection, we first proved an Ω(n −2 log(k/δ)) lower bound, and then proposed an algorithm with expected sample complexity O(n −2 log(k/δ)), which implies that both our upper bound and lower bound are tight up to a constant factor. For exact k-selection, we first proved a worst-instance lower bound, and then proposed an algorithm for k = 1 that is optimal up to a loglog factor. Finally, we proposed an algorithm for general kvalues that is optimal up to a log factor. The numerical results in this paper also confirm our theoretical results. In this subsection, we provide numerical results of our algorithms and related previous works on another synthetic dataset, where the noise levels are not equal. To be specific, for any two items i and j with i j, the value of p i,j is independently randomly drawn from the Uniform(0.5∆, 1.5∆) distribution, where ∆ = 0.1. The results are presented in Figure 3 . Every data point in every figure is averaged over 100 independent trials. Other than the dataset, the experiment setup and involved algorithms are the same as that in Section 5.1. From the results in Figure 3 , we can see that the performances of the algorithms are similar to that presented in Section 5.1. We omit the detailed descriptions for brevity. In this subsection, we use figures to illustrate the growth rates of the exact best-k selection bounds. 
The lower bound refers to that of the Thurstone's model stated in Theorem 7, i.e., Ω( i∈[n] ∆ −2 i log δ −1 )+Ω(∆ −2 r k log log ∆ −1 r k ); the upper bound for k = 1 refers to that of SEEBS stated in Theorem 8, i.e., O( i =r1 ∆ −2 i (log δ −1 + log log ∆ −1 i )); and the upper bound for k > 1 refers to that of SEEKS stated in Theorem 9, i.e., O( i∈[n] ∆ −2 i (log(n/δ) + log log ∆ −1 i )). In this subsection, we ignore the constant factors (the constant factors are also unclear) and show the growth rates of these bounds. We fix k = 1, vary n from 10 to 1000, and set ∆ i = ∆ for all items. The results are illustrated in Figure 4 . In Figure 4 (a), we set ∆ = 0.1 and δ = 0.01, in Figure 4 In all subfigures of Figure 4 , we can see that the upper bound for k > 1 is always larger and grows faster than the upper bound for k = 1 and the lower bound. This is because the upper bound for k > 1 depends on log(n/δ) while the other two bound depend on log δ −1 . From Figure 4 (a) and (b), we can see that the upper bound for k = 1 is larger than the lower bound and this gap is larger for smaller values of ∆, which is because the upper bound depends on ∆ −2 n log log ∆ −1 while the lower bound depends on ∆ −2 (n + log log ∆ −1 ). Another finding is that the growth rates of these two bounds have no obvious difference. The reason is that the terms multiplied to n in these two bounds are ∆ −2 and ∆ −2 log log ∆ −1 , respectively, which are extremely close even for large values of ∆. Figure 4 (c) and (d), we see that when δ is small, especially when δ is far smaller than ∆, the gap between the upper bound for k = 1 and the lower bound is close to zero. The reason is that when δ is small, the terms n log log ∆ −1 or log log ∆ −1 are both dominated by log δ −1 . From a mathematical perspective, log δ −1 is exponentially higher than log log ∆ −1 , which implies that when δ and ∆ both approach zero with comparable rates, the influence of the log log ∆ −1 term will vanish compared to the log δ −1 term. Theorem 2 (Lower bound for PAC k-selection). Given ∈ (0, 1/128), δ ∈ (0, e −4 /4), n ≥ 2, and 1 ≤ k ≤ n/2, there is an n-sized instance satisfying SST and STI such that to find an ( , k)-optimal subset of [n] with probability 1 − δ, any algorithm needs to conduct Ω(n −2 log(k/δ)) number of comparisons in expectation. Proof of Theorem 2. A possible way to prove this lower bound is by reducing the pure exploration multi-armed bandit (PEMAB) problem (e.g., (Mannor and Tsitsiklis, 2004)) to the k-selection problem under the MNL model, which has been adopted by Ren et al. (2018; 2019) ; Saha and Gopalan (2019a). We note that the definition of PAC best-k items given by Ren et al. (2018) is different from that in this paper, and thus, we need to independently find a lower bound in this paper. We first show the reduction procedure given by Ren et al. (2019) , then show how to reduce the PEMAB problem to the best-k items selection problem, and finally prove the lower bound by invoking the results shown by Mannor and Tsitsiklis (2004) Step 1 is to introduce the PEMAB problem with Bernoulli arms as well as the MNL model. In the PEMAB problem with Bernoulli arms, there are n arms denoted by a 1 , a 2 , ..., a n . For each arm a i , it holds a real number µ i ∈ [1/4, 3/4] denoting its mean reward. The t-th sample of arm a i returns an independent random reward R t i according to the Bernoulli(µ i ) distribution. We further assume that (R t i , i ∈ [n], t ∈ Z + ) are independent. 
For positive integer k with k ≤ n, we use µ [k] to denote the k-largest mean reward of these n Bernoulli arms. Given k ∈ {1, 2, 3, ..., n/2 }, δ ∈ (0, 1/2) and ∈ (0, 1/2), the PAC PEMAB problem is to find k distinct arms with mean rewards no less than µ [k] − by adaptively sampling the arms, where the error probability is no more than δ. Under the MNL model, each item i is assumed to hold a real number γ i representing the users' preference of this item. The larger the number, the more preferred this item. For any two items i and j, a comparison over them returns item i with probability p i,j = e γi /(e γi + e γj ), and returns item j with probability p j,i = e γj /(e γi + e γj ). To simplify the notation, for any item i, we define θ i = exp(γ i ), and name θ i as the preference score of item i. Thus, for any two items i and j, we have p i,j = θ i /(θ i + θ j ). Step 2 is to introduce the reduction procedure. To do the reduction, we introduce Procedure P 1 , which is described in Procedure 6. Claim 18 proved by Ren et al. (2019) states that Procedure P 1 returns arm a i with probability µ i /(µ i + µ j ), and returns arm a j with probability µ j /(µ i + µ j ). Let A be a PAC best-k items selection algorithm. Now for each arm a i , we create an artificial item i, and input items 1, 2, 3, ..., n to Algorithm A. Whenever Algorithm A wants Procedure 6 P 1 (a i , a j ) (Ren et al., 2019) Input: Two Bernoulli arms a i and a j with unknown mean rewards µ i and µ j , respectively; 1: repeat 2: Randomly choose an arm a X and sample it; 3: Let s ← the sample result; 4: until s = 1 5: return a X ; to compare artificial items i and j, we call Procedure P 1 on arms a i and a j . If Procedure P 1 returns a i , then we tell Algorithm A that i wins this comparison. Otherwise, we tell Algorithm A that j wins this comparison. Observe that the probabilities that Procedure P 1 returns an arm a i is µ i /(µ i + µ j ), which is of the same formula as the comparison probabilities under the MNL model. Thus, if for n items with preference scores θ 1 = µ 1 , θ 2 = µ 2 , ..., θ n = µ n , Algorithm A can find k distinct items with preference scores no less than µ [k] − with probability at least 1 − δ by conducting M comparisons, there exists an algorithm that solves the above PEMAB problem by calling Procedure P 1 for M times without additional samples of these n arms. Since for any arm a i , the mean reward µ i is in [1/4, 3/4], any call of Procedure P 1 returns after at most 4 samples in expectation. Thus, by substituting the comparisons in Algorithm A with Procedure P 1 , one can solve the PEMAB problem by 4M samples of arms in expectation. Step 3 is to prove a related lower bound for the PAC kselection problem. For k = 1, < 1/8, and δ < e −4 /4, Mannor and Tsitsiklis (2004) proved that there is an instance such that to solve the PEMAB problem, at least Ω(n −2 log δ −1 ) number of comparisons are needed in expectation. For 6 ≤ k ≤ n/2, ≤ 1/32, and δ ≤ 1/4, Kalyanakrishnan et al. (2012) proved that there is an instance such that to solve the PEMAB problem, at least Ω(n −2 log(k/δ)) number of comparisons are needed in expectation. For 2 ≤ k ≤ 5 with additional knowledge about (k − 1) arms with mean rewards no less than µ [k] − , to solve the PEMAB problem with k > 1 and n arms is equivalent to solve the PEMAB problem with k = 1 and (n − k + 1) arms. Thus, the expected sample complexity of any algorithm is lower bounded by Ω((n−k+1) −2 log δ −1 ) = Ω(n −2 log(k/δ)). 
Thus, for 1 ≤ k ≤ n/2, < 1/8, and δ < e −4 /4, we conclude that 4M = Ω(n −2 log(k/δ)), i.e., there is an instance such that to find k distinct items with preference scores no less than µ [k] − , any algorithm needs to conduct Ω(n −2 log(k/δ)) number of comparisons in expectation. Step 4 is to conclude the lower bound for PAC k-selection. We assume that Algorithm A can find an ( , k)-optimal subset of [n] with probability at least 1 − δ by conducting o(n −2 log(k/δ)) number of comparisons in expectation, and we will show a contradiction to Step 3 to complete the proof of the desired lower bound. Let R be an ( , k)-optimal subset of [n] returned by Algorithm A. Let item i be an item in R and item j be an item in [n] − R. By the definition of ( , k)-optimality, we have p i,j ≥ 1/2 − . Let r k be the item with the k-th largest preference score. If R is the set of the best-k items, then for any item i in R, we have i r k , i.e., p i,r k ≥ 1/2. If R is not the set of the best-k items, then there exists an item j not in R such that j r k , and thus, for any item i in R, p i,r k ≥ p i,j ≥ 1/2 − . Hence, in any case, for any item i in R, p i,r k ≥ 1/2 − . Thus, every item i in R has µ i ≥ µ [k] − 4 . This indicates that Algorithm A can find k distinct items with preferences no less than µ [k] − 4 by conducting o(n −2 log(k/δ)) number of comparisons in expectation. Step 3, we have shown that for < 1/128, to find k distinct items with preference scores no less than µ [k] −4 , at least Ω(n −2 log(k/δ)) number of comparisons in expectation are needed, which leads to a contradiction to the assumption. Hence, Algorithm A with sample complexity o(n −2 log(k/δ)) assumed in this step does not exist. This completes the proof of Theorem 2. Lemma 3 (Theoretical Performance of DI). DI terminates after at most O( −2 log δ −1 ) comparisons, and with probability at least 1 − δ, one the following five events happens: (i) p i,v ≥ 1/2 + + s u and item i is added to S up ; (ii) p i,v ∈ (1/2 + s u , 1/2 + + s u ) and item i is not added to Proof of Lemma 3. DI terminates after at most t max = 2 −2 log(4/δ) comparisons, and the sample complexity follows from the choice of t max . Now we focus on the proof of the correctness, i.e., with probability at least 1 − δ, one of the five stated events happens. For any t ∈ Z + , we define a bad event that we do not want to happen, By the Chernoff-Hoeffding inequality (Hoeffding, 1994) , we have that for all t in Z + , We define another bad event whose probability, by Chernoff-Hoeffding inequality, is upper bounded by Thus, by the union bound, the probability that some bad event happens is at most In the rest of the proof, we assume that no bad event happens, which has probability at least 1 − δ. We split the rest of our proof in five cases, each for an event. Case 1: p i,v ≥ 1/2 + + s u . Since none of E t happens, for any round t, we have w t /t > p i,v −b t ≥ 1/2−b t −s d , which implies that item i will not be added to S down by Line 9. If DI proceeds to Line 12, since E out does not happen, we will have w tmax /t max > p i,v − /2 ≥ 1/2 + /2 + s u , which implies that item i will be added to S up by Line 13. Case 2: p i,v ∈ (1/2 + s u , 1/2 + + s u ). Since none of E t happens, for any round t, we have w t /t > p i,v − b t > 1/2 − b t − s d , which implies that item i will not be added to S down by Line 9. 
If DI proceeds to Line 12, since E out does not happen, we will have w tmax /t max > p i,v − /2 > 1/2 − /2 − s d , which implies that item i will not be added to S down by Line 15. Since none of E t happens, for any round t, we have |w Thus, item i will not be added to S up or S down by Lines 7 or 9. If DI proceeds to Line 12, since E out does not happen, we will have w tmax /t max < p i,v + /2 ≤ 1/2 + /2 + s u and w tmax /t max > p i,v − /2 ≥ 1/2 − /2 − s d . Thus, item i will not be added to S up or S down by Lines 13 or 15. Therefore, item i will be added to S mid . Case 4: p i,v ∈ (1/2 − − s d , 1/2 − s d ). Since none of E t happens, for any round t, we have w t /t < p i,v + b t < 1/2 + b t + s u , which implies that item i will not be added to S up by Line 7. If DI proceeds to Line 12, since E out does not happen, we will have w tmax /t max < p i,v + /2 < 1/2 + /2 + s u , which implies that item i will not be added to S up by Line 13. Case 5: p i,v ≤ 1/2 − − s d . Since none of E t happens, for any round t, we have w t /t < p i,v + b t ≤ 1/2 + b t + s u , which implies that item i will not be added to S up by Line 7. If DI proceeds to Line 12, since E out does not happen, we will have w tmax /t max < p i,v + /2 ≤ 1/2− /2−s d , which implies that item i will be added to S down by Line 15. The correctness follows from the above five cases, and the proof of Lemma 3 is complete. Theorem 4 (Theoretical Performance of EQS). Given an input set S with |S| = n, 1 ≤ k ≤ n/2, and , δ ∈ (0, 1/2), EQS(S, k, , δ) terminates after O(n −2 log(n/δ)) number of comparisons in expectation, and with probability at least 1 − δ, returns an ( , k)-optimal subset of S. Proof of Theorem 4. The proof consists of two parts: the proof of the correctness and the proof of the sample complexity. To avoid ambiguity, we use EQS to denote the algorithm and subEQS to denote the EQS function called by the algorithm. Let E be the event that all calls of DI return correct results, i.e., for each call of DI, one of the five events stated in Lemma 3 happens. By Lemma 3 and the union bound, E happens with probability at least 1 − δ/n. Proof of the correctness. We prove the correctness by induction. First let n = 1. In this case, k must be one. Since the only item is chosen as the pivot, and the pivot is added to S mid , EQS simply returns {1} as the answer, which is correct with probability 1. Thus, when n = 1, EQS returns an ( , 1)-optimal subset of S with probability 1. Now we consider the case where n > 1. We make the following hypothesis to prove the correctness by induction. Hypothesis 1. For all sets S with size less than n, k ∈ {1, 2, ..., |S |}, and δ ∈ (0, δ], EQS(S , k , , δ ) returns an ( , k )-optimal subset of S with probability at least 1 − δ . We note that when n = 1, EQS returns an ( , 1)-optimal subset of S with probability 1, and thus, Hypothesis 1 holds for n = 2. From now on till the end of the proof of the correctness, we assume that E happens and subEQS (i.e., the EQS called by the algorithm) also returns a correct result. We have shown that P{E} ≥ 1 − δ/n, and Hypothesis 1 claims that subEQS returns a correct result with probability at least 1−(n−1)δ/n. Thus, this assumption holds with probability at least 1 − δ. First, we show a property about the sets S up , S mid , and S down . Since E happens, according to Lemma 3, all items i added to S up have i v, all items i added to S mid have p i,v ∈ (1/2 − /2, 1/2 + /2), and all items i added to S down have v i. 
Here we note that v is a pivot randomly picked from S. Now, let item i in S up ∪ S mid and item j in S mid ∪ S down be given. There are four cases about items i and j. Case 1: item i is in S up and item j is in S mid . Since i v, Case 2: item i is in S up and item j is in S down . In this case, we have i v j, which implies that p i,j > 1/2 > 1/2 − . Case 3: item i is in S mid and item j is in S mid . By the definition of STI, we have Thus, from the above four cases, we conclude that for any item i in S up ∪ S mid and j in S mid ∪ S down , p i,j ≥ 1/2 − . Next, we finish the proof of the correctness by analyzing the following three cases. Let R be the returned set of EQS. Let i be an item in R and j be an item not in R. Case 1: |S up | > k. In this case, item i is in S up . If j is in S up , by Hypothesis 1, the set returned by subEQS is an ( , k)-optimal subset of S up , and thus, p i,j ≥ 1/2 − . For the case where j is in S mid ∪ S down , we have shown that p i,j ≥ 1/2 − . Case 2: |S up | ≤ k and |S up | + |S mid | ≥ k. In this case, we have that i is in S up ∪ S mid and j is in S mid ∪ S down . We have shown that p i,j ≥ 1/2 − . Case 3: |S up | + |S mid | < k. In this case, j is in S down . For the case where i is in S up ∪ S mid , we have shown that p i,j ≥ 1/2− . If i is in S down , then i is in the returned set of subEQS, which by Hypothesis 1 implies that p i,j ≥ 1/2 − . Therefore, if Hypothesis 1 holds for n, EQS returns a correct ( , k)-optimal subset of S with probability at least 1 − δ. Since k ≤ n and δ < 1/2 are arbitrary, Hypothesis 1 holds for n + 1. Also, since Hypothesis 1 holds for n = 2, Hypothesis 1 holds for all n ≥ 2. This completes the proof of the correctness. Proof of the sample complexity. We prove the sample complexity by induction. Let c 1 > 0 be the hidden constant of the sample complexity of DI stated in Lemma 3. For any positive integer n 1 , we use T (n 1 , k 1 , , δ 1 ) to denote the upper bound of the expected number of comparisons conducted by the call of EQS([n 1 ], k 1 , , δ 1 ), where [n 1 ] denotes an arbitrary set consisting of n 1 items, k 1 is a positive integer with k 1 ≤ min{n 1 , k}, and δ 1 is in (0, δ]. When there is only one item, we have T (1, k 1 , , δ 1 ) = 0, as we do not need to conduct any comparison. When there are two items, since we only need to compare the two items in the call of DI, we have T (2, k 1 , , δ 1 ) ≤ c 1 −2 log δ −1 for any k 1 ≤ min{2, k} and δ 1 ∈ (0, δ]. Now we let n 1 > 2, k 1 ≤ min{n 1 , k}, and δ 1 ∈ (0, δ] be given, we make the following hypothesis. Note that we have shown that when n 1 = 3, Hypothesis 2 holds. Hypothesis 2. For all n 2 < n 1 , k 2 ≤ min{n 2 , k 1 }, and δ 2 ∈ (0, δ 1 ], T (n 2 , k 2 , , δ 2 ) ≤ c 2 n 2 −2 log(n 2 /δ 2 ), where c 2 > 0 is a sufficiently large constant. For the call of EQS([n 1 ], k 1 , , δ 1 ), we use v to denote its pivot, and use l to denote the rank of item v in [n 1 ], i.e., item v ranks the l-th best in [n 1 ]. Since the pivot v is picked at random, l is uniformly distributed on [n 1 ]. We recall that E is the event that all DIs called by EQS([n 1 ], k 1 , , δ 1 ) return correct results, i.e., for each call of DI, one of the five events stated in Lemma 3 happens. By Lemma 3, E happens with probability at least 1 − δ/n 1 . First, we consider the case where E does not happen. In this case, since v is added to S mid , we have |S up | ≤ n 1 − 1 and |S down | ≤ n 1 − 1, and subEQS (if existing) will only be executed on one of S up and S down . 
Hence, in this case, the expected number of comparison conducted by EQS([n 1 ], k 1 , , δ 1 ) is T (n 1 − 1, k , , δ 1 ) + c 1 (n 1 − 1) 2 log n 1 δ 1 ≤ (c 2 + c 1 ) · n 1 2 log n 1 δ 1 . Next, we consider the case where E happens. In this case, since Lemma 3 states that no item less preferred than the pivot v will be added to S up and no item more preferred than the pivot v will be added to S down , we have |S up | ≤ l − 1 and |S down | ≤ n 1 − l. If l > k 1 , then we have |S up | + |S mid | = n 1 − |S down | ≥ l > k 1 , which implies that subEQS (if existing) will only be executed on the set S up , and the size of S up is no more than (l − 1). If l ≤ k 1 , then we have |S up | ≤ l ≤ k 1 , which implies that subEQS (if existing) will only be executed on S down , and the size of S down is at most (n 1 − l). Hence, when E happens, the expected number of comparisons conducted by the call of EQS([n 1 ], k 1 , , δ 1 ) is Summarizing the numbers of comparisons in these two cases, by P{E} ≥ 1 − δ 1 /n 1 , n 1 > 2, and δ 1 < 1/2, we get Choose c 2 ≥ 4.8c 1 , and then we have T (n 1 , k 1 , , δ 1 ) ≤ c 2 n 1 −2 log(n 1 /δ 1 ). Thus, if Hypothesis 2 holds for n 1 , it will hold for n 1 + 1. We also recall that when n 1 ≤ 3, Hypothesis 2 holds. Therefore, by induction, Hypothesis 2 holds for all values of n 1 . Hence, EQS([n], k, , δ) terminates after at most c 2 n −2 log(n/δ) number of comparisons in expectation. This completes the proof of the sample complexity, and the proof of Theorem 4 is complete. Theorem 5 (Theoretical Performance of TKS). Given input 1 ≤ k ≤ n/2, and , δ ∈ (0, 1/2), TKS terminates after O(n −2 log(k/δ)) number of comparisons in expectation, and with probability at least 1 − δ, returns an ( , k)-optimal subset of [n]. Proof of Theorem 5. We first prove the correctness of TKS and then prove its sample complexity. Here, we let T be the number of rounds, and thus, the returned set is R T +1 . Proof of the correctness. Step 1 is to prove that for any round t, R t+1 contains an ( t , k)-optimal subset of R t . Let b 1 , b 2 , ..., b k be the best-k items of R t , and denote B = {b 1 , b 2 , ..., b k }. Also, for all l ∈ [k], we use S t,s l to denote the split set that contains b l . We let E t good be the event that for all l ∈ [k], the calls of EQS on S t,s l return correct results. By Theorem 4 and the union bound, we have P{E t good } ≥ 1 − δ t . During the proof of the correctness, we assume that E t good happens for all t, and by the union bound, we have that We complete Step 1 by constructing a subset U ⊂ R t+1 that is an ( t , k)-optimal subset of R t . The construction consists of stages. We note that we only need to prove the existence of such a set, and thus, in the construction, we have the oracle knowledge about the values of b 1 , b 2 , ..., b k . Stage 0: Let U be the empty set. Stage 1: If b 1 is not in A t,s1 , then by Theorem 4, all items i in A t,s1 have p i,b1 ≥ 1/2 − t . By the definition of SST, this implies that p i,j ≥ 1/2 − t for all items j in R t . In this case, we let U = A t,s1 , which is an ( t , k)-optimal subset of R t , and the construction of U is complete. If b 1 is in A t,s1 , then we add b 1 to U and the construction proceeds to Stage 2. Stage l for any l in {2, 3, ..., k}: We hypothesize that either (i) the construction has ended at an earlier stage, or (ii) b 1 ∈ A t,s1 ,b 2 ∈ A t,s2 ,..., b l−1 ∈ A t,s l−1 , and U = {b 1 , b 2 , ..., b l−1 }. Now we assume that the construction has not ended, otherwise we skip this stage. 
If b_l is not in A_{t,s_l}, then by the property of EQS stated in Theorem 4 and the definition of SST, in A_{t,s_l} − {b_1, b_2, ..., b_{l−1}}, there are at least |A_{t,s_l}| − l + 1 = k − l + 1 items i such that for all items j in R_t − {b_1, b_2, ..., b_{l−1}}, we have p_{i,j} ≥ p_{i,b_l} ≥ 1/2 − ε_t. In this case, we add these (k − l + 1) items to U. Then U is an (ε_t, k)-optimal subset of R_t, and the construction of U is complete. If b_l is in A_{t,s_l}, then we add b_l to U, and the construction proceeds to Stage (l + 1).

Stage 1 does not require any hypothesis, and after each Stage l for l in [k − 1], the hypothesis required by Stage (l + 1) is satisfied. Also, each stage adds at least one item to U. Hence, the construction completes after at most k stages. From the above induction, we have that U is an (ε_t, k)-optimal subset of R_t. Thus, for any t, given that E^t_good happens, R_{t+1} contains an (ε_t, k)-optimal subset of R_t.

Step 2 is to finish the proof of the correctness. Step 1 has shown that for each t ∈ [T], there exists a set U_{t+1} ⊂ R_{t+1} such that U_{t+1} is an (ε_t, k)-optimal subset of R_t. Recall that T is the last round and that the loop ends only when |R_t| reaches k. Thus, |R_{T+1}| = k and U_{T+1} = R_{T+1}. Let t > 1 be given, and let u_{t+1} be an item in U_{t+1}. If u_{t+1} is in U_t, then we let u_t = u_{t+1}, which implies that p_{u_{t+1},u_t} = 1/2 ≥ 1/2 − ε_t. If u_{t+1} is not in U_t, then by the fact that |U_t| = |U_{t+1}|, U_t − U_{t+1} contains at least one item, and we denote this item by u_t. By Step 1, this u_t satisfies p_{u_{t+1},u_t} ≥ 1/2 − ε_t. Thus, in both cases, we have p_{u_{t+1},u_t} ≥ 1/2 − ε_t.

Let i be an item in R_{T+1} and j be an item in [n] − R_{T+1}. Since j ∈ [n] = R_1 and j ∉ R_{T+1}, there is an r such that j ∈ R_r and j ∉ R_{r+1}. We use u_{T+1} to denote i. By the above paragraph, there exists a sequence of items u_T ∈ U_T, u_{T−1} ∈ U_{T−1}, ..., u_{r+1} ∈ U_{r+1} such that p_{u_{t+1},u_t} ≥ 1/2 − ε_t for all t in {T, T − 1, T − 2, ..., r + 1}. Also, since item j is in R_r but not in R_{r+1}, we have p_{u_{r+1},j} ≥ 1/2 − ε_r. Applying STI along this sequence, we conclude that
p_{i,j} = p_{u_{T+1},j} ≥ 1/2 − Σ_{t=r}^{T} ε_t ≥ 1/2 − ε,
where the last inequality follows from the choice of the ε_t's in TKS. Thus, when E^t_good happens for all t, the returned set of TKS is an (ε, k)-optimal subset of [n]. By Eq. (3), the joint event ∩_{t=1}^{T} E^t_good happens with probability at least 1 − δ. This completes the proof of the correctness.

Proof of the sample complexity. At each round t, R_t is split into ⌈|R_t|/(2k)⌉ groups and there is one call of EQS per group. Each call of EQS involves at most 2k items with parameters k (or fewer), ε_t, and δ_t, and returns at most k items. Thus, we have |R_{t+1}| ≤ ⌈|R_t|/(2k)⌉ · k. If |R_t| ≤ ⌈n/(2^{t−1}k)⌉ · k, then we have |R_{t+1}| ≤ ⌈⌈n/(2^{t−1}k)⌉/2⌉ · k ≤ ⌈n/(2^t k)⌉ · k. Also, we have |R_1| = n ≤ ⌈n/k⌉ · k, and thus, by induction, for any t,
|R_t| ≤ ⌈n/(2^{t−1}k)⌉ · k ≤ n/2^{t−1} + k.
By the fact that |R_t| ≥ k, we also get that the number of calls of EQS at round t is at most ⌈|R_t|/(2k)⌉ ≤ c_3 (n/(2^{t−1}k) + 1), where c_3 > 0 is some universal constant. Let c_4 > 0 be the hidden constant factor in the sample complexity stated in Theorem 4. We conclude that the expected number of comparisons conducted by TKS is at most
Σ_{t=1}^{T} c_3 (n/(2^{t−1}k) + 1) · c_4 · 2k · ε_t^{-2} log(2k/δ_t),
which, by the choices of ε_t and δ_t in TKS, is O(n ε^{-2} log(k/δ)). This completes the proof of the sample complexity, and the proof of Theorem 5 is complete.
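To illustrate the per-round halving used in the above counting, the following Python sketch records the TKS round structure: split the surviving set into groups of at most 2k items, run a PAC best-k selection on each group, and keep only the k items each call returns. The geometric schedules for ε_t and δ_t below are illustrative assumptions rather than the paper's exact choices, and naive_eqs is a hypothetical stand-in for EQS; any routine with the (ε, k)-optimal-subset guarantee of Theorem 4 (for instance, the eqs_sketch shown earlier) could be plugged in instead.

```python
import math
import random

def naive_eqs(compare, group, k, eps, delta):
    """Hypothetical stand-in for EQS: estimate each item's average win rate against
    random opponents within the group and keep the top k. For illustration only."""
    m = max(1, int(math.ceil(math.log(max(2, len(group)) / delta) / eps ** 2)))
    def score(i):
        others = [j for j in group if j != i] or [i]
        return sum(compare(i, random.choice(others)) for _ in range(m)) / m
    return sorted(group, key=score, reverse=True)[:k]

def tks_sketch(compare, items, k, eps, delta, eqs=naive_eqs):
    """Sketch of the TKS round structure of Theorem 5: each round roughly halves the
    surviving set R_t while spending the accuracy/confidence budgets eps_t, delta_t."""
    r, t = list(items), 1
    while len(r) > k:
        eps_t = eps * (1.0 / 4.0) * (3.0 / 4.0) ** (t - 1)   # assumed: sums to <= eps
        delta_t = delta / 2 ** t                              # assumed: sums to <= delta
        random.shuffle(r)
        groups = [r[j:j + 2 * k] for j in range(0, len(r), 2 * k)]
        survivors = []
        for g in groups:
            survivors += eqs(compare, g, min(k, len(g)), eps_t, delta_t / len(groups))
        r = survivors
        t += 1
    return r

if __name__ == "__main__":
    # Toy demo: 30 items, smaller index means better.
    compare = lambda i, j: 1 if random.random() < min(0.95, max(0.05, 0.5 + 0.03 * (j - i))) else 0
    print(sorted(tks_sketch(compare, list(range(30)), 5, 0.2, 0.1)))
```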
Theorem 7 (Lower bound for exact k-selection under Thurstone's model). Under Thurstone's model with variance one, given δ ∈ (0, 1/100), n items with scores θ_1, θ_2, ..., θ_n ∈ [0, 1], and 1 ≤ k ≤ n/2, to find the best-k items with probability at least 1 − δ, any algorithm must conduct at least Ω(Σ_{i∈[n]} ∆_i^{-2} log δ^{-1}) + Ω(∆_{r_k}^{-2} log log ∆_{r_k}^{-1}) comparisons in expectation.

Proof of Theorem 7. In this proof, we reduce the PEMAB (pure-exploration multi-armed bandit) problem (Jamieson et al., 2014; Chen et al., 2017) with Gaussian arms to the exact k-selection problem under Thurstone's model, and use the lower bounds for PEMAB proved by Jamieson et al. (2014) and Chen et al. (2017) to establish the desired lower bound for exact k-selection. In the PEMAB problem with Gaussian(0, 1) noises, there are n arms denoted by a_1, a_2, ..., a_n. Each arm a_i holds a real number μ_i denoting the mean reward of arm a_i. The t-th sample of arm a_i returns a random value R_i^t = μ_i + Z_i^t as the reward, where Z_i^t is a Gaussian random variable with mean 0 and variance 1. We further assume that (Z_i^t : i ∈ [n], t ∈ Z_+) are independent.

We first consider the case where k = 1. Let a_1, a_2, ..., a_n be n arms with Gaussian noises such that the mean reward of arm a_i is μ_i = θ_i and all variances are 1. Let μ* be the largest mean reward and μ′ be the second largest mean reward. To find the arm with the largest mean reward with probability at least 1 − δ, Chen and Li (2015) proved that Ω(Σ_{a_i ≠ a_{r_1}} |μ* − μ_i|^{-2} log δ^{-1}) + Ω(|μ* − μ′|^{-2} log log |μ* − μ′|^{-1}) arm pulls are needed in expectation.

To reduce the PEMAB problem to exact k-selection, we develop a procedure P_2, described in Algorithm 7.

Algorithm 7 P_2(a_i, a_j)
Input: two Gaussian arms a_i and a_j with unknown mean rewards μ_i and μ_j, respectively.
1: Sample arm a_i and let R_i be the reward;
2: Sample arm a_j and let R_j be the reward;
3: if R_i > R_j then
4:    return arm a_i;
5: else
6:    return arm a_j;
7: end if

Since R_i − R_j is Gaussian with mean μ_i − μ_j and variance 2, the probability that Procedure P_2(a_i, a_j) returns arm a_i is
P{R_i > R_j} = Φ((μ_i − μ_j)/√2),
where Φ(·) denotes the standard Gaussian CDF. Thus, given n items with scores θ_i = μ_i for all items i, the probability that Procedure P_2(a_i, a_j) returns arm a_i is exactly the same as the probability that a comparison between items i and j returns item i under Thurstone's model with variance one. Also, since μ_i = θ_i ∈ [0, 1] for every arm a_i, we have that for every two arms a_i and a_j,
|Φ((μ_i − μ_j)/√2) − 1/2| ≥ (e^{−1/4}/√(4π)) |μ_i − μ_j|.  (4)

Let A be an exact k-selection algorithm. Now, for each arm a_i, we create an artificial item i, and input these n artificial items into Algorithm A. Whenever Algorithm A wants to compare two artificial items i and j, we call Procedure P_2(a_i, a_j) to mimic the comparison, i.e., if Procedure P_2(a_i, a_j) returns arm a_i (resp. a_j), then we tell Algorithm A that artificial item i (resp. j) wins this comparison. Since the probabilities with which Procedure P_2(·, ·) returns arms are the same as the comparison probabilities under Thurstone's model, Algorithm A cannot distinguish these simulated comparisons from genuine ones. Thus, if Algorithm A can find the best item in [n] with probability 1 − δ using M comparisons in expectation, then one can find the best arm with probability 1 − δ using 2M arm pulls in expectation. Thus, we conclude that for the exact best item selection problem, the lower bound is Ω(Σ_{a_i ≠ a_{r_1}} |μ* − μ_i|^{-2} log δ^{-1}) + Ω(|μ* − μ′|^{-2} log log |μ* − μ′|^{-1}). By the definition of the ∆_i's, θ_i = μ_i, and Eq. (4), we have that for any artificial item i ≠ r_1, ∆_i ≥ (e^{−1/4}/√(4π)) |μ* − μ_i|, and ∆_{r_1} = ∆_{r_2}. Thus, the lower bound for best item selection is Ω(Σ_{i∈[n]} ∆_i^{-2} log δ^{-1}) + Ω(∆_{r_2}^{-2} log log ∆_{r_2}^{-1}).
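To make the reduction concrete, here is a small Python sketch (an illustration, not the paper's code) of Procedure P_2 together with a Monte Carlo check that its win probability equals the Thurstone comparison probability Φ((μ_i − μ_j)/√2); the particular mean rewards and the number of trials below are arbitrary choices for the demo.

```python
import math
import random

def phi(x):
    """Standard Gaussian CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def p2(mu_i, mu_j):
    """Procedure P2: pull each Gaussian(mu, 1) arm once and return True iff
    arm i's observed reward is larger."""
    return random.gauss(mu_i, 1.0) > random.gauss(mu_j, 1.0)

if __name__ == "__main__":
    mu_i, mu_j, trials = 0.8, 0.3, 200_000
    empirical = sum(p2(mu_i, mu_j) for _ in range(trials)) / trials
    # R_i - R_j ~ N(mu_i - mu_j, 2), so P2 returns arm i with probability
    # Phi((mu_i - mu_j)/sqrt(2)), i.e., exactly the Thurstone (variance one)
    # comparison probability for items with scores theta = mu.
    theoretical = phi((mu_i - mu_j) / math.sqrt(2.0))
    print(f"empirical: {empirical:.4f}   Phi((mu_i - mu_j)/sqrt(2)): {theoretical:.4f}")
```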
Next, we consider the case where k > 1. Let A_{k,1} be an algorithm that a priori knows the best (k − 1) items; for Algorithm A_{k,1}, the problem of finding the best-k items is then the same as finding the best item of {r_k, r_{k+1}, r_{k+2}, ..., r_n}. Thus, the expected number of comparisons conducted by any best-k items selection algorithm is lower bounded by Ω(Σ_{i=k}^{n} ∆_{r_i}^{-2} log δ^{-1}) + Ω(∆_{r_k}^{-2} log log ∆_{r_k}^{-1}). Similarly, let A_{k,2} be an algorithm that a priori knows the worst (n − k − 1) items; for Algorithm A_{k,2}, the problem of finding the best-k items is then the same as finding the worst item of {r_1, r_2, ..., r_k, r_{k+1}}. Since the lower bound for finding the worst item has the same form as that for finding the best item (with the definition of the gaps changed accordingly), the expected number of comparisons conducted by any best-k items selection algorithm is lower bounded by Ω(Σ_{i=1}^{k+1} ∆_{r_i}^{-2} log δ^{-1}) + Ω(∆_{r_k}^{-2} log log ∆_{r_k}^{-1}). Combining these two lower bounds completes the proof of Theorem 7.

Theorem 8 (Theoretical Performance of SEEBS). With probability at least 1 − δ, SEEBS terminates after O(Σ_{i∈[n]} ∆_i^{-2} (log δ^{-1} + log log ∆_i^{-1})) comparisons in expectation and returns the best item in [n].

Proof of Theorem 8. Notations. We use round t to denote the t-th iteration of Lines 3 to 11. For any item i, we use T_i to denote the index of the round in which i is discarded (i.e., the round in which i is added to S_down and not added to R_{t+1}). Assume that the unknown true order of these n items is r_1 ≻ r_2 ≻ ··· ≻ r_n, so that r_1 is the best item in [n]. Use T to denote the index of the last round. The proof consists of two parts: the proof of the correctness and the proof of the sample complexity.

Proof of the correctness, i.e., to prove that if SEEBS returns, then the returned item is r_1, the best item in [n].

Hypothesis 3. For a round t, we hypothesize that with probability at least 1 − Σ_{r=1}^{t−1} δ_r, item r_1 is in R_t.

Since R_1 = [n], item r_1 is in R_1 with probability 1. Hence, Hypothesis 3 holds for round one. Now, we let t ≥ 1 be given and assume that Hypothesis 3 holds for round t. By Theorem 5, with probability at least 1 − 2δ_t/3, p_{v_t,r_1} ≥ 1/2 − α_t/3. Then, since r_1 ⪰ v_t, i.e., p_{r_1,v_t} ≥ 1/2, by Lemma 3, we have that given p_{v_t,r_1} ≥ 1/2 − α_t/3, with probability at least 1 − δ_t/3, item r_1 is not added to S_down. Thus, if Hypothesis 3 holds for round t, then with probability at least 1 − δ_t, item r_1 is not discarded in round t, which implies that with probability at least 1 − Σ_{r=1}^{t} δ_r, item r_1 is in R_{t+1}. Therefore, if Hypothesis 3 holds for round t, it also holds for round t + 1. Hypothesis 3 holds for round one, so by induction, Hypothesis 3 holds for all rounds t, i.e., with probability at least 1 − Σ_{r=1}^{t−1} δ_r, item r_1 is in R_t. Since 1 − Σ_{r=1}^{T} δ_r ≥ 1 − Σ_{r=1}^{∞} δ_r = 1 − δ, with probability at least 1 − δ, r_1 is in R_{T+1}, i.e., the returned item is r_1. This completes the proof of the correctness.
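The round structure that the sample-complexity argument below charges comparisons to can be sketched as follows. This is a simplified illustration under assumptions, not SEEBS itself: the schedules α_t = 2^{-t} and δ_t = 6δ/(π²t²), and the α_t/3, 2δ_t/3, δ_t/3 splits, mirror the quantities used in the proof, while pac_best (standing in for the TKS call on Line 4) and di_down (standing in for DI on Line 7) are hypothetical fixed-budget stand-ins.

```python
import math
import random

def di_down(compare, i, v, eps, delta):
    """Hypothetical stand-in for DI (Lemma 3): return True iff item i looks clearly
    worse than the pivot v by more than the eps band."""
    m = int(math.ceil(8 * math.log(2 / delta) / eps ** 2))
    phat = sum(compare(i, v) for _ in range(m)) / m
    return phat <= 0.5 - eps / 2

def seebs_sketch(compare, items, delta, pac_best):
    """Sketch of the SEEBS round structure of Theorem 8: in round t, pick a PAC-best
    pivot with accuracy alpha_t/3, then test every surviving item against the pivot
    and discard the clear losers; repeat with alpha_t halved until one item remains."""
    r, t = list(items), 1
    while len(r) > 1:
        alpha_t = 2.0 ** (-t)
        delta_t = 6.0 * delta / (math.pi ** 2 * t ** 2)
        v = pac_best(compare, r, alpha_t / 3.0, 2.0 * delta_t / 3.0)   # pivot v_t
        r = [i for i in r if i == v or not di_down(compare, i, v, alpha_t, delta_t / 3.0)]
        t += 1
    return r[0]

if __name__ == "__main__":
    # Toy demo: 8 items, smaller index means better; should print 0 with high probability.
    compare = lambda i, j: 1 if random.random() < min(0.95, max(0.05, 0.5 + 0.05 * (j - i))) else 0
    def pac_best(cmp, items, eps, delta):
        # Crude PAC-best stand-in: highest empirical win rate against random opponents.
        m = int(math.ceil(math.log(len(items) / delta) / eps ** 2))
        def score(i):
            others = [j for j in items if j != i] or [i]
            return sum(cmp(i, random.choice(others)) for _ in range(m)) / m
        return max(items, key=score)
    print(seebs_sketch(compare, list(range(8)), 0.1, pac_best))
```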
Proof of the sample complexity. In the proof of the sample complexity, we assume that the returned item is r_1, which happens with probability at least 1 − δ. Since the algorithm terminates when there is only one item remaining, we have T ≤ max_{i≠r_1} T_i. Let N denote the number of comparisons conducted by SEEBS. In round t, the comparisons are conducted by the calls of TKS (Line 4) and DI (Line 7). By Theorem 5, the expected number of comparisons conducted by TKS is at most O(|R_t| α_t^{-2} log δ_t^{-1}). By Lemma 3, the expected number of comparisons conducted by each call of DI is at most O(α_t^{-2} log δ_t^{-1}). Thus, in round t, the expected number of comparisons is at most O(|R_t| α_t^{-2} log δ_t^{-1}). Recall that for any item i, T_i is the index of the round in which item i is removed from R_t or the loop ends, and that T ≤ max_{i≠r_1} T_i. Thus, we get
N ≤ c_5 Σ_{i∈[n]} Σ_{t=1}^{min{T_i, T}} α_t^{-2} log δ_t^{-1},  (5)
where c_5 > 0 is a universal constant. For any item i, define τ_i := inf{t ∈ Z_+ : α_t < ∆_i}, i.e., when t ≥ τ_i, we have α_t < ∆_i. Since α_t = 2^{-t}, we have τ_i ≤ 1 + log_2 ∆_i^{-1}.

Now we bound the tails of the T_i's. Let an item i in [n] − {r_1} and a round t ≥ τ_i be given. When t ≥ τ_i, we have α_t < ∆_{i,r_1}, i.e., p_{i,r_1} = 1/2 − ∆_{i,r_1} < 1/2 − α_t. We have shown that 1/2 − α_t/3 ≤ p_{v_t,r_1} ≤ 1/2, which by the definitions of SST and STI implies that p_{i,v_t} ≤ 1/2 − (∆_{i,r_1} − ∆_{r_1,v_t}) < 1/2 − 2α_t/3. By Lemma 3, at round t, with probability at least 1 − δ_t/3, item i is added to S_down (i.e., item i is discarded). Thus, we have P{T_i > t | T_i > t − 1} ≤ δ_t/3, which, by δ_t ≤ 1/2, implies that for any r in Z_+, P{T_i ≥ τ_i + r} ≤ (5/6) · (1/6)^{r−1} = 5 · 6^{-r}. Thus, by τ_i ≤ 1 + log_2 ∆_{i,r_1}^{-1} = O(log ∆_{i,r_1}^{-1}) and the fact that x + y ≤ 2xy when x, y ≥ 1, we have
E[Σ_{t=1}^{T_i} α_t^{-2} log δ_t^{-1}] ≤ Σ_{t=1}^{τ_i} α_t^{-2} log δ_t^{-1} + Σ_{r=1}^{∞} P{T_i ≥ τ_i + r} · α_{τ_i+r}^{-2} log δ_{τ_i+r}^{-1}
  ≤ Σ_{t=1}^{τ_i} 4^t log(π² t² / (6δ)) + Σ_{r=1}^{∞} 5 · 6^{-r} · 4^{τ_i+r} log(π² (τ_i + r)² / (6δ))
  ≤ c_6 · 4^{τ_i} log(π² τ_i² / (6δ)) + c_7 · 4^{τ_i} (log τ_i + log δ^{-1} + 1)
  ≤ c_8 ∆_{i,r_1}^{-2} (log δ^{-1} + log log ∆_{i,r_1}^{-1}),  (6)
where c_6, c_7, c_8 > 0 are three universal constants. For item r_1, min{T_{r_1}, T} = T ≤ max_{i≠r_1} T_i, so its contribution to Eq. (5) is bounded by the largest of the terms for i ≠ r_1. Thus, by Eq. (5) and Eq. (6), we conclude that
N = O(Σ_{i∈[n]} ∆_i^{-2} (log δ^{-1} + log log ∆_i^{-1})),
which completes the proof of the sample complexity, and the proof of Theorem 8 is complete.
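As a purely numerical sanity check on the constants hidden in Eq. (6), the short script below evaluates the two sums appearing there, for the schedules α_t = 2^{-t} and δ_t = 6δ/(π²t²) used in the proof, and compares them against 4^{τ}(log δ^{-1} + log τ). The bounded ratios it prints illustrate, but of course do not prove, that a universal constant suffices; the specific values of τ and δ are arbitrary.

```python
import math

def eq6_sums(tau, delta, tail_terms=200):
    """Evaluate the two sums bounding E[sum_{t<=T_i} alpha_t^{-2} log(1/delta_t)] in Eq. (6),
    with alpha_t = 2^{-t} (so alpha_t^{-2} = 4^t) and delta_t = 6*delta/(pi^2 t^2)."""
    head = sum(4 ** t * math.log(math.pi ** 2 * t ** 2 / (6 * delta))
               for t in range(1, tau + 1))
    tail = sum(5 * 6.0 ** (-r) * 4 ** (tau + r) * math.log(math.pi ** 2 * (tau + r) ** 2 / (6 * delta))
               for r in range(1, tail_terms + 1))
    return head + tail

if __name__ == "__main__":
    for tau in (3, 6, 10, 15):
        for delta in (0.1, 0.01):
            ratio = eq6_sums(tau, delta) / (4 ** tau * (math.log(1 / delta) + math.log(tau)))
            print(f"tau={tau:2d}  delta={delta:5.2f}  ratio={ratio:6.1f}")
```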
Theorem 9 (Theoretical Performance of SEEKS). With probability at least 1 − δ, SEEKS terminates after O(Σ_{i∈[n]} [∆_i^{-2} (log(n/δ) + log log ∆_i^{-1})]) comparisons in expectation, and returns the best-k items.

Proof of Theorem 9. Notations. We use round t to denote the t-th iteration of Lines 3 to 14. For any item i, we use T_i to denote the index of the round in which i is assured (i.e., the round in which item i is added to S_up or S_down and not added to R_{t+1}), and define T̄_i := min{T, T_i} as the index of the last round in which item i is involved in some comparisons, where T denotes the index of the last round. Assume that the unknown true order of these n items is r_1 ≻ r_2 ≻ ··· ≻ r_n. Define U := {r_1, r_2, ..., r_k} as the set of the best-k items, and U_t := U ∩ R_t.

Proof of the correctness, i.e., to prove that if SEEKS returns, then the returned set is U with probability at least 1 − δ. We prove the correctness by induction.

Hypothesis 4. Let t ≤ T + 1 be given. We hypothesize that S_t ⊂ U ⊂ R_t ∪ S_t with probability at least 1 − Σ_{r=1}^{t−1} δ_r.

When t = 1, we have R_1 = [n] and S_1 = ∅, which implies that S_1 = ∅ ⊂ U ⊂ [n] = R_1 with probability 1. Thus, Hypothesis 4 holds for t = 1. Now, we consider the case where t ≥ 2.

First, we bound the probability of an event. Let E_t be the event that in round t, all the calls of TKS, TKS2, and DI return correct results. By Theorem 5, Lemma 3, and the union bound, we have
P{E_t} ≥ 1 − δ_t.  (8)
In the proof of the correctness, we assume E_t happens.

Second, we show a useful property of the pivot v_t. In each iteration, items in S_up are added to S_t and k_t is decreased by |S_up|, and thus, k_t = k − |S_t|. By Hypothesis 4, we have S_t ⊂ U ⊂ R_t ∪ S_t, U_t ⊂ R_t, and S_t ∩ R_t = ∅, and thus, U_t = U − S_t and |U_t| = |U − S_t| = k − |S_t| = k_t. By Theorem 5, for any item i in A_t and j in (R_t − A_t), we have p_{i,j} ≥ 1/2 − α_t/3. If U_t = A_t, then we have v_t ⪰ r_k, which implies that p_{v_t,r_k} ≥ 1/2 > 1/2 − α_t/3. If U_t ≠ A_t, then R_t − A_t contains some item v in U (which implies that v ⪰ r_k), and thus, p_{v_t,r_k} ≥ p_{v_t,v} ≥ 1/2 − α_t/3. Thus, in both cases, we have p_{v_t,r_k} ≥ 1/2 − α_t/3. For Line 5, we recall that TKS2 is almost the same as TKS, with the only difference being that TKS2 is used for finding the PAC worst items. By Theorem 5, we have that for any item j in A_t − {v_t}, p_{v_t,j} ≤ 1/2 + α_t/3. Since |A_t| = |U_t| = k_t and A_t ∩ U ⊂ R_t ∩ U = U_t, the worst item m_t in A_t satisfies r_k ⪰ m_t. Thus, p_{v_t,r_k} ≤ p_{v_t,m_t} ≤ 1/2 + α_t/3. Therefore, we conclude that
1/2 − α_t/3 ≤ p_{v_t,r_k} ≤ 1/2 + α_t/3.  (7)

The third step is to show that in round t, S_up ⊂ U_t and S_down ∩ U_t = ∅. Let an item i in U_t be given. Since E_t happens, the call of DI on item i (against the pivot v_t) returns a correct result. Since item i is in U_t, we have p_{i,r_k} ≥ 1/2, which by Eq. (7) and the definitions of SST and STI implies that p_{i,v_t} ≥ 1/2 − α_t/3. By Lemma 3, item i is not added to S_down. Hence, no item in U_t is added to S_down, which implies S_down ∩ U_t = ∅. Let an item j in R_t − U_t be given. Since r_k ≻ j, we have p_{j,r_k} < 1/2, which by Eq. (7) and the definitions of SST and STI implies that p_{j,v_t} ≤ 1/2 + α_t/3. By Lemma 3, item j is not added to S_up. Thus, no item in R_t − U_t is added to S_up, which implies S_up ⊂ U_t.

Lastly, we show that Hypothesis 4 holds for all t. We have already proved that when Hypothesis 4 holds for t, with probability at least 1 − δ_t (i.e., when E_t happens), S_up ⊂ U_t and S_down ∩ U_t = ∅. By S_up ⊂ U_t and S_t ⊂ U, we get S_{t+1} = S_t ∪ S_up ⊂ U. By S_down ∩ U_t = ∅ and U ⊂ R_t ∪ S_t, we get
U ⊂ (R_t − S_down) ∪ S_t = R_{t+1} ∪ S_up ∪ S_t = R_{t+1} ∪ S_{t+1}.
Hence, S_{t+1} ⊂ U ⊂ R_{t+1} ∪ S_{t+1} whenever E_t happens. Thus, we conclude that with probability at least 1 − Σ_{r=1}^{t−1} δ_r − δ_t = 1 − Σ_{r=1}^{t} δ_r, S_{t+1} ⊂ U ⊂ R_{t+1} ∪ S_{t+1}. This means that if Hypothesis 4 holds for t, then it holds for t + 1. It has also been shown that Hypothesis 4 holds when t = 1. Thus, Hypothesis 4 holds for all t ≤ T + 1. Therefore, with probability at least 1 − Σ_{r=1}^{T} δ_r ≥ 1 − δ, S_{T+1} ⊂ U ⊂ R_{T+1} ∪ S_{T+1}. Also, we have |R_{T+1} ∪ S_{T+1}| ≤ k. Thus, the returned set S_{T+1} ∪ R_{T+1} is exactly U. This completes the proof of the correctness.

Proof of the sample complexity. In the proof of the sample complexity, we assume that ∩_{t=1}^{T} E_t happens. By Eq. (8) and the union bound, ∩_{t=1}^{T} E_t happens with probability at least 1 − Σ_{t=1}^{T} δ_t ≥ 1 − δ. Thus, with probability at least 1 − δ, all the calls of TKS, TKS2, and DI return correct results.
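Before counting comparisons round by round, the following Python sketch records the SEEKS round structure that the counting refers to: TKS (Line 4) proposes a PAC best-k_t set A_t, TKS2 (Line 5) picks a PAC worst item of A_t as the pivot v_t, and DI (Line 8) classifies every remaining item against v_t into S_up, S_down, or the next round's R_{t+1}. It is only an illustration under assumptions: pac_best_k, pac_worst, and di_classify are hypothetical stand-ins for TKS, TKS2, and DI (for instance, the TKS-style sketch given after Theorem 5, and a mirrored version of it, could be plugged in), and the per-call parameters are not the paper's exact ones.

```python
import math
import random

def di_classify(compare, i, v, eps, delta):
    """Hypothetical stand-in for DI (Lemma 3): classify item i against pivot v as
    clearly better ('up'), clearly worse ('down'), or undecided ('mid')."""
    m = int(math.ceil(8 * math.log(2 / delta) / eps ** 2))
    phat = sum(compare(i, v) for _ in range(m)) / m
    if phat >= 0.5 + eps / 2:
        return "up"
    if phat <= 0.5 - eps / 2:
        return "down"
    return "mid"

def seeks_sketch(compare, items, k, delta, pac_best_k, pac_worst):
    """Sketch of the SEEKS round structure of Theorem 9: S_t collects items assured to
    be among the best-k, R_t holds the undecided items, and k_t = k - |S_t|."""
    r, s, t = list(items), [], 1
    while len(s) < k and len(s) + len(r) > k:
        alpha_t = 2.0 ** (-t)
        delta_t = 6.0 * delta / (math.pi ** 2 * t ** 2)
        k_t = k - len(s)
        a_t = pac_best_k(compare, r, k_t, alpha_t / 3.0, delta_t / 3.0)   # Line 4 (TKS)
        v_t = pac_worst(compare, a_t, alpha_t / 3.0, delta_t / 3.0)       # Line 5 (TKS2)
        nxt = []
        for i in r:
            label = "mid" if i == v_t else di_classify(compare, i, v_t, alpha_t, delta_t / 3.0)
            if label == "up":
                s.append(i)        # assured: among the best-k (goes to S_{t+1})
            elif label == "mid":
                nxt.append(i)      # undecided: stays in R_{t+1}
            # 'down': assured to be outside the best-k, discarded
        r = nxt
        t += 1
    return (s + r)[:k]             # the returned set S_{T+1} ∪ R_{T+1}
```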
Let N denote the number of comparisons conducted by SEEKS. In round t, the comparisons are conducted by the calls of TKS (Line 4), TKS2 (Line 5), and DI (Line 8). By Theorem 5, the expected number of comparisons conducted by TKS is at most O(|R_t| α_t^{-2} log(n/δ_t)), and that of TKS2 is at most O(k_t α_t^{-2} log(n/δ_t)). By Lemma 3, the expected number of comparisons conducted by each call of DI is at most O(α_t^{-2} log(|R_t|/δ_t)). Thus, in round t, the expected number of comparisons is at most O(|R_t| α_t^{-2} log(n/δ_t)). Recall that for any item i, T̄_i is the index of the round in which item i is assured (i.e., item i is not added to R_{t+1}) or in which the algorithm terminates. Thus, we have
N ≤ c_9 Σ_{i∈[n]} Σ_{t=1}^{T̄_i} α_t^{-2} log(n/δ_t),  (9)
where c_9 > 0 is a universal constant.

For any item i, define τ_i := inf{t ∈ Z_+ : α_t < ∆_{i,r_k}}; as before, since α_t = 2^{-t}, we have τ_i ≤ 1 + log_2 ∆_{i,r_k}^{-1}. Let an item i and a round t ≥ τ_i be given. First, we consider the case where i ∈ [n] − U. Since t ≥ τ_i, we have α_t < ∆_{i,r_k}, i.e., p_{i,r_k} < 1/2 − α_t, which by Eq. (7) and the definitions of SST and STI implies that p_{i,v_t} ≤ 1/2 − 2α_t/3. Since E_t happens, by Lemma 3, at round t, item i is added to S_down, i.e., item i is not added to R_{t+1}. Second, we consider the case where i ∈ U − {r_k}. Since t ≥ τ_i, we have α_t < ∆_{i,r_k}, i.e., p_{i,r_k} > 1/2 + α_t. By Eq. (7), we have ∆_{v_t,r_k} ≤ α_t/3, which implies that p_{i,v_t} = 1/2 + ∆_{i,v_t} ≥ 1/2 + (∆_{i,r_k} − ∆_{v_t,r_k}) ≥ 1/2 + 2α_t/3. Since E_t happens, by Lemma 3, at round t, item i is added to S_up, i.e., item i is not added to R_{t+1}. Thus, when ∩_{t=1}^{T} E_t happens, T̄_i ≤ τ_i for every item i ≠ r_k, and hence
Σ_{t=1}^{T̄_i} α_t^{-2} log(n/δ_t) ≤ Σ_{t=1}^{τ_i} 4^t log(π² t² n / (6δ)) ≤ c_10 · 4^{τ_i} (log τ_i + log(π² n / (6δ))) ≤ c_11 · 4^{1 + log_2 ∆_{i,r_k}^{-1}} (log(1 + log_2 ∆_{i,r_k}^{-1}) + log(n/δ)) ≤ c_12 ∆_{i,r_k}^{-2} (log(n/δ) + log log ∆_{i,r_k}^{-1}),  (10)
where c_10, c_11, c_12 > 0 are three universal constants.

Also, we observe that when all items in [n] − U are assured, SEEKS terminates and conducts no more comparisons. At any round t with t ≥ max_{i∈[n]−U} τ_i = τ_{r_{k+1}}, since ∩_{t=1}^{T} E_t happens, all items not in U have been assured. Thus, we have T̄_{r_1}, T̄_{r_2}, ..., T̄_{r_k} ≤ τ_{r_{k+1}}. Similar to Eq. (10), we have that for any item i in U,
Σ_{t=1}^{T̄_i} α_t^{-2} log(n/δ_t) ≤ Σ_{t=1}^{τ_{r_{k+1}}} 4^t log(π² t² n / (6δ)) ≤ c_13 ∆_{r_k,r_{k+1}}^{-2} (log(n/δ) + log log ∆_{r_k,r_{k+1}}^{-1}),  (11)
where c_13 > 0 is a universal constant. Note that for any item i in U = {r_1, r_2, ..., r_k}, ∆_i = ∆_{i,r_{k+1}} and, by STI, ∆_i ≤ ∆_{i,r_k} + ∆_{r_k,r_{k+1}}, which implies that min{∆_{i,r_k}^{-1}, ∆_{r_k,r_{k+1}}^{-1}} ≤ 2∆_i^{-1}. Therefore, by Eq. (10) and Eq. (11), for any item i in U, we have
Σ_{t=1}^{T̄_i} α_t^{-2} log(n/δ_t) ≤ c_14 ∆_i^{-2} (log(n/δ) + log log ∆_i^{-1}),  (12)
where c_14 > 0 is a universal constant. Thus, by Eq. (9), Eq. (10), and Eq. (12), and the definition of the ∆_i's stated in Eq. (1), we conclude that when ∩_{t=1}^{T} E_t happens,
N = O(Σ_{i∈[n]} ∆_i^{-2} (log(n/δ) + log log ∆_i^{-1})).
This completes the proof of the sample complexity, and the proof of Theorem 9 is complete.

Acknowledgments. This work has been supported in part by NSF grants CAREER CNS-1943226, CNS-1901057, CNS-1758757, CNS-1719371, CNS-1717060, ECCS-1818791, and CCF-1758736, a Google Faculty Research Award, and an IITP grant (No. 2017-0-00692). We express our sincere gratitude to those who fought or are fighting against COVID-19.