key: cord-0226797-ixw5pjdy authors: Liang, Yuchen; Veeravalli, Venugopal V. title: Non-Parametric Quickest Detection of a Change in the Mean of an Observation Sequence date: 2021-01-14 journal: nan DOI: nan sha: 001a137f03695284bad6d9514bd5306dbca409ce doc_id: 226797 cord_uid: ixw5pjdy We study the problem of quickest detection of a change in the mean of an observation sequence, under the assumption that both the pre- and post-change distributions have bounded support. We first study the case where the pre-change distribution is known, and then study the extension where only the mean and variance of the pre-change distribution are known. In both cases, no knowledge of the post-change distribution is assumed other than that it has bounded support. For the case where the pre-change distribution is known, we derive a test that asymptotically minimizes the worst-case detection delay over all post-change distributions, as the false alarm rate goes to zero. We then study the limiting form of the optimal test as the gap between the pre- and post-change means goes to zero, which we call the Mean-Change Test (MCT). We show that the MCT can be designed with only knowledge of the mean and variance of the pre-change distribution. We validate our analysis through numerical results for detecting a change in the mean of a beta distribution. We also demonstrate the use of the MCT for pandemic monitoring. Quickest Change Detection (QCD) is a fundamental problem in mathematical statistics. Given a stochastic sequence whose distribution changes at some unknown change-point, the goal is to detect the change after it occurs as quickly as possible, subject to false alarm constraints. The QCD framework has had a wide range of applications, including but not limited to line-outage in power systems [1] , dim-target manoeuvre detection [2] , stochastic process control [3] , structural health monitoring [4] , and, more recently, in the Multi-Armed Bandit (MAB) problem with piece-wise stationary bandits [5] . Two formulations are used in the classical QCD problem: the Bayesian formulation [6] , where the changepoint is assumed to follow some prior distribution, and the minimax formulation [7] , [8] , where the worst-case detection delay is minimized over all possible change-points, subject to false alarm constraints. In both the Bayesian and minimax settings, if the pre-and post-change distributions are known, This work was supported in part by the National Science Foundation under grant ECCS-2033900, and by the Army Research Laboratory under Cooperative Agreement W911NF-17-2-0196, through the University of Illinois at Urbana-Champaign. low-complexity efficient solutions to the QCD problem can be found [9] . In many practical situations, we may not know the exact distribution in the pre-or post-change regimes. Although it is reasonable to assume that we can obtain a large amount of data in the pre-change regime, this may not be the case for the post-change regime. In applications such as the epidemic detection problem and the MAB problem, a change in a specific statistic (e.g., the mean) of the distribution is of interest. This is different from the original QCD problem where any distributional change needs to be detected. Furthermore, in many applications, the support of the distribution is bounded. For example, the observations representing the fraction of some specific group in the entire population are bounded between 0 and 1. This is the case, for example, in the pandemic monitoring problem that we discuss in detail in Section IV. There have been a number of lines of work on the QCD problem when the pre-and/or post-change distributions are not completely known. The most prevalent is the Generalized Likelihood Ratio (GLR) approach, in which the maximum of the test statistics corresponding to each possible changepoints is used to construct the test. A GLR approach for a Gaussian mean-change detection problem is proposed in [10] . A GLR test for the case where the pre-and post-change distributions come from one-parameter exponential families is analyzed in [11] . A kernel approach to construct GLR statistics is discussed in [12] . More recently, a GLR meanchange test for sub-Gaussian observations using scan statistics is proposed and analyzed in [13] , and this approach is applied to the MAB problem in [5] . In all of these methods, the complexity of computing the test statistic at each time-step grows at least linearly with the number of samples. In practice, a windowed version of the GLR test statistic is used to keep the complexity small, while suffering some loss in performance. An alternative to the GLR approach was given in [14] , where a histogram is used to estimate the post-change distribution, and a low complexity test with a recursive structure is constructed. Another line of work is the one based on a minimax robust approach [15] , in which it is assumed that the distributions come from mutually exclusive uncertainty classes. Under certain conditions on the uncertainty classes, e.g., joint stochastic boundedness [9] , a low-complexity saddle-point solution to the minimax robust QCD problem can be found. In this paper, we use an asymptotic version of the minimax robust QCD problem formulation to develop algorithms for the nonparametric detection of a change in mean of an observation sequence. Our contributions are as follows: 1) We study the problem of quickest detection of a change in the mean of an observation sequence under the assumption that no knowledge of the post-change distribution is available other than that it has bounded support. 2) For the case where the pre-change distribution is known, we derive a test that asymptotically minimizes the worstcase detection delay over all possible post-change distributions, as the false alarm rate goes to zero. 3) We study the limiting form of the optimal test as the gap between the pre-and post-change means goes to zero, which we call the Mean-Change Test (MCT). We show that the MCT can be designed with only knowledge of the mean and variance of the pre-change distribution. 4) We validate our analysis through numerical results for detecting a change in the mean of a beta distribution. We also demonstrate the use of the MCT for pandemic monitoring. Let X 1 , . . . , X t , · · · ∈ [0, 1] be independent samples, with X 1 , . . . , X ν ∼ P 0 , and X ν+1 , · · · ∼ P 1 . Let P P0,P1 ν {·} denote the probability measure on the entire sequence of observations when the pre-and post-change distributions are P 0 and P 1 , respectively, and the change-point is at t = ν, and let E P0,P1 ν [·] denote the corresponding expectation. We will first consider the case where P 0 is completely known, and then study extensions to the cases where only partial (moment) information about P 0 is available. The change-time ν is unknown but deterministic. The problem is to detect the change quickly while not causing too many false alarms. Let τ be a stopping time [9] defined on the observation sequence associated with the detection rule, i.e. τ is the time at which we stop taking observations and declare that the change has occurred. If both the pre-and post-change distributions are known, Lorden [7] proposed solving the following optimization problem to find the best stopping time τ : where is a worst-case delay metric and is the feasible set where FAR P0 (τ ) := E P0,P1 is the expectation operator when the change never happens, and (·) + := max{0, ·}. Lorden also showed that Page's Cumulative Sum (CUSUM) algorithm [16] whose test statistic is given by: is the likelihood ratio between the densities. The CUSUM stopping rule is given by: where b α := | ln α|. It was shown by Moustakides [17] that Page's test is exactly optimal for the problem in (2) . When the pre-change and post-change distributions are unknown but belong to some uncertainty sets, a minimax robust metric can be applied: where the feasible set becomes A pair of uncertainty sets (P 0 , P 1 ) is said to be jointly stochastically (JS) bounded by (P 0 ,P 1 ) ∈ P 0 × P 1 if, for any (P 0 , P 1 ) ∈ P 0 × P 1 and any t > 0, where LP 0 ,P1 is the likelihood ratio betweenP 1 andP 0 as defined previously [9] . The distributionsP 0 andP 1 ) are called least favorable distributions (LFDs) within the classes P 0 and P 1 , respectively. If the pair of pre-and post-change uncertainty sets is JS bounded, the test statistic ΛP 0,P1 (t) with the stopping rule τ (ΛP 0,P1 , b α ) solves (6) exactly [18] . A pair of uncertainty sets (P 0 , P 1 ) is said to be weakly stochastically (WS) bounded by (P 0 ,P 1 ) ∈ P 0 × P 1 if for all P 1 ∈ P 1 , and for all P 0 ∈ P 0 [2] , where E P [·] denotes the expectation operator with respect to distribution P . It is shown in [2] that if a pair of uncertainty sets is JS bounded by (P 0 ,P 1 ), it is also WS bounded by (P 0 ,P 1 ). and if furthermore, P 1 is convex, and (11) holds for all P 0 ∈ P 0 , then (P 0 , P 1 ) is WS bounded by (P 0 ,P 1 ), and τ (ΛP 0,P1 , b α ) solves the problem in (6) asymptotically as α → 0. Also, the worst-case delay is In this paper, we consider the problem of quickest change detection for observations with bounded support. The goal is to construct a test to detect when the mean of the observations exceeds some pre-specified threshold, i.e., P 1 ∈ P 1 such that In this expression, X denotes a generic observation in the sequence, η is a pre-designed threshold on the mean, and Also, for i = 0, 1, let the density (w.r.t. the Lebesgue measure) of P i be p i . Define to be the cumulant-generating functiong (cgf) under measure P 0 . We present our main results below. Throughout we will assume that P 1 is as defined in (14) . In this subsection, we study the case where P 0 = {P 0 }. Theorem III.1. For P 0 = P 0 , and P 1 given in (14) , define where κ 0 (λ) is the cgf under P 0 and λ * satisfies Then, the statistic and the stopping rule τ (Λ P0,P * 1 , b α ) with threshold b α = | ln α| solves the minimax robust problem in (6) asymptotically as α → 0, and Proof. We follow the procedure outlined in [19, Sec. 6.4.1] . We want to minimize D(P 1 ||P 0 ) = E P1 [ln(p 1 (x)/p 0 (x))] subject to E P1 [X] ≥ η. We consider the Lagrangian where the Lagrange multiplier λ ≥ 0 corresponds to the constraint that the post-change mean is greater than η, and µ corresponds to the constraint that p 1 (x) is a probability measure. For an arbitrary direction z, since p 1 is continuous by assumption, we take the Gateaux derivative with respect to p 1 : where µ = µ − 1, and since z is arbitrary, we arrive at By the Generalized Kuhn-Tucker Theorem [20] , if p 0 (x) is bounded, p 1 (x) = p 0 (x)e λx+µ is a necessary condition for optimality. Furthermore, since L(p 1 , λ, µ) is convex in p 1 , this is also a global optimum. To satisfy the constraints, and λ * satisfies Thus, P * 1 in (17) minimizes the KL divergence when P 0 is known. By Theorem II.1, since (P 0 , P * 1 ) minimizes the KL divergence, P 1 is convex in that the expectation operator is linear, and P 0 is a singleton, the uncertainty classes P 0 × P 1 are WS bounded by (P 0 , P * 1 ). Therefore, an asymptotically optimal test is Λ P0,P * 1 as in (19) . Furthermore, the minimum KL-divergence is Hence, the worst-case delay satisfies as α → 0. Since p 0 is a density on [0, 1], p * is also a density on [0, 1]. Indeed, p * is the exponentially-tilted distribution (or the Esscher transform) of p 0 . Even though we have an expression for the test statistic when P 0 is known, as given in (19) , the exact solution of λ * is not available in closed-form. Fortunately, if the meanchange gap ∆ is small, we obtain a low-complexity test in terms of only the pre-change mean and variance that closely approximates the performance of the asymptotically minimax optimal test in the previous section. As the gap is asymptotically small, the worst-case postchange mean η → µ 0 . Hence, λ * → 0. From the Taylor expansion on κ 0 around 0, we obtain In this same regime, by continuity of κ 0 (·), where we have used κ 0 (λ * ) = η. Hence, the approximate test statistic is and the corresponding minimum KL-divergence is approximated as: Let σ 2 0 be the variance of X under P 0 . Since the stopping rule τ (Λ P0,P * 1 , b α ) can be approximated by the stopping rule τ (Λ µ0,η ,b α ), wherẽ withΛ µ0,η (0) = 0. We call τ (Λ µ0,η ,b α ) the Mean-Change Test (MCT), andΛ µ0,η the MCT statistic. Noe that the worst-case delay satisfies where the approximation becomes more accurate as ∆ → 0. Therefore, if the gap is small, it is sufficient to know only the mean and variance to construct a good approximation to the asymptotically minimax robust test. Furthermore, only the mean of the pre-change distribution is needed to construct the MCT statistic. From the simulation results in Section IV, the performance of the MCT is very close to that of the asymptotically minimax robust test. Since the mean and variance of a distribution are much easier and more accurately estimated than the entire density, this test can be useful and accurate when only a moderate number of observations in the pre-change regime is available. We now study the asymptotic performance of the MCT for the case where ∆ is moderate, as α → 0. and K β (z) is the modified Bessel function of the second kind with order β. where a := (σ 2 0 + M ∆) −1 and C := σ 2 0 b/(σ 2 0 + M ∆). In the series of inequalities above, (a) is by Bernstein's inequality [21, p. 9] , and (b) is from bounding the sum with an integral. Since K 1 (z) ∼ π 2z e −z as |z| → ∞, the asymptotic result follows. then (7) is satisfied asymptotically as α → 0. Furthermore, where R 0 is defined in (37). Proof. As α → 0,b α → ∞. As a result, for any P 1 ∈ P 1 , (1)). From [22, Sec. 2.6] , it can be shown that Thus, the false alarm constraint is satisfied asymptotically. For the second result, it is sufficient to show that Note that the desired ratio (b α −b α )/b α = D/| ln α|. Plugging into (36), we have Now, we hypothesize that D = D 1 | ln α| + o(| ln α|). The first term then vanishes because ln σ 4 0 π ∆ 4 (D + | ln α|) = ln From the second term, we validate our hypothesis that and the second result follows. Remark. In practice, the threshold can be set to be σ 2 0 /(2R 2 0 ∆) using equation (40). Theorem III.4. Fix P 0 ∈ P 0 . The worst-case delay satisfies Proof. For any P 1 ∈ P 1 , as α → 0, where the first line is by renewal theory [23] . Remark. As ∆ → 0, R 0 → 1. Thus, the result above becomes (1)) (49) where o(1) goes zero as α and ∆ go to zero, which coincides with the minimax robust worst-case delay in (35). We study the performance of the proposed tests through simulations for the case where the pre-and post-change distributions are beta(4,16) (µ 0 = 0.2) and beta(4.5,16) (µ 1 = 0.2195), respectively. The mean-threshold η is set to be 0.21. In particular, we compare the performances for the following three statistics: 1) The CUSUM statistic that knows both the pre-and postchange distributions, defined in (4). 2) The statistic when only the pre-change distribution is known, defined in (19) . 3) The MCT statistic defined in (34). For all three statistics, based on their recursive structure, it is easy to show that the worst-case value of the change-point for computing WADD in (2) is ν = 0. Therefore we can estimate the worst-case delay by simulating the post-change distribution from time 0. We see in Fig. 1 that the performance of MCT is very close to the test that uses knowledge of the entire pre-change distribution. Note that the MCT statistic only uses the prechange mean; the variance is needed for setting the threshold to meet a given FAR constraint. In Fig. 2 , we apply the MCT to monitoring the spread of COVID-19 using new case data from various counties in the US [24] . The incremental cases from day to day can be assumed to be roughly independent. . The x-axis is the number days elapsed after January 21, 2020. The pre-change mean and variance are estimated using data from days 120 to 150. The FAR threshold α is set to 0.01. For each county, the mean-threshold η (in green) is set to be 3.3 times of the estimated pre-change mean (in cyan). The lower subplot shows the evolving of the statisticΛ in the corresponding county. The Λ-thresholdbα (in red) is calculated using equation (33). The goal is to detect the onset of a new wave of the pandemic based on the incremental cases as a fraction of the county population exceeding some pre-specified level. The pre-change mean and variance are estimated using observations for periods in which the increments remain low and roughly constant. We set the mean-threshold η to be a multiple of the prechange mean, with understanding that such a threshold might be indicative of a new wave. With this choice, we observe that the MCT statistic significantly and persistently crosses the testthreshold around late November in all counties, which is strong indication of a new wave of the pandemic. More importantly, unlike the raw observations which are highly varying, the MCT statistic shows a clear dichotomy between the pre-and postchange settings, with the statistic staying near zero before the purported onset of the new wave, and taking off nearly vertically after the onset. Quickest line outage detection and identification Misspecified and asymptotically minimax robust quickest change detection A survey of recent developments in the design of adaptive control charts Structural health monitoring using statistical process control The generalized likelihood ratio test meets KL-UCB: An improved algorithm for piece-wise non-stationary bandits General asymptotic bayesian theory of quickest change detection Procedures for reacting to a change in distribution Average run lengths of an optimal method of detecting a change in distribution Statistical Inference for Engineers and Data Scientists Using the generalized likelihood ratio statistic for sequential detection of a change-point Sequential change-point detection when the preand post-change parameters are unknown M-statistic for kernel changepoint detection Sequential change-point detection: Laplace concentration of scan statistics and non-asymptotic delay bounds A binning approach to quickest change detection with unknown post-change distribution A robust version of the probability ratio test Continuous Inspection Schemes Optimal stopping times for detecting changes in distributions Minimax robust quickest change detection Principles of Signal Detection and Parameter Estimation Optimization by Vector Space Methods All of Nonparametric Statistics. 233 Spring Street Sequential analysis: Tests and confidence intervals Academic press library in signal processing: Array and statistical signal processing Coronavirus (COVID-19) Data in the United States -The New York Times Gqp0XiRZGO24wcouQ5xRXh389-5cY88mGOynh9goKIh8x9 ogUY$