key: cord-0648981-1u7z4oxv
authors: Liang, Yuchen; Veeravalli, Venugopal V.
title: Non-Parametric Quickest Mean Change Detection
date: 2021-08-25
journal: nan
DOI: nan
sha: 7e69df1140cebb03c886e9369da2ee6563b62dd3
doc_id: 648981
cord_uid: 1u7z4oxv

The problem of quickest detection of a change in the mean of a sequence of independent observations is studied. The pre-change distribution is assumed to be stationary, while the post-change distributions are allowed to be non-stationary. The case where the pre-change distribution is known is studied first, and then the extension where only the mean and variance of the pre-change distribution are known. No knowledge of the post-change distributions is assumed other than that their means are above some pre-specified threshold larger than the pre-change mean. For the case where the pre-change distribution is known, a test is derived that asymptotically minimizes the worst-case detection delay over all possible post-change distributions, as the false alarm rate goes to zero. Towards deriving this asymptotically optimal test, some new results are provided for the general problem of asymptotic minimax robust quickest change detection in non-stationary settings. Then, the limiting form of the optimal test is studied as the gap between the pre- and post-change means goes to zero, called the Mean-Change Test (MCT). It is shown that the MCT can be designed with only knowledge of the mean and variance of the pre-change distribution. The performance of the MCT is also characterized when the mean gap is moderate, under the additional assumption that the distributions of the observations have bounded support. The analysis is validated through numerical results for detecting a change in the mean of a beta distribution. The use of the MCT in monitoring pandemics is also demonstrated.

Quickest change detection (QCD) is a fundamental problem in mathematical statistics (see, e.g., [2] for an overview). Given a stochastic sequence whose distribution changes at some unknown change-point, the goal is to detect the change after it occurs as quickly as possible, subject to false alarm constraints. The QCD framework has seen a wide range of applications, including line-outage in power systems [3] , dim-target manoeuvre detection [4] , stochastic process control [5] , structural health monitoring [6] , and piece-wise stationary multi-armed bandits [7] . The two main formulations of the classical QCD problem are the Bayesian formulation [8] , [9] , where the change-point is assumed to follow a known prior distribution, and the minimax formulation [10] , [11] , where the worst-case detection delay is minimized over all possible change-points, subject to false alarm constraints. In both the Bayesian and minimax settings, if the pre-and post-change distributions are known, low-complexity efficient solutions to the QCD problem can be found [2] .

In many practical situations, we may not know the exact distribution in the pre-or postchange regimes. While it is reasonable to assume that we can obtain a large amount of data in the pre-change regime, this may not be the case for the post-change regime. Also, in applications such epidemic monitoring and piece-wise stationary multi-armed bandits, a change in a specific statistic (e.g., the mean) of the distribution is of interest. This is different from the original QCD problem where any distributional change needs to be detected. Furthermore, in many applications, the support of the distribution is bounded. For example, the observations representing the fraction of some specific group in the entire population are bounded between 0 and 1. This is the case, for example, in the pandemic monitoring problem that we discuss in detail in Section IV. In many applications, including the pandemic monitoring problem, the system has usually reached some nominal steady-state distribution before the change-point. In these situations, the pre-change distribution can be assumed to be stationary.

In this paper, we study the problem of quickest detection of a change in the mean of a sequence of independent observations. The pre-change distribution is assumed to be stationary, while the post-change distributions are allowed to be non-stationary. We first study the case where the prechange distribution is known, and then study the extension where only the mean and variance of the pre-change distribution are known. No knowledge of the post-change distributions is assumed other than that their means are above some threshold larger than the pre-change mean.

There have been a number of lines of work on the QCD problem when the pre-and/or post-change distributions are not completely known. The most prevalent is the generalized likelihood ratio (GLR) approach, introduced in [10] for the parametric case where the postchange distribution has an unknown parameter. This GLR approach is studied in detail for the problem of detecting the change in the mean of a Gaussian distribution with unknown postchange mean in [12] . A GLR test for the case where the pre-and post-change distributions come from an one-parameter exponential family, and both the pre-and post-change parameters are unknown, is analyzed in [13] .

The QCD problem has also been studied in a non-parametric setting. In particular, for detecting a change in the mean of an observation sequence, one approach has been to use maximum scan statistics. The scan statistic of an observation sequence is defined as the absolute difference of the averages before and after a potential change-point. In [14] , the case where the preand post-change distributions have finite moment generating functions in some neighborhood around zero is considered. At each time greater than a window size N , the scan statistic at each potential change-point is calculated using the last N observations. The maximum scan statistic is then calculated over the set of potential change-points, and an alarm is raised if this maximum exceeds some threshold. In [15] , the case of sub-Gaussian pre-and post-change distributions is studied. The scan statistic is calculated over the entire observation sequence, and the maximum is compared to a threshold determined by the current time and the desired false alarm rate. This approach is further applied to the piece-wise stationary multi-armed bandit problem in [7] . We compare our approach to mean-change detection with a test using scan statistics in Section IV.

We note that for both the GLR the scan statistics approaches, the complexity of computing the test statistic at each time-step grows at least linearly with the number of samples. In practice, a windowed version of the test statistic is often used to reduce computational complexity, while suffering some loss in performance.

Still another line of work is the one based on a minimax robust approach [16] , in which it is assumed that the distributions come from mutually exclusive uncertainty classes. Under certain conditions on the uncertainty classes, e.g., joint stochastic boundedness [17] , low-complexity solutions to the minimax robust QCD problem can be found [18] . Under more general conditions, e.g., weak stochastic boundedness, a solution that is asymptotically close to the minimax solution can be found [4] .

In this paper, we use an asymptotic version of the minimax robust QCD problem formulation August 26, 2021 DRAFT [4] to develop algorithms for the non-parametric detection of a change in mean of an observation sequence. Our contributions are as follows:

1) We extend the asymptotic minimax robust QCD problem introduced in [4] to the more general non-stationary setting.

2) We study the problem of quickest detection of a change in the mean of an observation sequence under the assumption that no knowledge of the post-change distribution is available other than that its mean is above some threshold larger than the pre-change mean.

3) For the case where the pre-change distribution is known, we derive a test that asymptotically minimizes the worst-case detection delay over all possible post-change distributions, as the false alarm rate goes to zero.

We study the limiting form of the optimal test as the gap between the pre-and postchange means goes to zero, which we call the Mean-Change Test (MCT). We show that the MCT can be designed with only knowledge of the mean and variance of the pre-change distribution.

We also characterize the performance of the MCT when the mean gap is moderate, under the assumption that the distributions of the observations have bounded support.

We validate our analysis through numerical results for detecting a change in the mean of a beta distribution. We also demonstrate the use of the MCT for pandemic monitoring.

The rest of the paper is structured as follows. In Section II, we describe the quickest change detection problem under distributional uncertainty and provide some new results regarding asymptotically robust tests in the non-stationary setting. In Section III, we formulate the mean change detection problem, and propose and analyze the mean-change test (MCT), which solves the problem asymptotically. In Section IV, we validate our analysis through numerical results for detecting a change in the mean of a beta distribution, and also demonstrate the use of the MCT in monitoring pandemics. Finally, in Section V, we provide some concluding remarks.

Let X 1 , . . . , X t , · · · ∈ R be a sequence of independent random variables, and let ν be a changepoint. Let P 0 = {P 0,t } t≥1 and P 1 = {P 1,t } t≥1 be two sequences of probability measures, where P 0,t ∈ P 0 and P 1,t ∈ P 1 for all t ≥ 1. Further, assume that P j,t has probability density p j,t with respect to the Lebesgue measure on R, for j = 0, 1 and t ≥ 1. Let P P 0 ,P 1 ν {·} denote the probability measure on the entire sequence of observations when the pre-change distributions August 26, 2021 DRAFT are {P 0,t } t<ν and the post-change distributions are {P 1,t } t≥ν , with X t ∼ P 0,t , ∀1 ≤ t < ν and X t ∼ P 1,t , ∀t ≥ ν, and let E P 0 ,P 1 ν [·] denote the corresponding expectation. When P 0 and P 1 are stationary, i.e., P 0,t = P 0 , ∀t ≥ 1 and P 1,t = P 1 , ∀t ≥ 1, we use the notations P P 0 ,P 1 ν {·} and

The change-time ν is assumed to be unknown but deterministic. The problem is to detect the change quickly while not causing too many false alarms. Let τ be a stopping time [17] defined on the observation sequence associated with the detection rule, i.e. τ is the time at which we stop taking observations and declare that the change has occurred.

For the case where both the pre-and post-change distributions are stationary and known, Lorden [10] proposed solving the following optimization problem to find the best stopping time

where

is a worst-case delay metric, and

with FAR P 0 (τ ) := 1

Here

is the expectation operator when the change never happens, and (·) + := max{0, ·}.

Lorden also showed that Page's Cumulative Sum (CuSum) algorithm [19] whose test statistic is given by:

solves the problem in (1) asymptotically. Here L P 0 ,P 1 is the likelihood ratio:

The CuSum stopping rule is given by:

where b α := | ln α|. It was shown by Moustakides [20] that the CuSum algorithm is exactly optimal for the problem in (1) .

When the pre-change and post-change distributions are unknown but belong to known uncertainty sets and are possibly non-stationary, a minimax robust formulation can be used in place of (1):

where

and the feasible set is defined as

with FAR P 0 (τ ) := 1

We now address the solution to the problem in (8) . To this end, we give the following using definitions.

Definition II.1. (see, e.g., [17] ) A pair of uncertainty sets (P 0 , P 1 ) is said to be jointly stochastically (JS) bounded by (P 0 ,P 1 ) ∈ P 0 × P 1 if, for any (P 0 , P 1 ) ∈ P 0 × P 1 and any h > 0,

where LP 0 ,P 1 is the likelihood ratio betweenP 1 andP 0 (see (6) ). The distributionsP 0 andP 1 are called least favorable distributions (LFDs) within the classes P 0 and P 1 , respectively.

If the pair of pre-and post-change uncertainty sets is JS bounded, the CuSum test statistic ΛP 0 ,P 1 (t) (see (5)), with stopping rule τ (ΛP 0 ,P 1 , b α ) (see (7)), solves (8) exactly both when P 0 and P 1 are stationary [18] and when they are potentially non-stationary [21] .

Definition II.2. (see [4] ) A pair of uncertainty sets (P 0 , P 1 ) is said to be weakly stochastically

for all P 1 ∈ P 1 , and

for all P 0 ∈ P 0 . Here, E P [·] denotes the expectation operator with respect to distribution P , and D(P ||Q) denotes KL-divergence:

It is shown in [4] that if the pair of uncertainty sets is JS bounded by (P 0 ,P 1 ), it is also WS bounded by (P 0 ,P 1 ). It is also shown in [4] that if the pair of pre-and post-change uncertainty sets is WS bounded, the CuSum test statistic ΛP 0 ,P 1 (t) with stopping rule τ (ΛP 0 ,P 1 , b α ) solves (8) asymptotically as α → 0 when P 0 and P 1 are both stationary.

LetP 0 ,P 1 be such that P 0 × P 1 is WS bounded by (P 0 ,P 1 ). In the following, we extend the result in [4] to the case where P 0 and P 1 are potentially non-stationary and derive an asymptotically optimal solution as α → 0. Specifically, through Lemma II.1 we upper bound the asymptotic delay, through Lemma II.2 we control the false alarm rate, and in Theorem II.3 we combine the lemmas to provide an asymptotically optimal solution to the problem in (8) when P 0 and P 1 are potentially non-stationary.

Lemma II.1. Consider P 0 × P 1 WS bounded by (P 0 ,P 1 ). Let P 0 and P 1 be such that P 0,t ∈ P 0 and P 1,t ∈ P 1 for all t ≥ 1. Suppose that for all P 1,t ∈ P 1 ,

where Var P (X) denotes the variance of X when X ∼ P . Then, τ (ΛP 0 ,P 1 , b) satisfies

Lemma II.2. Consider the same assumptions as in Lemma II.1. Then, for any P 0,t ∈ P 0 ,

for any threshold b > 0. solves the problem in (8) asymptotically as α → 0, and sup (P 0 ,P 1 ):(P 0,t ,P 1,t )∈P 0 ×P 1 ,∀t

The proofs of Lemma II.1, Lemma II.2 and Theorem II.3 are given in the appendix.

Until now, we have considered the general QCD problem formulated in (8) . In this paper, we are mainly interested in a special case of the problem, described as follows. The pre-change distribution is stationary, i.e., P 0,t = P 0 , ∀t ≥ 1, with pre-change mean µ 0 = E P 0 [X] and variance σ 2 0 = Var P 0 (X). Thus, P 0 = {P 0 } is a singleton. The post-change distribution could be non-stationary, and at each time it belongs to the following uncertainty set:

In this expression, X denotes a generic observation in the sequence, and η is a pre-designed threshold. Define

which is half of the worst-case mean-change gap.

The minimax robust mean-change problem, which is a reformulation of (8) is given by:

Our goal is to find a stopping time that solves (22) asymptotically as the false alarm rate α → 0.

Define

to be the cumulant-generating function (cgf) of the observations under P 0 . In the following theorem, we provide a solution to the problem stated in (22) .

Theorem III.1. Consider P 0 = {P 0 }, and M 1 as given in (20) . Definẽ

where κ 0 (λ) is the cgf under P 0 and λ * satisfies

Then, the CuSum statistic

and the stopping rule τ (Λ P 0 ,P 1 , b α ) (see (7)) with threshold b α = | ln α| solves the minimax robust problem in (22) asymptotically as α → 0, and

Proof. The proof follows from an application of Theorem II.3 if we can establish that 

where the Lagrange multiplier λ ≥ 0 corresponds to the constraint that the post-change mean is greater than η, and µ corresponds to the constraint that p 1 (x) is a probability measure. For an arbitrary direction z, we take the Gateaux derivative with respect to p 1 :

where µ = µ − 1, and since z is arbitrary, we arrive at

By the Generalized Kuhn-Tucker Theorem [23] , since p 0 (x) is bounded, p 1 (x) = p 0 (x)e λx+µ is a necessary condition for optimality. Furthermore, since J(p 1 , λ, µ) is convex in p 1 , this is also a global optimum. To satisfy the constraints, we have

and that λ * satisfies

Furthermore, the minimum KL-divergence is

Hence, the worst-case delay satisfies

as α → 0.

Note thatp 1 is an exponentially-tilted version (or the Esscher transform) of p 0 .

Even though we have an expression for the test statistic when P 0 is known, as given in (26), the exact solution of λ * is not available in closed-form. Fortunately, if the mean-change gap ∆ is small, we obtain a low-complexity test in terms of only the pre-change mean and variance that closely approximates the performance of the asymptotically minimax optimal test in the previous section.

As ∆ → 0, η → µ 0 , and hence λ * → 0. From a second-order Taylor expansion on κ 0 around 0, we obtain

In this same regime, by continuity of κ 0 (·),

where we have used κ 0 (λ * ) = η. Hence, the approximate log-likelihood ratio at time t is

and the corresponding minimum KL-divergence is approximated as:

Therefore, the stopping rule τ (Λ P 0 ,P 1 , b α ) can be approximated by the stopping rule τ (Λ µ 0 ,η ,b α ),

withΛ µ 0 ,η (0) = 0. We call τ (Λ µ 0 ,η ,b α ) the Mean-Change Test (MCT), andΛ µ 0 ,η the MCT statistic.

From (38), it follows that as α → 0 and ∆ → 0, the worst-case delay satisfies (5)) with known stationary pre-and post-change distributions, P 0 ∼ N (µ 0 , σ 2 ) and P 1 ∼ N (η, σ 2 ), respectively. Here N (µ, σ 2 ) denotes a Gaussian distribution with mean µ and variance σ 2 .

C. Performance Analysis of MCT for moderate ∆

We now study the asymptotic performance of the MCT for fixed ∆, as α → 0. For this part of the analysis, we assume that the pre-and post-change distributions have supports that are uniformly bounded, and without loss of generality, we assume that the bounding interval is

This assumption holds in many practical applications, including the pandemic monitoring problem discussed in Section IV.

Then the MCT statistic of (41) can be written as:

withΛ µ 0 ,η (0) = 0. The MCT stopping time is given by:

where b has to be chosen to meet the FAR constraint:

In what follows, we write τ (Λ µ 0 ,η , b) as τ (b), with the understanding that the test statistic being used throughout is the MCT statisticΛ µ 0 ,η .

In Lemma III.2 below, we first control the boundary crossing probability of S t in the pre-change regime. Then, in Theorem III.3, we use Lemma III.2 to bound the false alarm rate of the MCT asymptotically using the procedure outlined in [24] .

Lemma III.2. Assume that the pre-change distribution P 0 has known pre-change mean µ 0 and variance σ 2 0 , and that the post-change distribution is non-stationary with P 1,t ∈ M 1 , for all t ≥ 1. For b > 0, define the supplementary stopping time (43). Then,

where

and K β (z) is the modified Bessel function of the second kind of order β.

Proof. Note that

Let M = max{µ 0 /3, (1 − µ 0 )/3}; then |Z i + ∆| ≤ 3M . Thus, we have

where a := (σ 2 0 + M ∆) −1 and C := σ 2 0 b/(σ 2 0 + M ∆). In the series of inequalities above, (i) follows from Bernstein's inequality [25, p. 9 ], (ii) follows from bounding the sum with an integral, and (iii) follows from Lemma A.2 in the appendix, with u = a 2 ∆ 2 /2 and v = C 2 /2.

Since K 1 (z) = π 2z e −z (1 + o(1)) as |z| → ∞, the asymptotic result follows.

Theorem III.3. Under the same assumptions as in Lemma III.2, letb α be such that

Then, the MCT withb α , i.e., τ (b α ), meets the FAR constraint (46) asymptotically as α → 0.

whereb α is defined in (40) and R 0 is defined in (49). (47). From Lemma III.2, for any (1)). Then, using [24, Sec. 2.6], it can be shown that

where ( * ) follows because E P 0 τ (b α ) ≥ 1. Thus, (46) is satisfied asymptotically.

For the second result, it is sufficient to show that

Then, recalling the definition ofb α in (40), we havẽ

and we need to show that

Rearranging the terms in (54), we can expressb α as:

Plugging this expression forb α into (48), we have

Taking log on both sides, we obtain − 1 2 ln σ 4 0 π ∆ 4 (D + | ln α|) + R 2 0 (D + | ln α|) = | ln α|.

In the following, we first hypothesize that D = D 1 | ln α|+o(| ln α|), where D 1 is not a function of α, and then validate the hypothesis. Using this expression of D, the first term becomes ln σ 4 0 π ∆ 4 (D + | ln α|) = ln

Therefore, (58) can be restated as:

This validates our hypothesis on D, and also establishes (55). The proof is now complete.

Remark. The thresholdb α that meets the FAR constraint (46) asymptotically can be obtained by solving (50) numerically. Alternatively, we can use the approximation in (51) along with (40)

2) Worst-case Delay Analysis: We now turn to the delay analysis of MCT. The following two lemmas are useful in establishing the delay performance. Specifically, Lemma III.4 is used to guarantee that MCT statistic is finite in expectation, Lemma III.5 is used to extend Wald's identity to the non-stationary setting, and finally Theorem III.6 is used to upper bound the asymptotic delay of MCT in the case where P 1,t 's are non-stationary.

Lemma III.4. Suppose that P 1,t ∈ M 1 for all t ≥ 1. Then, for any b > 0, E P 0 ,P 1

Lemma III.5. Let Z 1 , Z 2 , . . . be independent random variables. For any t ≥ 1, Z t ∼ P t and E Pt [Z t ] ≥ ∆. Let T be any stopping time w.r.t. Z 1 , Z 2 , . . . such that E P [T ] < ∞. Then,

The proofs of the lemmas are given in the appendix. Using these lemmas, we can upper bound the asymptotic delay as follows.

Theorem III.6. Under the same assumptions as in Lemma III.2, the worst-case delay satisfies sup P 1 :P 1,t ∈M 1 ,∀t

as α → 0, whereb α is defined in (50).

Proof. Following Lemma III.4, the MCT stopping time is finite in expectation even when the post-change distributions are non-stationary (but lie in M 1 ). Thus, for any P 1,t ∈ M 1 ,

where (i) follows by Lemma III.5, and (ii) follows because Z τ (b α ) ≤ 1. Thus, sup P 1 :P 1,t ∈M 1 ,∀t≥1

For the other direction, consider stationary P 1,t = P * 1 ∈ M 1 with the post-change mean

Then, as α → 0,

where the first line follows by a standard renewal theory argument [26, Sec. 2.5] .

Remark. As ∆ → 0, R 0 → 1. Thus, the result in Theorem III.6 becomes sup P 1 :P 1,t ∈M 1 ,∀t≥1

where o(1) goes zero as α and ∆ go to zero, which coincides with the minimax robust worst-case delay in (42).

We study the performance of the proposed tests through simulations for the case where the pre-and post-change distributions are Beta(4,16) (µ 0 = 0.2) and Beta(4.5,16) (µ 1 = 0.2195),

respectively. The mean-threshold η is set to be 0.21. In particular, we compare the performances for the following three test statistics: 2) The statistic when only the pre-change distribution is known, defined in (26) .

3) The MCT statistic defined in (41).

For all three statistics, based on their recursive structure, it is easy to show that the worst-case value of the change-point for computing WADD in (1) is ν = 1. Therefore we can estimate the worst-case delays of the tests by simulating the post-change distribution from time 1.

We see in Fig. 1 that the performance of MCT is very close to that of the asymptotically minimax robust optimal test that uses the full knowledge of the pre-change distribution. Note that the MCT statistic uses only the pre-change mean; the variance is required for setting the threshold to meet a given FAR constraint.

In Fig. 2 , we compare the performance of the MCT when the post-change distribution is non-stationary with that when the post-change distribution is stationary, for beta distributed observations. In the stationary case, we choose the post-change distribution to have mean µ 1 = η, and in the non-stationary we choose the post-change distributions such that they all have mean greater than or equal to η. We observe, as expected, that the worst-case delay in the non-stationary case is always smaller than that in the stationary case. Now, we compare our MCT test with a test using scan statistics (without windowing), defined as (see, e.g., [15] ):

where, assuming s ≤ t,μ

The scan statistic test (SST) τ scan is designed to detect a change in the mean of the observation sequence, but does not incorporate the knowledge that the post-change mean is greater than or equal to η. The SST also does not require knowledge of the pre-change mean, but it requires the change-point to be large enough so that a reasonable estimate of the pre-change mean can be obtained fromμ 1:s−1 .

In the results shown in Fig. 3 , we assume that the change-point occurs after the first 100 observations are collected. To allow for a fair comparison between MCT and SST, we use the first 100 observations to estimate µ 0 for use in the MCT statistic, instead of assuming that µ 0 is known. For the MCT simulation, the statistic is initialized after the estimation of µ 0 from the first 100 samples, and therefore the delay is simulated by assuming that the change happens immediately after initialization, which corresponds to ν = 1, the worst-case value of the changepoint. For the SST simulation, the change-point is set ν = 101, which may not necessarily result in the worst-case delay. In Fig. 3 , we see that the worst-case delay for MCT is much smaller than the delay of τ scan at ν = 101, which is a lower bound of the worst-case delay of τ scan over all possible change-points.

In Fig. 4 , we apply the MCT to monitoring the spread of COVID-19 using new case data from various counties in the US [27]. The incremental cases from day to day can be assumed to be roughly independent. The goal is to detect the onset of a new wave of the pandemic based on the incremental cases as a fraction of the county population exceeding some pre-specified level. The pre-change mean and variance are estimated using observations for periods in which the increments remain low and roughly constant. We set the mean-threshold η to be a multiple of the pre-change mean, with understanding that such a threshold might be indicative of a new wave. With this choice, we observe that the MCT statistic significantly and persistently crosses the test-threshold around late November in all counties, which is strong indication of a new wave of the pandemic. More importantly, unlike the raw observations which are highly varying, the MCT statistic shows a clear dichotomy between the pre-and post-change settings, with the statistic staying near zero before the purported onset of the new wave, and taking off nearly vertically after the onset.

We studied the problem of quickest detection of a change in the mean of an observation sequence to a value above a pre-specified threshold in a non-parametric setting, allowing for the post-change distribution to be non-stationary. For the case where the pre-change distribution is known, we derived a test that asymptotically minimizes the worst-case detection delay over all post-change distributions, as the false alarm rate goes to zero. In the process of deriving this asymptotically optimal test, we provided some new results for the general problem of asymptotic minimax robust quickest change detection in non-stationary settings, which should be of independent interest. We then studied the limiting form of the optimal test as the gap between the pre-and post-change means goes to zero, the MCT. The MCT statistic only requires knowledge of the pre-change mean. Under the additional assumption that the distributions of the observations have bounded support, we derived an asymptotic upper bound on the FAR of the MCT for moderate values of mean gap, which can be used to set the threshold of the MCT using only knowledge of the pre-change mean and variance. We also characterized the asymptotic worst-case delay of the MCT for moderate values of the mean gap.

We validated our analysis through numerical results for detecting a change in the mean of a beta distribution. In particular, we found that the MCT suffers little performance loss relative to the asymptotically optimal test with known pre-change distribution. We also showed that the MCT can significantly outperform tests based on prior work on scan statistics, which do not use information about the post-change mean threshold η. We also demonstrated the use of the MCT for detecting the onset of a new wave of an existing pandemic.

A possible avenue for future research on this topic is the detection of a change in statistics other than the mean. It is also of interest to study the mean change detection problem in sensor network settings.

[ 

The following lemma is useful for the proof of Lemma II.1.

Lemma A.1. Let Y 1 , . . . , Y n be independent, zero-mean random variables. Suppose

as n → ∞. Then, as n → ∞,

Proof of Lemma A.1. Denote U n = n i=1 Y i . By Chebyshev's inequality, for any > 0,

where ( * ) is due to the fact that Y i 's are independent with zero-mean.

Proof of Lemma II.1. Fix 0 < δ < 1. Denote τ b (P 0 ,P 1 ) as a short-hand notation for τ (ΛP 0 ,P 1 , b).

For any t ≥ ν, let IP 0 ,P 1 t := E P 1,t LP 0 ,P 1 (X) = D(P 1,t ||P 0 ) − D(P 1,t ||P 1 ).

By definition of WS boundedness,

Let Z t 2 t 1 (P 0 , P 1 ) := t 2 t=t 1 ln L P 0,t ,P 1,t (X t ). Let n c := b/(1 − δ)D(P 1 ||P 0 ) . From the proof of Theorem 4 in [28] (and also Theorem 1 in [4] ), if we can establish

for t > ν, then, with a large enough b, we can get a large enough n c to satisfy

or equivalently,

By independence (despite post-change being non-stationary), we get ess sup P P 0 ,P 1

for any ν ≥ 1 and t ≥ 1. Therefore,

and from the definition of n c ,

Because δ is arbitrary, we can take δ → 0 and the proof is complete.

It remains to show (71). For any t > ν and δ > 0,

Note that ( * ) follows from the WS boundedness assumption, and δD(P 1 ||P 0 ) is some strictly positive constant. Next, we will use the previous lemma.

Thus, by Lemma A.1,

and thus (71) is proved. Now the proof is complete.

Proof of Lemma II.2. Recall that if no change ever happens, X t ∼ P 0,t ∈ P 0 for all t ≥ 1 and

Here P 0,t could be non-stationary. We follow the procedure in [28, Thm. 4].

For simplicity, denote Y t ≡ ln LP 0 ,P 1 (X t ).

Define the stopping times:

and let σ 0 := 0 and inf ∅ := ∞. Suppose for now that we can establish that, on {σ m < ∞},

for any threshold b > 0. Define the number of zero-crossings before hitting the threshold as

Thus, for any m > 0,

where the first inequality follows from (80) and the second one follows from recursion. Therefore,

It remains to show (80). By WS boundedness condition,

Therefore, {exp ( n i=k Y i ) , F n , n ≥ k} is a non-negative supermartingale under P 0 , and

where ( * ) follows from Lemma 1 in [4] . The proof is now complete.

Proof of Theorem II.3. The proof steps are similar to [4] . From max-min inequality, it is true that inf T ∈C P 0 α sup (P 0 ,P 1 ):(P 0,t ,P 1,t )∈P 0 ×P 1 ,∀t

It suffices to prove the other direction.

For any (P 0 , P 1 ) such that (P 0,t , P 1,t ) ∈ P 0 × P 1 for any t ≥ 1, we have

where o(1) → 0 as α → 0. In the above series of inequalities, (i) follows directly from Lemma II.1, (ii) and (iii) follow from standard CuSum analyses (e.g., [28]), (iv) is justified below, and

(v) follows from the fact that (P 0 ,P 1 ) ∈ P 0 × P 1 . Note that (iii) − (v) are satisfied for any 0 < α < 1.

We now justify (iv). SinceP 0 ∈ P 0 , C P 0 α ⊆ CP 0 α . Following standard CuSum analysis (e.g., [28]), τ (ΛP 0 ,P 1 , b α ) ∈ CP 0 α . From Lemma II.2, for any P 0,t ∈ P 0 , FAR P 0 τ (ΛP 0 ,P 1 , b α ) ≤ α, and therefore τ (ΛP 0 ,P 1 , b α ) ∈ C P 0 α . For any α, since τ (ΛP 0 ,P 1 , b α ) achieves the infimum over the set CP 0 α , it also does over the subset C P 0 α . Since (86) holds for any (P 0 , P 1 ) : (P 0,t , P 1,t ) ∈ P 0 × P 1 , ∀t ≥ 1, sup (P 0 ,P 1 ):(P 0,t ,P 1,t )∈P 0 ×P 1 ,∀t WADD P 0 ,P 1 τ (ΛP 0 ,P 1 , b α ) ≤ sup (P 0 ,P 1 ):(P 0,t ,P 1,t )∈P 0 ×P 1 ,∀t inf T ∈C Therefore, τ (ΛP 0 ,P 1 , b α ) asymptotically solves (8) as α → 0, and sup (P 0 ,P 1 ):(P 0,t ,P 1,t )∈P 0 ×P 1 ,∀t

where o(1) → 0 as α → 0.

The following Lemma is useful for the proof of Lemma III.2.

Lemma A.2. Let u, v be some constant. Then,

where K β (z) is the modified Bessel function of the second kind of order β.

Proof. Let y = e θ v/u. Then, the integral becomes 

where ( * ) follows because cosh(θ) is an even function while sinh(θ) is an odd function.

Proof of Lemma III.4. Recall that Z i := X i − (µ 0 + η)/2 and S t = t i=1 Z i . By assumption on M 1 , let Z t have mean ∆ t ≥ ∆ for any t under measure P 1,t . Fix b > 0. Define the supplementary stopping timeτ

Consider t > t 0 := b/∆ . Then,

where ( * ) follows from Hoeffding's inequality.

Using the same technique as the proof of lemma III.2,

where K 1 (z) is the modified Bessel function of the second kind with order 1. Hence,

Therefore, for any P 1,t ∈ M 1 , E P 0 ,P 1 1 [τ (b)] = E P 1 [τ (b)] < ∞. Finally, it follows directly that

Proof of Lemma III.5. For each t ≥ 1, let Z + t := max{0, Z t } and Z − t := − min{0, Z t }. Note that Z + t , Z − t ≥ 0 and Z t = Z + t − Z − t . Therefore,

by Monotone Convergence Theorem, since n t=1 Z + t 1{t ≤ T } is non-decreasing in n. The same argument applies to Z − t . Hence,

Non-parametric quickest detection of a change in the mean of an observation sequence

Academic press library in signal processing: Array and statistical signal processing

Quickest line outage detection and identification

Misspecified and asymptotically minimax robust quickest change detection

A survey of recent developments in the design of adaptive control charts

Structural health monitoring using statistical process control

The generalized likelihood ratio test meets KL-UCB: An improved algorithm for piece-wise non-stationary bandits

On optimum methods in quickest detection problems

General asymptotic bayesian theory of quickest change detection

Procedures for reacting to a change in distribution

Average run lengths of an optimal method of detecting a change in distribution

Using the generalized likelihood ratio statistic for sequential detection of a changepoint

Sequential change-point detection when the pre-and post-change parameters are unknown

A nonparametric method for fastest detection of a change in the mean of a random sequence

Sequential change-point detection: Laplace concentration of scan statistics and non-asymptotic delay bounds

A robust version of the probability ratio test

Statistical Inference for Engineers and Data Scientists

Minimax robust quickest change detection

Continuous Inspection Schemes

Optimal stopping times for detecting changes in distributions

Minimax robust quickest change detection in systems and signals with unknown transients

Principles of Signal Detection and Parameter Estimation

Optimization by Vector Space Methods, 1st ed. 605 Third Ave

Sequential analysis: Tests and confidence intervals

All of Nonparametric Statistics. 233 Spring Street

Sequential Analysis: Hypothesis Testing and Changepoint Detection