key: cord-0458282-0ymqvgmr authors: Liang, Yuchen; Tartakovsky, Alexander G.; Veeravalli, Venugopal V. title: Quickest Change Detection with Non-Stationary Post-Change Observations date: 2021-10-04 journal: nan DOI: nan sha: e1ac7e1b7312cf27ea9aed12ade570335242f4ad doc_id: 458282 cord_uid: 0ymqvgmr

The problem of quickest detection of a change in the distribution of a sequence of independent observations is considered. The pre-change observations are assumed to be stationary with a known distribution, while the post-change observations are allowed to be non-stationary with some possible parametric uncertainty in their distribution. In particular, it is assumed that the cumulative KL divergence between the post-change and the pre-change distributions grows in a certain manner with time after the change-point. For the case where the post-change distributions are known, a universal asymptotic lower bound on the delay is derived, as the false alarm rate goes to zero. Furthermore, a window-limited Cumulative Sum (CuSum) procedure is developed, and shown to achieve the lower bound asymptotically. For the case where the post-change distributions have parametric uncertainty, a window-limited (WL) generalized likelihood-ratio (GLR) CuSum procedure is developed and is shown to achieve the universal lower bound asymptotically. Extensions to the case with dependent observations are discussed. The analysis is validated through numerical results on synthetic data. The use of the WL-GLR-CuSum procedure in monitoring pandemics is also demonstrated.

The problem of quickest change detection (QCD) is of fundamental importance in a variety of applications and has been extensively studied in mathematical statistics (see, e.g., [1]-[4] for overviews). Given a sequence of observations whose distribution changes at some unknown change point, the goal is to detect the change in distribution as quickly as possible after it occurs, while not making too many false alarms. In the classical formulations of the QCD problem, it is assumed that the observations are independent and identically distributed (i.i.d.) with known pre- and post-change distributions. In many practical situations, while it is reasonable to assume that we can accurately estimate the pre-change distribution, the post-change distribution is rarely completely known. Furthermore, in many cases, it is reasonable to assume that the system is in a steady state before the change point and produces i.i.d. observations, but in the post-change mode the observations may be substantially non-identically distributed, i.e., non-stationary. For example, in the pandemic monitoring problem, the distribution of the number of people infected daily might have achieved a steady (stationary) state before the start of a new wave, but after the onset of a new wave, the post-change observations may no longer be stationary. Indeed, during the early phase of the new wave, the mean of the post-change distribution grows approximately exponentially. We will address the pandemic monitoring problem in detail in Section V. In this paper, our main focus is on the QCD problem with independent observations (the extension to the case of dependent observations is discussed in Section IV), where the pre-change observations are assumed to be stationary with a known distribution, while the post-change observations are allowed to be non-stationary with some possible parametric uncertainty in their distribution.
There have been extensions of the classical formulation to the case where the pre- and/or post-change distributions are not fully known and the observations may be non-i.i.d., i.e., dependent and non-identically distributed. For the i.i.d. case with parametric uncertainty in the post-change regime, Lorden [5] proposed a generalized likelihood ratio (GLR) Cumulative Sum (CuSum) procedure, and proved its asymptotic optimality in the minimax sense as the false alarm rate goes to zero, for one-parameter exponential families. An alternative to the GLR-CuSum, the mixture-based CuSum, was proposed and studied by Pollak [6] in the same setting as in [5]. The GLR approach has been studied in detail for the problem of detecting the change in the mean of a Gaussian i.i.d. sequence with an unknown post-change mean by Siegmund [7]. Both the mixture-based and GLR-CuSum procedures have been studied by Lai [8] in the pointwise setting in the non-i.i.d. case of possibly dependent and non-identically distributed observations.

Our contributions are as follows:
1) We develop a universal asymptotic (as the false alarm rate goes to zero) lower bound on the worst-case expected delay for our problem setting with non-stationary post-change observations.
2) We develop a WL-CuSum procedure that asymptotically achieves the lower bound on the worst-case expected delay when the post-change distribution is fully known.
3) We develop and analyze a WL-GLR-CuSum procedure that asymptotically achieves the lower bound on the worst-case expected delay when the post-change distribution has parametric uncertainty.
4) We validate our analysis through numerical results and demonstrate the use of our approach in monitoring pandemics.

The rest of the paper is structured as follows. In Section II, we derive the information bounds and propose an asymptotically optimal WL-CuSum procedure when the post-change distribution is completely known. In Section III, we propose an asymptotically optimal WL-GLR-CuSum procedure when the post-change distribution has unknown parameters. In Section IV, we discuss possible extensions to the general non-i.i.d. case where the observations can be dependent and non-stationary. In Section V, we present some numerical results, including results on monitoring pandemics. We conclude the paper in Section VI. In the Appendix, we provide proofs of certain results.

Let {X n } n≥1 be a sequence of independent random variables (generally vectors), and let ν be a change point. Assume that X 1 , . . . , X ν−1 all have density p 0 with respect to some non-degenerate, sigma-finite measure µ and that X ν , X ν+1 , . . . have densities p 1,ν,ν , p 1,ν+1,ν , . . ., respectively, with respect to µ. Note that the observations are allowed to be non-stationary after the change point, and the post-change distributions may generally depend on the change point. Let (F n ) n≥0 be the filtration, i.e., F 0 = {Ω, ∅} and F n = σ{X ℓ , 1 ≤ ℓ ≤ n} is the sigma-algebra generated by the first n observations X 1 , . . . , X n , and let F ∞ = σ(X 1 , X 2 , . . . ). In what follows we denote by P ν the probability measure on the entire sequence of observations when the change point is ν. That is, under P ν the random variables X 1 , . . . , X ν−1 are i.i.d. with the common (pre-change) density p 0 , and X ν , X ν+1 , . . . are independent with (post-change) densities p 1,ν,ν , p 1,ν+1,ν , . . . . Let E ν denote the corresponding expectation.
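To make the observation model concrete, the following minimal sketch simulates a sequence that is i.i.d. with density p 0 before the change point ν and non-stationary afterwards. It uses the Gaussian exponential mean-growth form of Example II.1 below; the function name simulate_sequence and the specific parameter values are illustrative assumptions, not quantities from the paper.

```python
import numpy as np

def simulate_sequence(n_total, nu, mu0=1.0, sigma0=1.0, theta=0.1, rng=None):
    """Simulate X_1, ..., X_{n_total} with change point nu.

    Pre-change (n < nu):   X_n ~ N(mu0, sigma0^2), i.i.d. (density p_0).
    Post-change (n >= nu): X_n ~ N(mu0 * exp(theta * (n - nu)), sigma0^2),
    a non-stationary sequence whose mean grows exponentially after nu.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = np.arange(1, n_total + 1)
    means = np.where(n < nu, mu0, mu0 * np.exp(theta * np.maximum(n - nu, 0)))
    return means + sigma0 * rng.standard_normal(n_total)

# Example: a sequence of length 200 with the change at nu = 100.
x = simulate_sequence(n_total=200, nu=100)
```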
For ν = ∞ this distribution will be denoted by P ∞ and the corresponding expectation by E ∞ . Evidently, under April 15, 2022 DRAFT P ∞ the random variables X 1 , X 2 , . . . are i.i.d. with density p 0 . In the sequel, we denote by τ Markov (stopping) times with respect to the filtration (F n ) n≥0 , i.e., the event {τ = n} belongs to F n . The change-time ν is assumed to be unknown but deterministic. The problem is to detect the change quickly while not causing too many false alarms. Let τ be a stopping time defined on the observation sequence associated with the detection rule, i.e., τ is the time at which we stop taking observations and declare that the change has occurred. The problem is to detect the change quickly, minimizing the delay to detection τ − ν, while not causing too many false alarms. A special case of the model described above is where both the pre-and post-change observations are i.i.d., i.e., p 1,n,ν ≡ p 1 for all n ≥ ν ≥ 1. In this case, Lorden [5] proposed solving the following optimization problem to find the best stopping time τ : where characterizes the worst-case expected delay, and ess sup stands for essential supremum. The constraint set is which guarantees that the false alarm rate of the algorithm does not exceed α. is the expectation operator when the change never happens, and we use the conventional notation (·) + := max{0, ·} for the nonnegative part. The mean time to a false alarm (MTFA) E ∞ [τ ] is sometimes referred to as the average run length to false alarm. Lorden also showed that Page's CuSum detection algorithm [15] whose detection statistic is given by: solves the problem in (1) asymptotically as α → 0. Here, Z n is the log-likelihood ratio defined as: The CuSum stopping rule is given by: In particular, if threshold is set as b = |log α|, then FAR (τ Page (b α )) ≤ α (see, e.g., [1, Lemma 8.2.1])). It was shown by Moustakides [16] that the CuSum algorithm is exactly optimal for the problem special way that accounts for the overshoot of W (n) over b α at stopping, which guarantees the approximation FAR (τ Page (b α )) ∼ α as α → 0, then we have the following third-order asymptotic approximation (as α → 0) for the worst-case expected detection delay of the optimal procedure: (see, e.g., [1] ), which also implies the first-order asymptotic approximation (as α → 0): (1)). Here D(p 1 ||p 0 ) is the Kullback-Leibler (KL) divergence between p 1 and p 0 . Also, in the following we use a standard notation o(x) as x → x 0 for the function f (x) such that f (x)/x → 0 as x → x 0 , i.e., o(1) → 0 as α → 0, and Along with Lorden's worst average detection delay WADD (τ ), defined in (2), we can also consider also the less pessimistic Pollak's performance measure [17] : Pollak suggested the following minimax optimization problem in class C α : An alternative to CuSum is the Shiryaev-Roberts (SR) change detection procedure τ SR based not on the maximization of the likelihood ratio over the unknown change point but on summation April 15, 2022 DRAFT of likelihood ratios (i.e., on averaging over the uniform prior distribution). As shown in [18] , the SR procedure is second-order asymptotically minimax with respect to Pollak's measure: The CuSum procedure with a certain threshold b α also has a second-order optimality property with respect to the risk SADD (τ ). A detailed numerical comparison of CuSum and SR procedures for i.i.d. models was performed in [19] . 
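For reference, and to fix notation before the non-stationary extension, here is a minimal sketch of Page's CuSum recursion W(n) = max{0, W(n−1) + Z n} with stopping rule τ Page(b) = inf{n : W(n) ≥ b}, for known i.i.d. pre- and post-change densities. The Gaussian mean-shift densities and parameter values below are illustrative assumptions, not the paper's.

```python
import numpy as np
from scipy.stats import norm

def cusum_stopping_time(x, b, mu0=0.0, mu1=1.0, sigma=1.0):
    """Page's CuSum for i.i.d. N(mu0, sigma^2) -> N(mu1, sigma^2) data.

    Z_n = log(p_1(X_n) / p_0(X_n)),  W(n) = max(0, W(n-1) + Z_n).
    Returns the first n (1-indexed) with W(n) >= b, or None if no alarm.
    """
    w = 0.0
    for n, xn in enumerate(x, start=1):
        z = norm.logpdf(xn, loc=mu1, scale=sigma) - norm.logpdf(xn, loc=mu0, scale=sigma)
        w = max(0.0, w + z)
        if w >= b:
            return n
    return None
```

As noted above, setting the threshold as b = |log α| keeps the false alarm rate of this rule below α.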
In the case where both the pre-and post-change observations are independent and the postchange observations are non-stationary, the log-likelihood ratio is: where n ≥ k ≥ 1. Here k is a hypothetisized change-point and X n is drawn from the true In the classical i.i.d. model described in Section II-A, the cumulative KL-divergence after the change point increases linearly in the number of observations. We generalize this condition as follows. Let g ν : R + → R + be an increasing and continuous function, which we will refer to as growth function. Note that the inverse of g ν , denoted by g −1 ν , exists and is also increasing and continuous. We assume that the expected sum of the log-likelihood ratios under P ν , which corresponds to the cumulative Kullback-Leibler (KL) divergence for our non-stationary model, matches the value of the growth function at all positive integers, i.e., Furthermore, we assume that E ν [Z i,ν ] > 0 for all i ≥ ν. and that for each x > 0 exists. Note that g −1 is also increasing and continuous. A key assumption that we will need for our analysis is that g −1 (x) satisfies We should note that such a growth function g(n) has been adopted in sequential hypothesis testing with non-stationary observations [1, Sec. 3.4] . In the special case where the post-change distribution is invariant to the change-point ν, i.e., for j ≥ 0, p 1,ν+j,ν is not a function of ν, we have g ≡ g ν and g −1 ≡ g −1 ν , for all ν ≥ 1. The proof of asymptotic optimality is performed in two steps. First, we derive a first-order asymptotic (as α → 0) lower bound for the maximal expected detection delays inf τ ∈Cα WADD (τ ) and inf τ ∈Cα SADD (τ ). To this end, we need the following right-tail condition for the loglikelihood ratio process: assuming that for all ν ≥ 1 At the second stage, we show that this lower bound is attained for the Window-Limited (WL) CuSum procedure under the following left-tail condition The following lemma provides sufficient conditions under which conditions (14) and (15) hold for the sequence of independent and non-stationary observations. Hereafter we use the notation Lemma II.1. Consider the growth function g ν (n) defined in (11) . Suppose that the sum of variances of the log-likelihood ratios satisfies Then condition (14) holds. If, in addition, for all ν ≥ 1 and all positive integers ∆, then condition (15) holds. The proof is given in the appendix. Remark. One can generalize condition (17) in a way that either holds for all positive integers ∆. Example II.1. Consider the following Gaussian exponential mean-change detection problem. Denote by N (µ 0 , σ 2 0 ) the Gaussian distribution with mean µ 0 and variance σ 2 0 . Let X 1 , . . . , X ν−1 be distributed as N (µ 0 , σ 2 0 ), and for all n ≥ ν, let X n be distributed as N (µ 0 e θ(n−ν) , σ 2 0 ). Here θ is some positive fixed constant. The log-likelihood ratio is given by: Now, the growth function can be calculated as Since the post-change distribution is invariant to the change-point ν, (13) . Also, the sum of variances of the log-likelihood ratios is for all t ≥ ν, which establishes condition (16) . Further, for any i ≥ ν and ∆ ≥ 1, which establishes condition (17) . The following theorem gives a lower bound on the worst-case average detection delays as Suppose that g −1 (x) satisfies (13) . Then for all δ ∈ (0, 1) and some ν ≥ 1 and as α → 0, Proof. 
Obviously, for any Markov time τ , Therefore, to prove the asymptotic lower bound (22) we have to show that as α → 0, where the o(1) term on the right-hand side does not depend on τ , i.e., uniform in τ ∈ C α . To begin, let the stopping time τ ∈ C α and note that by Markov's inequality, Hence, if assertion (21) holds, then for some ν ≥ 1 This implies the asymptotic inequality which holds for an arbitrary δ ∈ (0, 1) and some ν. Since by our assumption the function h δ (α) is continuous, taking the limit δ → 0 and maximizing over ν ≥ 1 yields inequality (23) . It remains to prove (21) . Changing the measure P ∞ → P ν and using Wald's likelihood ratio identity, we obtain the following chain of equalities and inequalities for any C > 0 and δ ∈ (0, 1): where the last inequality follows from the fact that Pr(A ∩ B) = Pr(A) − Pr(B c ) for any events where Since g(h δ (α)) → ∞ as α → 0, by condition (14), Next we turn to the evaluation of the term κ (ν) δ,α (τ ) for any stopping time τ ∈ C α . It follows from Lemma 2.1 in [2, page 72] that for any M < α −1 , there exists some ≥ 1 (possibly depending on α) such that so for some ν ≥ 1, = (g −1 (| log α|)) 2 , then for all sufficiently small α, so that the condition (13) is satisfied. Furthermore, for any p > 0. To see this, assume for purpose of contradiction that there exists some p 0 > 0 Hence, it follows that for some ν ≥ 1, (25), (26) , and (28) we obtain that for some ν ≥ 1 where the o(1) term is uniform over all ν ≥ 1. This yields assertion (21) , and the proof is complete. C. Asymptotically Optimal Detection with Non-stationary Post-Change Observations with Known Recall that under the classical setting, Page's CuSum procedure (in (7)) is optimal and has the following structure: where Z i is the log-likelihood ratio when the post-change distributions are stationary (defined in (6)). When the post-change distributions are potentially non-stationary, the CuSum stopping rule is defined similarly as: where Z i,k represents the log-likelihood ratio between densities p 1,i,k and p 0 for observation X i (defined in (10)). Here i is the time index and k is the hypothesized change point. Note that if the post-change distributions are indeed stationary, i.e., p 1,i,k ≡ p 1 , we would get Z i,k ≡ Z i for all k ≤ i, and thus τ C ≡ τ Page . As shown in (5) Then, the log-likelihood ratio is given by Note that Z n,t is a (linear) function of X n . Consider the following realization: It can be verified that Note that maximizer k * goes backward in time in this case, in contrast to what happens when both the pre-and post-change observations follow i.i.d. models. The test statistic at time n = 2 is a function of only X 2 , and this is insufficient to construct the test statistic at time n = 3, which is a function of X 1 , in addition to being a function of X 2 , and X 3 . For computational tractability, we therefore consider a window-limited (WL) version of the CuSum procedure in (30): where m is the window size. For n < m maximization is performed over 1 ≤ k ≤ n. In the asymptotic setting, m = m α depends on α and should go to infinity as α → 0 with certain appropriate rate. Specifically, following a similar condition that Lai [8] used in the asymptotically stationary case, we shall require that m α → ∞ as α → 0 in such a way that Since the range for the maximum is smaller inτ C (b) than in τ C (b), given any realization of X 1 , X 2 , . . ., if the test statistic ofτ C (b) crosses the threshold b at some time n, so does that of τ C (b). 
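For the Gaussian exponential mean-change model of Example II.1, both the log-likelihood ratio Z i,k in (18) and the growth function g(n) in (19) have simple closed forms, since each summand of g is the KL divergence between two Gaussians with common variance. The sketch below evaluates them numerically; the helper names and the brute-force inversion of g are illustrative conveniences rather than constructions from the paper.

```python
import numpy as np

def llr_gaussian_exp(x_i, i, k, mu0=1.0, sigma0=1.0, theta=0.1):
    """Z_{i,k} for the Gaussian exponential mean-change model of Example II.1:
    pre-change density N(mu0, sigma0^2); post-change density at time i (for
    hypothesized change point k) is N(mu0 * exp(theta * (i - k)), sigma0^2)."""
    mu1 = mu0 * np.exp(theta * (i - k))
    return ((x_i - mu0) ** 2 - (x_i - mu1) ** 2) / (2.0 * sigma0 ** 2)

def growth_function(n, mu0=1.0, sigma0=1.0, theta=0.1):
    """g(n): cumulative KL divergence of the first n post-change observations.
    Each term equals (mu0 * (exp(theta * j) - 1))^2 / (2 * sigma0^2), j = 0..n-1."""
    j = np.arange(n)
    return float(np.sum((mu0 * (np.exp(theta * j) - 1.0)) ** 2) / (2.0 * sigma0 ** 2))

def growth_inverse(y, **kwargs):
    """g^{-1}(y): smallest integer n with g(n) >= y (direct search)."""
    n = 1
    while growth_function(n, **kwargs) < y:
        n += 1
    return n
```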
Therefore, for any fixed threshold b > 0, almost surely. In the following, we first control the asymptotic false alarm rate ofτ C (b) with an appropriately chosen threshold in Lemma II.2. Then, we obtain asymptotic approximation of the expected detection delays ofτ C (b) in Theorem II.2. Finally, we combine these two results and provide an asymptotically optimal solution to the problem in (1) in Theorem II.3. Lemma II.2. Suppose that b α = |log α|. Then i.e.,τ C (b α ) ∈ C α . Proof. Define the statistic and the corresponding stopping time T b := inf{n : Recall that F n = σ(X , 1 ≤ ≤ n) denotes a sigma-algebra generated by (X 1 , . . . , X n ). Since E ∞ e Z n,k |F n−1 = 1, it is easy to see that Consequently, the statistic {R n −n} n≥1 is a zero-mean (P ∞ , F n )-martingale. It suffices to assume (for any m α ≥ 1), and therefore (34) follows. The following result establishes asymptotic performance of the WL-CuSum procedure given in (31) for large threshold values. Theorem II.2. Fix δ ∈ (0, 1) and let N b,δ := g −1 (b/(1 − δ)) . Suppose that in the WL-CuSum procedure the size of the window m = m b diverges (as b → ∞) in such a way that Further, suppose that conditions (14) and (15) hold for Z n,k when n ≥ k ≥ 1. Then, as b → ∞, Proof. Since FAR (τ C (b)) ≤ e −b , the window-limited CuSum procedureτ C (b) belongs to class Hence, replacing α by e −b in the asymptotic lower bound (22) in Theorem II.1, we obtain that under condition (14) the following asymptotic lower bound holds: Thus, to establish (37) it suffices to show that under condition (15) Note that we have the following chain of equalities and inequalities: Define λ n,k := n i=k Z i,k and K n := ν + nN b,δ . We have W (n) = max n−m b N b,δ (for a sufficiently large b), for any n ≥ 1, where the last equality follows from independence of the increments of {λ t,n } n≥t . By condition (15), for a sufficiently large b there exists a small ε b such that Therefore, for any ≥ 1, April 15, 2022 DRAFT Combining this inequality with (40) and using the fact that ∞ Since the right-hand side of this inequality does not depend on ν, g −1 (b/(1 − δ)) → ∞ as b → ∞ and ε b and δ can be arbitrarily small numbers, this implies the upper bound (39). The proof is complete. Using Lemma II.2 and Theorem II.2, we obtain the following asymptotic result which establishes asymptotic optimality of the WL-CuSum procedure and its asymptotic operating characteristics. Theorem II.3. Suppose that threshold b α is so selected that b α ∼ | log α| as α → 0, in particular as b α = | log α|. Further, suppose that left-tail (14) and right-tail (15) conditions hold for Z n,k when n ≥ k ≥ 1. Then, the WL-CuSum procedure in (31) with the window size m α that satisfies the condition solves the problems (1) and (9) asymptotically to first order as α → 0, i.e., and Proof. Let b α be so selected that FAR (τ C (b α )) ≤ α and b α ∼ | log α| as α → 0. Then by Theorem II.2, as α → 0 Comparing these asymptotic equalities with the asymptotic lower bound (22) in Theorem II.1 immediately yields asymptotics (44) and (45). In particular, if b α = |log α|, then by Lemma II.2 FAR (τ C (b α )) ≤ α, and therefore the assertions hold. Remark. Clearly, the asymptotic optimality result still holds in the case where no window is applied, i.e., m α = n − 1. Example II.3. Consider the same setting as in Example II.1. We have shown that conditions (16) and (17) hold in this setting, and thus (14) and (15) also hold by Lemma II.1. 
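In implementation terms, the WL-CuSum rule in (31) can be computed directly from its definition. Because Z i,k depends on the hypothesized change point k, the statistic is re-maximized over the window at each time step instead of being updated by the one-step recursion available in the stationary case. The sketch below is a minimal illustration, generic in the log-likelihood ratio (for instance the llr_gaussian_exp helper above); per Lemma II.2, the threshold b = |log α| keeps the FAR below α. The window-boundary convention is a simplification of (31).

```python
import numpy as np

def wl_cusum_stopping_time(x, b, m, llr):
    """Window-limited CuSum stopping time (1-indexed), or None if no alarm.

    x   : observations X_1, X_2, ...
    b   : detection threshold (e.g., b = |log(alpha)|)
    m   : window size (for n < m, all k in {1, ..., n} are used)
    llr : function llr(x_i, i, k) returning Z_{i,k}

    Statistic:  W(n) = max over max(1, n-m+1) <= k <= n of sum_{i=k}^{n} Z_{i,k}.
    """
    x = np.asarray(x, dtype=float)
    for n in range(1, len(x) + 1):
        stat = -np.inf
        for k in range(max(1, n - m + 1), n + 1):
            s = sum(llr(x[i - 1], i, k) for i in range(k, n + 1))
            stat = max(stat, s)
        if stat >= b:
            return n
    return None

# Example: alpha = 1e-3, window m = 25, Gaussian exponential mean-change LLR.
# tau = wl_cusum_stopping_time(x, b=abs(np.log(1e-3)), m=25, llr=llr_gaussian_exp)
```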
Considering the growth function g(n) given in (19) , as n → ∞, we obtain (1)). Thus, as y → ∞, and if b α = | log α| or more generally b α ∼ | log α| as α → 0 we obtain We now study the case where the evolution of the post-change distribution is parametrized by an unknown but deterministic parameter θ ∈ R d . Let X ν , X ν+1 , . . . each have density p θ 1,0 , p θ 1,1 , . . . , respectively, with respect to the common non-degenerate measure µ, when postchange parameter is θ. Let P k,θ and E k,θ denote, respectively, the probability measure on the entire sequence of observations and expectation, when the change point is ν = k < ∞ and the post-change parameter is θ. Let Θ ⊂ R d be an open and bounded set of parameter values, with θ ∈ Θ. The log-likelihood ratio process is given by: for any n ≥ k and θ ∈ Θ. Also, the growth function in (11) is redefined as and it is assumed that g −1 θ (x) = sup ν≥1 g −1 ν,θ (x) exists. It is also assumed that The goal in this section is to solve the optimization problems (1) and (9) and the corresponding asymptotic optimization problems: find a change detection procedure τ * that minimizes these measures to first order in class C α , i.e., for all θ ∈ Θ, Consider the following window-limited GLR CuSum stopping procedure: it is guaranteed that θ ∈ Θ b for all large enough b. Since we are interested in class C α = {τ : FAR (τ ) ≤ α}, in which case both threshold b = b α and window size m b = m α are the functions of α, we will write Θ b = Θ α and suppose that Θ α ⊂ R d is compact for each α. Hereafter we omit the dependency ofθ n,k on α for brevity. In this paper, we focus on the case where Θ α is continuous for all α's. The discrete case is simpler and will be considered elsewhere. The following assumption is made to guarantee the existence of an upper bound on FAR. Assumption III.1. There exists ε > 0 such that for any large enough b > 0, where λ max (A) represents the maximum absolute eigenvalue of a symmetric matrix A and ε b 0 Example III.1. Consider again the Gaussian exponential mean-change detection problem in Example II.1. Now we consider the case where the exact value of the post-change exponent is unknown. Note that θ characterizes the entire post-change evolution rather than a single post-change distribution. We shall verify Assumption III.1 below. April 15, 2022 DRAFT Recalling the definition of log-likelihood ratio given in (18) , for any θ ∈ Θ and k ≤ i ≤ n Therefore, Since X i 's are i.i.d. under P ∞ , n i=k X i has a Gaussian distribution with mean ≤ (m b + 1)µ 0 and variance ≤ (m b + 1)σ 2 0 . Therefore, for any θ ∈ Θ, where Q(x) = (2π) −1/2 ∞ x e −t 2 /2 dt is the standard Q-function. Recalling the condition in (36) on the window size and using the formula (46) for the worstcase expected delay, we obtain that if we set Then Assumption III.1 holds when ε = (1 + δ)Θ max /Θ min with arbitrary δ > 0. Note that WADD θ (τ G (b)) ≤ WADD θ (τ C (b)) for any threshold b > 0. In order to establish asymptotic optimality of the WL-GLR-CuSum procedure we need the following lemma that allows us to select threshold b = b α in such a way that the FAR ofτ G (b) is controlled at least asymptotically. Lemma III.1. Suppose that the log-likelihood ratio {Z θ n,k } n≥k satisfies (52). Then, as b → ∞, where Remark. Since |Θ α | ≤ |Θ| < ∞, it follows from (56) that b α ∼ |log α| as α → 0. The proof of Lemma III.1 is given in the appendix. The following theorem establishes asymptotic optimality properties of the WL-GLR-CuSum detection procedure. Theorem III.1. 
Suppose that threshold b = b α is so selected that FAR (τ C (b α )) ≤ α or at least so that FAR (τ C (b α )) ≤ α(1+o(1)) and b α ∼ | log α| as α → 0, in particular from equation (56) in Lemma III.1. Further, suppose that conditions (14), (15) and (52) hold for {Z n,k } n≥k . Then, the window-limited GLR CuSum procedureτ G (b α ) defined by (51) with the window size m α that satisfies the condition (43) solves first-order asymptotic optimization problems (50) uniformly for all parameter values θ ∈ Θ, and as α → 0. Proof. Evidently, for any θ ∈ Θ and any threshold b > 0, April 15, 2022 DRAFT Let b = b α be so selected that FAR (τ G (b α )) ≤ α and b α ∼ | log α| as α → 0. Then it follows from the asymptotic approximations (45) in Theorem II.3 that, as α → 0, (1)). Comparing these asymptotic inequalities with the asymptotic lower bound (22) in Theorem II.1, immediately yields (57), which is asymptotically the best one can do to first order according to Theorem II.1. In particular, if b α is found from equation (56), then b α ∼ | log α| as α → 0 and by Lemma III.1 (1)), and therefore the assertions hold. The measure of FAR that we have used in this paper (see (4)) is the inverse of the MTFA . However, the MTFA is a good measure of the FAR if, and only if, the pre-change distributions of the window-limited CuSum stopping timeτ C (b) and the window-limited GLR CuSum stopping timeτ G (b) are approximately geometric. While this geometric property can be established for i.i.d. data models (see, e.g., Pollak and Tartakovsky [20] and Yakir [21] ), it is not neccessarily true for non-homogeneous and dependent data, as discussed in Mei [22] and Tartakovsky [23] . Therefore, in general, the MTFA is not appropriate for measuring the FAR. In fact, large values of MTFA may not necessarily guarantee small values of the probability of false alarm as discussed in detail in [1] , [23] . When the post-change model is Gaussian non-stationary as defined in Example II.1, the MTFA may still an appropriate measure for false alarm, as shown in the simulation study in Section V-B. Based on this result we conjecture that the MTFA-based FAR constraint may be suitable for other independent and non-stationary data models as well. However, in general, this may not be the case, and a more appropriate measure of the FAR in the general case may be the maximal (local) conditional probability of false alarm in the time interval (k, k + m] defined as [1] : Then the constraint set in (3) can be replaced by set of procedures for which the SPFA does not exceed a prespecified value β ∈ (0, 1). Pergamenschtchikov and Tartakovsky [14] , [24] considered general stochastic models of dependent and nonidentically distributed observations but asymptotically homogeneous (i.e., g(n) = n). They proved not only minimax optimality but also asymptotic pointwise optimality as β → 0 (i.e., for all change points ν ≥ 1) of the Shiryaev-Roberts (SR) procedure for the simple postchange hypothesis, and the mixture SR for the composite post-change hypothesis in class C β,m , when m = m β depends on β and goes to infinity as β → 0 at such a rate that log m β = o(| log β|). The results of [14] , [24] can be readily extended to the asymptotically non-homogeneous case where the function g(n) increases with n faster than log n. 
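When θ is unknown, the WL-GLR-CuSum statistic in (51) additionally takes a supremum over θ ∈ Θ inside the window. The sketch below approximates that supremum by a finite grid, which is an implementation convenience rather than the construction analyzed above; the parametrized log-likelihood ratio shown is the one from the Gaussian exponential mean-change example, and the grid end-points are illustrative assumptions.

```python
import numpy as np

def wl_glr_cusum_stopping_time(x, b, m, llr_theta, theta_grid):
    """Window-limited GLR CuSum stopping time (1-indexed), or None.

    llr_theta  : function llr_theta(x_i, i, k, theta) returning Z^theta_{i,k}
    theta_grid : finite set of candidate theta values (grid approximation
                 to the supremum over the parameter set Theta)

    Statistic: max over k in the window and theta in the grid of
               sum_{i=k}^{n} Z^theta_{i,k}.
    """
    x = np.asarray(x, dtype=float)
    for n in range(1, len(x) + 1):
        stat = -np.inf
        for k in range(max(1, n - m + 1), n + 1):
            for theta in theta_grid:
                s = sum(llr_theta(x[i - 1], i, k, theta) for i in range(k, n + 1))
                stat = max(stat, s)
        if stat >= b:
            return n
    return None

def llr_gaussian_exp_theta(x_i, i, k, theta, mu0=1.0, sigma0=1.0):
    """Z^theta_{i,k} for the Gaussian exponential mean-change model with
    unknown exponent theta."""
    mu1 = mu0 * np.exp(theta * (i - k))
    return ((x_i - mu0) ** 2 - (x_i - mu1) ** 2) / (2.0 * sigma0 ** 2)

# Example:
# tau = wl_glr_cusum_stopping_time(x, b=abs(np.log(1e-3)), m=25,
#                                  llr_theta=llr_gaussian_exp_theta,
#                                  theta_grid=np.linspace(0.01, 0.3, 30))
```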
In particular, using the developed in [14] , [24] techniques based on embedding class C β,m in the Bayesian class with a geometric prior distribution for the change point and the upper-bounded weighted PFA, it can be shown that the window-limited CuSum procedure (31) with m α replaced by m β is first-order pointwise asymptotically optimal in class C β,m β = C β as long as the uniform complete version of the strong law of large numbers for the log-likelihood ratio holds, i.e., for all δ > 0 where in the general non-i.i.d. case the partial LLR Z i,ν is Specifically, it can be established that for all fixed ν ≥ 1, as β → 0, where we used the notation ADD ν (τ ) = E ν [τ − ν|τ ≥ ν] for the conditional average delay to detection. Similar results also hold for the maximal average detection delays WADD (τ ) and It is worth noting that it follows from the proof of Theorem II.1 that under condition (14) the following asymptotic lower bound holds for the average detection delay ADD ν (τ ) uniformly for all values of the change point in class C β : In the case where the post-change observations have parametric uncertainty, sufficient conditions for the optimality of the WL-GLR-CuSum procedure are more sophisticated -a probability in the vicinity of the true post-change parameter should be involved [14] . Further details and the proofs are omitted and will be given elsewhere. In Fig. 1 , we study the performance of the proposed WL-CuSum procedure in (31) through Monte Carlo (MC) simulations for the Gaussian exponential mean-change detection problem (see Example II.1), with known post-change parameter. The change-point is taken to be ν = 1 3 . Three window-sizes are considered, with the window size of 12 being smaller than the range expected delay values in the plot, and therefore not large enough to satisfy condition (32). The window size of 25 is sufficiently large, and the window size of 100 essentially corresponds to having no window at all. It is seen that the performance is nearly identical for all windowsizes considered. We also observe that the expected delay is O(log(|log α|)) for window-sizes considered, which matches our theoretical analysis in (46). It is seen that the operating characteristic of WL-GLR-CuSum is slightly worse but close to that of the WL-CuSum procedure that knows the true post-change parameter. We also observe that procedures with a slightly insufficiently large window-sizes perform similarly to those with a sufficiently large window sizes. In Fig. 3 , we study the distribution of the WL-CuSum stopping times using simulation results from the Gaussian exponential mean-change detection problem. This study is similar to the one in [20] . It is observed that the experimental quantiles of stopping times for the WL-CuSum procedure are close to the theoretical quantiles of a geometric distribution. This indicates that the distribution of the stopping time is approximately geometric, in which case MTFA is an appropriate false alarm performance measure, and our measure of FAR as the reciprocal of the MTFA is justified. Next, we apply the developed WL-GLR-CuSum algorithm to monitoring the spread of COVID-19 using new case data from various counties in the US [25] . The goal is to detect the onset of a new wave of the pandemic based on the incremental daily cases. The problem is modeled as one of detecting a change in the mean of a Beta distribution as in [26] . 
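The simulation study for the Gaussian exponential mean-change model described above can be reproduced in outline with the sketches from the previous sections (simulate_sequence, llr_gaussian_exp, and wl_cusum_stopping_time): estimate the mean time to false alarm on pre-change-only data to check the FAR, and average the detection delays over post-change runs for several window sizes. The driver below is a rough illustration of that experiment, not the authors' simulation code; trial counts, truncation lengths, and parameter values are placeholders.

```python
import numpy as np

def estimate_mtfa(b, m, llr, mu0=1.0, sigma0=1.0, n_max=5000, n_trials=200, rng=None):
    """Monte Carlo estimate of E_inf[tau]: run the WL-CuSum on pre-change-only
    data.  Truncated runs are counted as n_max, so the estimate is conservative
    (it under-estimates the MTFA and hence over-estimates the FAR)."""
    rng = np.random.default_rng() if rng is None else rng
    times = []
    for _ in range(n_trials):
        x = mu0 + sigma0 * rng.standard_normal(n_max)   # i.i.d. N(mu0, sigma0^2)
        t = wl_cusum_stopping_time(x, b, m, llr)
        times.append(n_max if t is None else t)
    return float(np.mean(times))

def estimate_delay(b, m, llr, nu=1, n_max=3000, n_trials=200, rng=None):
    """Monte Carlo estimate of the average detection delay (measured as tau - nu)."""
    rng = np.random.default_rng() if rng is None else rng
    delays = []
    for _ in range(n_trials):
        x = simulate_sequence(n_max, nu, rng=rng)
        t = wl_cusum_stopping_time(x, b, m, llr)
        if t is not None and t >= nu:
            delays.append(t - nu)
    return float(np.mean(delays))

# alpha = 1e-3; b = abs(np.log(alpha))
# for m in (12, 25, 100):
#     print(m, estimate_delay(b, m, llr_gaussian_exp))
```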
Let B(x; a, b) denote the density of the Beta distribution with shape parameters a and b, i.e., where Γ represents the gamma function. Note that the mean of an observation under density Here, h θ is a function such that h θ (x) ≥ 1, ∀x > 0. Note that if a 0 b 0 and h θ (n − ν) is not too large, for all n ≥ ν. We design h θ to capture the behavior of the average fraction of daily incremental cases. In particular, we model h θ as where θ 0 , θ 1 , θ 2 ≥ 0 are the model parameters and θ = (θ 0 , θ 1 , θ 2 ) ∈ Θ. When n − ν is small, h θ (n − ν) grows like the left tail of a Gaussian density, which matches the exponential growth in the average fraction of daily incremental cases seen at the beginning of a new wave of the pandemic. Also, as n → ∞, h θ (n − ν) → 0, which corresponds to the daily incremental cases eventually vanishing at the end of the pandemic. In Fig. 4 , we validate the choice of distribution model defined in (58) using data from COVID-19 wave of Fall 2020. In the simulation, a 0 and b 0 are estimated using observations from previous periods in which the increments remain low and roughly constant. It is observed that the mean of the daily fraction of incremental cases matches well with the mean of the fitted Beta distribution with h θ in (60). Note that the growth condition given in (49) that is required for our asymptotic analysis is not satisfied for the observation model (58) with h θ given in (60). Nevertheless, we expect the WL-GLR-CuSum procedure to perform as predicted by our analysis if the procedure stops during a time interval where h θ is still increasing, which is what we would require of a useful procedure for detecting the onset of a new wave of the pandemic anyway. The parameters θ0, θ1 and θ2 are assumed to be unknown. The window size mα = 20. The threshold is set using equation (56). In Fig. 5 , we illustrate the use the WL-GLR-CuSum procedure with the distribution model (58) for the detection of the onset of a new wave of COVID-19. We assumed a start date of June 15th, 2021 for the monitoring, at which time the pandemic appeared to be in a steady state with incremental cases staying relatively flat. We observe that the WL-GLR-CuSum statistic significantly and persistently crosses the detection threshold around late July in all counties, which is strong indication of a new wave of the pandemic. More importantly, unlike the raw observations which are highly varying, the WL-GLR-CuSum statistic shows a clear dichotomy between the pre-and post-change settings, with the statistic staying near zero before the purported onset of the new wave, and taking off very rapidly (nearly vertically) after the onset. We considered the problem of the quickest detection of a change in the distribution of a sequence of independent observations, assuming that the pre-change observation are stationary with known distribution, while the post-change observations are non-stationary with possible parametric uncertainty. Specifically, we assumed that the cumulative KL divergence between the post-change and the pre-change distributions grows super-linearly with time after the change point. We derived a universal asymptotic lower bound on the worst-case expected detection delay under a constraint on the false alarm rate in this non-stationary setting, which had been previously been derived only in the asymptotically stationary setting. 
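As a companion to the pandemic-monitoring model of Section V, the sketch below writes down the Beta-model log-likelihood ratio that the WL-GLR-CuSum statistic would use on the daily fraction of incremental cases. Since the exact expressions (58) and (60) are not reproduced in this text, the post-change density is taken, as one natural reading of the description above, to be B(x; h θ(n−ν) a 0, b 0), and h θ is a hypothetical Gaussian-bump stand-in that rises like the left tail of a Gaussian and decays toward zero; both choices are assumptions for illustration only.

```python
import numpy as np
from scipy.stats import beta

def h_theta(x, theta0, theta1, theta2):
    """Hypothetical stand-in for the modulation function h_theta in (60):
    a Gaussian-shaped bump, rising like the left tail of a Gaussian density
    for small x and decaying to zero for large x.  The paper's exact
    functional form is not reproduced in this text."""
    return theta0 * np.exp(-((x - theta1) ** 2) / (2.0 * theta2 ** 2))

def llr_beta(x_i, i, k, theta, a0, b0):
    """Assumed Z^theta_{i,k} for the pandemic-monitoring model: pre-change
    density B(x; a0, b0); post-change density at time i (hypothesized change
    point k) assumed to be B(x; h_theta(i - k) * a0, b0)."""
    a_post = h_theta(i - k, *theta) * a0
    return beta.logpdf(x_i, a_post, b0) - beta.logpdf(x_i, a0, b0)

# The WL-GLR-CuSum sketch from Section III can then be run on the daily
# fraction of incremental cases x_i in (0, 1), maximizing over a grid of
# candidate (theta0, theta1, theta2) values.
```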
We showed that the developed WL-CuSum procedure for known post-change distribution, as well as the developed WL-GLR-CuSum procedure for the unknown post-change parameters, asymptotically achieve the lower bound on the worst-case expected detection delay, as the false alarm rate goes to zero. We validated these theoretical results through numerical Monte-Carlo simulations. We also demonstrated that the proposed WL-GLR-CuSum procedure can be effectively used in monitoring pandemics. We also provided in Section IV some possible avenues for future research, in particular, those allowing for dependent observations and more general false alarm constraints. Proof of Lemma II.1. For the first inequality, fix ν ≥ 1 and δ > 0. Note that g ν (n) = and, by definition, Thus, for an arbitrary ν > 1 we have where ( * ) follows from Kolmogorov's inequality and the last line follows by independence. Hence, where the limit follows from condition (16) . For the second inequality, fix ν and t such that t ≥ ν ≥ 1. For any δ ∈ (0, 1), we have where ( * * ) follows from Chebyshev's inequality. Thus, by condition (16), The proof is complete. Proof of Lemma III.1. Let λ θ n,k = n i=k Z θ i,k denote the log-likelihood ratio between the hypotheses that ν = k with the parameter θ against ν = ∞ in the sample (X 1 , . . . , X n ). We re-write the definition ofτ G (b) in (51) as: for a given pair (k, n) where k ≤ n. Note that now instead of Θ α we use the notation Θ b , which is a compact subset of Θ. Let Π be a probability measure on Θ b . Recall that F n = σ(X 1 , . . . , X n ) and F ∞ = σ(X 1 , X 1 , . . . ). Given this mixing distribution over Θ b , define Q k by Then Q k is easily seen to be a probability measure. Moreover, letting P n ∞ and Q n k denote the restrictions of P ∞ and Q k to the sigma-algebra F n introduce the likelihood ratio which along with the previous argument implies that (1)). This inequality implies inequality (55). Thus, it remains to prove the asymptotic inequality (65). By assumption,θ n,k lies in the interior of Θ b (for sufficiently large b). Using Taylor's expansion, for any k ≤ n and θ ∈ Θ, λ θ n,k = λθ n,k n,k + ∇ θ λ θ Sequential Analysis: Hypothesis Testing and Changepoint Detection, ser. Monographs on Statistics and Applied Probability 136 Sequential Change Detection and Hypothesis Testing: General Non-i.i.d. Stochastic Models and Asymptotically Optimal Rules, ser. Monographs on Statistics and Applied Probability 165 Academic press library in signal processing: Array and statistical signal processing Sequential (quickest) change detection: Classical results and new directions Procedures for reacting to a change in distribution Optimality and almost optimality of mixture stopping rules Using the generalized likelihood ratio statistic for sequential detection of a changepoint Information bounds and quick detection of parameter changes in stochastic systems Asymptotic optimality of certain multihypothesis sequential tests: Non-i.i.d. 
case
[10] General asymptotic Bayesian theory of quickest change detection
[11] On asymptotic optimality in sequential changepoint detection: Non-i.i.d. case
[12] Asymptotic optimality of change-point detection schemes in general continuous-time models
[13] Asymptotic optimality of mixture rules for detecting changes in general stochastic models
[14] Asymptotically optimal pointwise and minimax change-point detection for general stochastic models with a composite post-change hypothesis
[15] Continuous inspection schemes
[16] Optimal stopping times for detecting changes in distributions
[17] Optimal detection of a change in distribution
[18] Third-order asymptotic optimality of the generalized Shiryaev-Roberts changepoint detection procedures
[19] Numerical comparison of CUSUM and Shiryaev-Roberts procedures for detecting changes in distributions
[20] Asymptotic exponentiality of the distribution of first exit times for a class of Markov processes with applications to quickest change detection
[21] A note on the run length to false alarm of a change-point detection policy
[22] Is average run length to false alarm always an informative criterion?
[23] Discussion on "Is average run length to false alarm always an informative criterion?" by Yajun Mei
[24] Asymptotically optimal pointwise and minimax quickest change-point detection for dependent data
[25] Latest Map and Case Count
[26] A COVINDEX based on a GAM beta regression model with an application to the COVID-19 pandemic in Italy