A computationally efficient, high-dimensional multiple changepoint procedure with application to global terrorism incidence
S. O. Tickle, I. A. Eckley and P. Fearnhead
6 November 2020

Detecting changepoints in datasets with many variates is a data science challenge of increasing importance. Motivated by the problem of detecting changes in the incidence of terrorism from a global terrorism database, we propose a novel approach to multiple changepoint detection in multivariate time series. Our method, which we call SUBSET, is a model-based approach which uses a penalised likelihood to detect changes for a wide class of parametric settings. We provide theory that guides the choice of penalties to use for SUBSET, and that shows it has high power to detect changes regardless of whether only a few variates or many variates change. Empirical results show that SUBSET out-performs many existing approaches for detecting changes in mean in Gaussian data; additionally, unlike these alternative methods, it can be easily extended to non-Gaussian settings such as are appropriate for modelling counts of terrorist events.

The canonical, one-dimensional, changepoint analysis problem has been the focus of substantial research effort for many years. Much initial effort was placed on developing methodology to detect a single change in this univariate setting. Our motivating application, detecting changes in the incidence of terrorism using the Global Terrorism Database, raises two challenges that are common in modern applications. The first is the need to analyse multivariate data, and detect changes that may affect only some of the variates. The second is the need for methods that can detect changes when it is not appropriate to model the data as a change in mean of Gaussian data, which has been the focus of most multivariate changepoint methods to date (Wang and Samworth, 2018; Enikeeva and Harchaoui, 2019; Hahn et al., 2020). Simple application of existing methods to these data is unreliable, as they detect too many changes; see Appendix B of Tickle (2020) for empirical confirmation of this. The method we develop is based on likelihood ratio tests, and can be applied across a range of modelling scenarios. This makes it stand out from most current competitors, which are designed to detect a change in mean in Gaussian data, and is important for our application where the data are in the form of counts. In such a setting, it is natural to model the counts using a negative binomial distribution, due to substantial over-dispersion in the data relative to a Poisson model. Whilst not needed for our application, the fact that our method combines likelihood ratio tests for a change on each variate means that it can easily be applied to mixed data settings, where we may wish to use different models for the different variates, which may be of different types (e.g. a mix of continuous, count and categorical data). For the case of a change-of-mean in Gaussian data, we analyse theoretically the properties of our method, and show that it achieves similar asymptotic power to the best possible for this class of models.

Over the past fifty years, terror activity has seemingly transitioned from being a phenomenon perpetuated by a small number of localised groups to an issue of international significance. One question we may ask is whether there have been specific points or periods of time in which changes in the number of terrorist attacks occur, either regionally or worldwide.
Such changes, if confined to certain regions of the globe, may point to specific events which altered the probability of an attack in a particular country or continental area. Alternatively, if a change is seen to affect terrorism worldwide, one might potentially identify events that had a much more wide-ranging scope. Identifying historically impactful events has benefits for future policy-makers, both national and international, particularly in regions of the world where terrorism may be an especially acute issue. The Global Terrorism Database (Jensen, 2018) is a global historical record of terrorist incidents maintained by the National Consortium for the Study of Terrorism and Responses to Terrorism at the University of Maryland, and the database is copyrighted to the University of Maryland (2018). The database is a compilation of terror events that have occurred worldwide since 1 January 1970. The original platform for the database was the Pinkerton Global Intelligence Services who, from 1970 to 1997, employed a number of researchers to collate and record events from numerous domestic and foreign reports. During this period, an event which came to the attention of the compilers was valid for inclusion if it involved the "threatened or actual use of illegal force and violence to attain a political, economic, religious or social goal through fear, coercion or intimidation" (LaFree, 2010) . In addition, any event known to have been perpetrated by a sovereign state was not included. The Global Terrorism Database has been subject to several analyses in recent years. LaFree and Dugan (2007) , LaFree (2010) and LaFree et al. (2014) highlighted the higher incidence of terrorism in Europe in the 1970s; a period of unusually high terrorist activity in Latin America between 1980 and 1997; and a more general note regarding the concentration of most incidents within geographic space. Particular events or points in time that may have seen changes in terror activity were also a significant theme of interest. On a global scale, LaFree (2010) notes that terrorism tripled between 1976 and 1979, with a doubling just in the final year of this period (LaFree and Dugan, 2007) . One of the principle findings of an analysis by Santifort et al. (2012) suggests that, by 2010, the variation in the mode of terror attacks had declined significantly since 1970, with bombings, typically of nonofficial, private entities or groups of people, becoming the preferred method of terrorists worldwide. Other analyses of the Global Terrorism Database examine the relationship between terrorism and a wide range of distinct global applications. These include investor sentiment, tourism levels in the United Kingdom, and the worldwide cost of debt; see Drakos (2010) , Mao (2019) and Procasky and Ujah (2016) , among others. The database has also been used to analyse terrorism at a much more local level. For example, LaFree (2010) discusses the activities of the Armenian Secret Army for the Liberation of Armenia (ASALA). The link between government policy change and a change in the probability of a terrorist attack has also been examined. For example, LaFree (2010) used the database to perform an analysis on six policies enacted by the British government to attempt to reduce terror activity in Northern Ireland. It was discovered that three of these policies in fact resulted in a significant increase in terrorism, with only one leading to a reduction. Meanwhile, Raghavan et al. 
(2013) focus on the activities of the Revolutionary Armed Forces of Colombia -People's Army (FARC) between 1998 and 2006. They conclude that, from the end of June 1998, funding from the United States to combat the drug economy in Colombia had an impact in reducing FARC's activities in the short to medium term. We use the Global Terrorism Database to provide event incidence in twelve regions: Australasia & Oceania, Central America & Caribbean, Central Asia, East Asia, Eastern Europe, Middle East & North Africa, North America, South America, South Asia, Southeast Asia, Sub-Saharan Africa and Western Europe. Given that these political terms may be somewhat fluid geographically, we show this division pictorially in Figure 1 . Note that these regions are as defined by the compilers of the database. It would also be possible to carry out our later analysis of the database on a much finer scale, for example, by country. However, given that many sovereign states have undergone substantial border changes in the last fifty years, such an analysis would likely require a great deal of care. Figure 1 : Definition of twelve regions used when analysing the global terrorism data set -each region is denoted by a different colour. This map was created with the aid of the maps package of Becker and Wilks (2018) . For each of the twelve regions, we aggregate all incidents for each month to produce one univariate time series of counts for each region. Each of these is of length 564, one for each month between January 1970 and December 2017 inclusive. Note that 1993 is not included, as that year's data are missing from the publicly-available copy of the database. The resulting incident count by region is shown in Figure 2 . We can see from Figure 2 that there appear to be periods of time in which terrorism abruptly increases or decreases in specific regions. Whilst the question of locating changepoints in the database could be answered using univariate changepoint methods, we seek to analyse these series jointly and thereby potentially identify subsets of series within which changes occur. Such a setting is inherently more computationally complex, due to the rapid increase in possible changepoint combinations that can occur in a high-dimensional setting. This is the challenge we seek to address, proposing a novel computationally efficient approach to identify changepoints in multivariate data sequences. Figure 1 . Note that the series' colours match those of Figure 1 : Australasia and Oceania (blue); Central America and Caribbean (green); Central Asia (brown); East Asia (yellow); Eastern Europe (red); Middle East and North Africa (black); North America (purple); South America (orange); South Asia (cyan); Southeast Asia (pink); Sub-Saharan Africa (gold); and Western Europe (grey). Whilst detecting changes in univariate data sequences has a long history, there has been much less work on methods for detecting potentially multiple changepoints in multivariate datasets. Univariate approaches can be easily adapted to the multivariate setting if we are willing to assume all variates change at each changepoint (e.g. Wessman, 1998) . However, this may not be appropriate in applications where some, but not all, variates are affected by each changepoint; or where it is not known a priori whether a change will only affect a very small number or many of the variates. Within the multivariate changepoint setting, the change in mean problem has to date received the most substantial focus. 
In this setting, evidence for a change in a single series can, for example, be quantified using CUSUM statistics -a weighted difference in the empirical mean before and after the potential changepoint. The simplest ways of combining evidence across time series are to (i) perform some form of averaging of the CUSUM statistics; or (ii) take the maximum value of the CUSUM statistics. Naturally, the detection boundaries, i.e. how large a change in mean is needed before the presence or absence of change can be determined with probability tending to 1, are very different for approaches (i) and (ii), with the first being able to detect small changes that affect most variates, and the latter requiring at least one variate to change by a sizeable amount. Enikeeva and Harchaoui (2019) investigate the detection boundary for a change in mean in a high-dimensional asymptotic setting where the number of variates, d, the number of variates that change and the number of observations per variate increase. They show that there are two regimes depending on whether the number of variates that change increases faster than √ d, or at or slower than a √ d rate. We call the former regime the dense change regime, and the latter the sparse change regime. Informally, approach (i) works well in the dense regime, while (ii) works well in the sparse regime. In recent years, a number of other approaches have been proposed that seek to strike a balance between the (i) mean and (ii) max CUSUM-based approaches. For example, Cho and Fryzlewicz (2015) and Cho (2016) sum only CUSUM statistics that exceed a certain threshold; meanwhile, Wang and Samworth (2018) consider sparse projections of the data, which is equivalent to using a weighted average of CUSUM statistics. These approaches can demonstrate strong empirical performance, but neither has been shown theoretically to simultaneously work as well in both the dense and sparse settings. For example, the method of Wang and Samworth (2018) was designed for detecting sparse changes, and its theory establishes strong performance in precisely that setting. It is the development of an approach that seeks to work well in both settings that we introduce below. The one current method that has been shown to work well in both dense and sparse settings is that of Enikeeva and Harchaoui (2019) which combines a test statistic based on combining all CUSUM statistics with one based on the largest p statistics and scans over all choices of p. To this end, suppose that the data sequence for each variate, (y i,j ) n j=1 for i = 1, . . . , d, within the dataset, y 1:n , can be segmented by changepoints, which are often shared across variates within the data. So for the global terrorism data y i,j corresponds to the count of terrorist events in region i and time point j, and we have d = 12 and n = 564. We define the set of changepoints to be points where at least one variate undergoes a change. Therefore, for each changepoint there is an associated affected set of variates which undergo a change. Formally, let 0 = τ 0 < τ 1 < . . . < τ m < τ m+1 = n be the changepoints, with corresponding affected sets S 1 , . . . , S m . Note that it is possible for a given variate to be affected at more than one changepoint, so these affected sets are allowed to overlap. We will assume a parametric model for the data within a segment for each variate, and further that the segment parameter for this model only changes at changepoints which affect that variate. 
To simplify the exposition, assume that the data are conditionally independent given the segment parameters. In other words, we have $y_{i,j} \mid \mu_{i,k} \sim g(\,\cdot \mid \mu_{i,k})$, independently across $i$ and $j$, for some family of densities $g(\cdot \mid \cdot)$, where $k = |\{v : \tau_v < j\}|$. Here, $k$ denotes the segment associated with time-point $j$, and $\mu_{i,k}$ is the associated segment parameter for series $i$. We have $\mu_{i,k} = \mu_{i,k+1}$ unless $i \in S_k$: the $k$th and $(k+1)$th segment parameters of series $i$ are identical unless the $k$th change affects series $i$. Our definition of the within-segment data-generating processes given above allows our method to handle a wide class of possible changepoint problems. For instance, the well-studied canonical problem of Gaussian change in mean can be included by setting $g(\cdot \mid \mu_{i,k})$ to be the density of a $N(\mu_{i,k}, \sigma_i^2)$ random variable, where $\mu_{i,k}$ is the mean of the signal in the $i$th variate following the $(k-1)$th changepoint which affects variate $i$, and $\sigma_i^2$ is some (known) variance. However, for the Global Terrorism Database example discussed in Section 2, this classical problem setup is inappropriate. Given that this particular application consists of count data across multiple regions, we can instead use a negative binomial likelihood, taking $y_{i,j} \sim \mathrm{NegBin}(r_i, p_{i,k})$. Here, $r_i$ governs the amount of over-dispersion, relative to a Poisson model, in the $i$th variate. For a fixed $r_i$, $p_{i,k}$ then determines the mean of the $i$th variate following the $(k-1)$th changepoint.

Whilst our aim is to jointly detect the number and location of all changepoints, as well as the variates that are affected at each change, we first introduce and develop theory associated with the analysis of our approach under the assumption that there is at most one changepoint. This will subsequently be extended within a binary segmentation algorithm to detect multiple changepoints. We begin with a derivation of the test statistic used by SUBSET in the single change setting. The log-likelihood ratio statistic for detecting a changepoint at time $\tau$, affecting variates in set $S$, is
$$R(\tau, S) = 2 \sum_{i \in S} \left\{ \max_{\mu} \sum_{t=1}^{\tau} \log g(y_{i,t} \mid \mu) + \max_{\mu} \sum_{t=\tau+1}^{n} \log g(y_{i,t} \mid \mu) - \max_{\mu} \sum_{t=1}^{n} \log g(y_{i,t} \mid \mu) \right\}.$$
To simplify the notation, let $C(y_{i,s:t}) = -2 \max_{\mu} \sum_{j=s}^{t} \log g(y_{i,j} \mid \mu)$, and let
$$D_{i,t} = C(y_{i,1:n}) - C(y_{i,1:t}) - C(y_{i,(t+1):n})$$
denote the contribution to the log-likelihood ratio statistic from series $i$ if it is assumed to change at time $t$. Then $R(\tau, S) = \sum_{i \in S} D_{i,\tau}$.

Directly using the log-likelihood test statistic is complicated, due to the fact that we do not know $\tau$ or $S$. In addition, different choices for $S$ will allow for different numbers of variates to change. We therefore consider a penalised version of the test statistic, where the penalty depends on the number of variates that change, $|S|$. We then maximise over possible choices of $\tau$ and $S$. That is, we use $\max_t S_t$ as our test statistic where, for $t = 1, \ldots, n-1$,
$$S_t = \max_{S \subseteq \{1, \ldots, d\}} \left\{ \sum_{i \in S} D_{i,t} - \mathrm{Pen}(|S|) \right\},$$
for some suitable penalty function $\mathrm{Pen}(\cdot)$. For both theoretical and computational reasons (see Section 4.2 and Section C of the Supplementary Material respectively), we suggest a piecewise linear penalty of the form
$$\mathrm{Pen}(p) = \min\{\beta + \alpha p, \; K\},$$
for some suitable constants $\alpha$, $\beta$ and $K$. We then detect a change if $\max_t S_t > 0$, with the location at $\hat{\tau} = \arg\max_t S_t$ and the set of estimated affected variates given by $\hat{S} = \arg\max_{S} \{\sum_{i \in S} D_{i,\hat{\tau}} - \mathrm{Pen}(|S|)\}$. We choose a piecewise linear penalty as this makes the maximisation over $S$ computationally efficient. In particular, we can define $\tilde{D}_{i,t} = \max\{D_{i,t} - \alpha, 0\}$, and then
$$S_t = \max\left\{ \sum_{i=1}^{d} \tilde{D}_{i,t} - \beta, \;\; \sum_{i=1}^{d} D_{i,t} - K \right\}.$$
The two terms in the maximisation above correspond to the two different linear regimes in the penalty function.
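For the Gaussian change-in-mean case with unit variance, each $D_{i,t}$ is simply the reduction in residual sum of squares obtained by splitting series $i$ at $t$, so the statistic above can be computed directly. The following is a minimal R sketch of this computation, not the authors' implementation: the function names and the particular value of $\beta$ are illustrative, and the quadratic-time loop over $t$ would be replaced by running sums in any serious implementation.

```r
# Minimal sketch of the SUBSET single-change statistic for a Gaussian change in
# mean with known variance 1 (illustrative only, not the authors' implementation).
gauss_cost <- function(x) sum((x - mean(x))^2)  # -2 * maximised log-likelihood, up to a constant

subset_stat <- function(y, alpha, beta, K) {
  # y: d x n data matrix; returns the penalised statistic S_t for t = 1, ..., n - 1
  d <- nrow(y); n <- ncol(y)
  S <- numeric(n - 1)
  for (t in 1:(n - 1)) {
    D <- apply(y, 1, function(x) gauss_cost(x) - gauss_cost(x[1:t]) - gauss_cost(x[(t + 1):n]))
    S[t] <- max(sum(pmax(D - alpha, 0)) - beta,  # sparse regime: penalty beta + alpha * |S|
                sum(D) - K)                      # dense regime: constant penalty K
  }
  S
}

# Toy example: d = 20 series of length n = 200, three of which change mean at t = 100.
set.seed(1)
d <- 20; n <- 200
y <- matrix(rnorm(d * n), nrow = d)
y[1:3, 101:200] <- y[1:3, 101:200] + 1
alpha <- 2 * log(d)
beta  <- 12                                  # in practice chosen by simulation (see Section 4.2)
K     <- d + sqrt(2 * beta * d) + beta       # beta-to-K relationship used in the Supplementary Material
S <- subset_stat(y, alpha, beta, K)
if (max(S) > 0) which.max(S)                 # estimated change location when a change is flagged
```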
As we shall see, the $\beta + \alpha p$ component of the penalty function, previously considered by Pickering (2016) as a means of penalising changes across both time and variates, determines the test statistic's behaviour for detecting sparse changes. Meanwhile, the constant term, $K$, is needed to improve power for detecting dense changes. If $\xi = \arg\max_t \big\{\sum_{i=1}^{d} \tilde{D}_{i,t} - \beta\big\}$ and $S_\xi > 0$, then we say that we have detected a sparse change, with evidence for a change only in those variates $i$ such that $\tilde{D}_{i,\xi} > 0$. If, however, $\eta = \arg\max_t \big\{\sum_{i=1}^{d} D_{i,t} - K\big\}$ and $S_\eta > 0$, then this is identified as a dense change and all variates are labelled as affected.

To understand the behaviour of the test statistic for a single change, and to obtain guidelines for choosing the constants that define our penalty function, we study its theoretical properties for the canonical change in mean problem with Gaussian noise and a common, known variance, $\sigma^2$. This means that $D_{i,t}$ is $\chi^2_1$-distributed when no changepoint is present. The results we present are also useful for choosing the constants of the penalty function in cases where $D_{i,t}$ is based on a likelihood ratio test for the change of a single parameter, when $D_{i,t}$ would be approximately $\chi^2_1$-distributed if there is no change. As we are considering just a single change, we will simplify notation so that $\mu_{i,1}$ is the initial mean of series $i$. If there is a change, $\mu_{i,2}$ will be the mean after the change, and $\mu_{i,1} = \mu_{i,2}$ if $i \notin S_1$. Thus, the data-generating model is
$$y_{i,j} = \mu_{i,1} + \varepsilon_{i,j} \text{ for } j \le \tau_1, \qquad y_{i,j} = \mu_{i,2} + \varepsilon_{i,j} \text{ for } j > \tau_1, \qquad (2)$$
where the $\varepsilon_{i,j}$, for $i = 1, \ldots, d$ and $j = 1, \ldots, n$, are a set of centred, independent and identically distributed Gaussian random variables. For this particular problem, we have that
$$D_{i,t} = \frac{t(n-t)}{n\sigma^2}\left(\bar{y}_{i,1:t} - \bar{y}_{i,(t+1):n}\right)^2,$$
where $\bar{y}_{i,s:u}$ denotes the sample mean of $y_{i,s}, \ldots, y_{i,u}$. We use these expressions to establish false positive and detection probability results in the single change setting under Gaussian noise when $\max_t S_t$ is taken as the test statistic. Our first theoretical contribution concerns the false positive rate of the chosen test statistic.

Theorem 4.1. Suppose we are in setting (2) with no change present (so that $\mu_{i,1} = \mu_{i,2}$ for all $i$), and that in addition $\alpha = 2 \log d$, $\beta = J \log n$ for some $J > 0$, and $K = d + \sqrt{2 \beta d} + \beta$. Then the false positive probability, $\mathbb{P}(\max_t S_t > 0)$, is bounded above by $C$ multiplied by a term which tends to zero as $n \to \infty$ whenever $J \ge 2$, where $C$ is an absolute constant bounded above for all $d > 1$.

Proof: See Section B of the Supplementary Material.

This result therefore provides guidance on suitable choices of the penalty for the Gaussian setting. We note that if $J \ge 2$ this will provide a test whose false positive rate will tend to 0 as $n \to \infty$. However, the bound on the false positive rate ignores the strong positive correlation in $S_t$ for different values of $t$, and thus will be conservative. As such, we suggest using simulation to choose $\beta$, and the corresponding value of $K$. This can be done for a given value of $n$ and $d$ by simulating multiple datasets with no change, and setting $\beta$ to, e.g., the value which gives no detected changes for 95% of simulated datasets. We note also that Theorem 4.1, under a scenario of no change, uses the marginal $\chi^2$-distribution of $D_{i,t}$. This suggests that setting $\alpha = 2 \log d$ and using a simulation approach to choose $\beta$ (and hence $K$) would work similarly in other settings when the marginal distribution of $D_{i,t}$ is approximately chi-squared with one degree of freedom. This would be possible, for example, in the common scenario of detecting a change in a single parameter using a likelihood-ratio test. Given these penalty choices, we can next state a result on the power of this procedure.

Theorem 4.2. Assume that we are again in setting (2) with $\sigma^2 = 1$, that there is a single change at $\tau_1 = \theta n$, for some fixed $0 < \theta < 1$, affecting the variates in $S_1$, and write $\Delta_i = |\mu_{i,2} - \mu_{i,1}|$ for $i \in S_1$. Then, under the penalty choices above, the change is detected with high probability provided $\sum_{i \in S_1} \Delta_i^2$ exceeds the minimum of two thresholds, $V_S$ and $V_D$, arising from the sparse and dense penalty regimes respectively.

Proof: See Section B of the Supplementary Material.
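The simulation-based choice of $\beta$ (and hence $K$) recommended above can be sketched as follows for the Gaussian case. This is an illustrative sketch rather than the authors' code: it reuses `gauss_cost` from the earlier sketch, calibrates $\beta$ as the 95% quantile of the unpenalised sparse-regime statistic over null datasets, and then sets $K$ through the relationship used in the Supplementary Material; the number of null replications is arbitrary.

```r
# Sketch of choosing beta by simulation under the null (no change), Gaussian case.
# (Naive quadratic-time sketch; slow for large n and d.)
set.seed(2)
d <- 20; n <- 100
alpha <- 2 * log(d)
null_max <- replicate(100, {
  y0 <- matrix(rnorm(d * n), nrow = d)
  sparse_stat <- sapply(1:(n - 1), function(t) {
    D <- apply(y0, 1, function(x) gauss_cost(x) - gauss_cost(x[1:t]) - gauss_cost(x[(t + 1):n]))
    sum(pmax(D - alpha, 0))
  })
  max(sparse_stat)                           # largest unpenalised sparse statistic over t
})
beta <- unname(quantile(null_max, 0.95))     # no sparse detection on 95% of null datasets
K    <- d + sqrt(2 * beta * d) + beta        # corresponding dense penalty
```

The dense-regime false alarm rate can be checked on the same null replicates (using the second term of $S_t$) and $\beta$ increased slightly if necessary; with $\beta$ and $K$ fixed in this way, the power of the resulting test is governed by the constants $V_S$ and $V_D$ discussed next.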
The two constants, V S and V D , correspond to the ability of our method to detect either a sparse change or a dense change; and come from considering the event that our test statistic is positive for a penalty that is just α|S 1 | + β or just K respectively. It is by using a penalty that is the minimum of these two that gives good performance across both sparse and dense change setting. To see this, and to help understand the result, consider the behaviour of V S and V D in an asymptotic setting where n and d increase, with d increasing at a polynomial rate in n, and the number of variates that change increasing at a polynomial rate in d. In this case we have For the change in mean setting, it is possible to draw strong comparisons between our approach and other multivariate changepoint tests, the main difference being in terms of how the methods aggregate evidence for a change across different variates. These alternative approaches use the CUSUM statistic for each variate within the dataset, defined, in the known σ 2 case, as Therefore, for the Gaussian change in mean setting, our test statistic can be expressed in terms of the CUSUM statistic as For comparison, three previously proposed test statistics, which we refer to herein as Mean (Groen et al., 2013) , Max (Groen et al., 2013) and Bin-Weight (Cho and Fryzlewicz, 2015) are, respectively From the results in Enikeeva and Harchaoui (2019), we know that S (mean) t will have high power for dense changes which affect most series, but lose power for sparse changes where few variates change. By comparison, S (max) t will have higher power in the sparse case and lower power in the dense case. These can be combined to produce a test with high power across both cases (see Enikeeva and Harchaoui, 2019 , though their proposed method combines different test statistics to mean and max cusum). The Bin-Weight method is closest to the one that we propose, particularly if we set α = √ 2 log d, which is equivalent to the threshold we use. Other than using CUSUM statistics rather than their squares, there are two main differences between Bin-Weight and our method. The first is that as W i,t increases its contribution to S t will jump from 0 to W i,t when it first exceeds the threshold α; by comparison our approach uses a soft-threshold of W 2 i,t , which avoids such a jump and thus reduces the variability of the test statistic. Second, for a choice of α = √ 2 log d, Bin-Weight will lose power in dense scenarios compared to our approach which caps the overall penalty at K. We here formally introduce SUBSET (Sparse and Ubiquitous Binary Segmentation in Efficient Time), the full procedure for the use of the test statistic max t S t given in Section 4.1. Given this thresholded penalty approach, SUBSET is designed to detect both sparse and dense changes, the latter of which are labelled by SUBSET as affecting all variates within the data. In order to detect multiple changes within the data, SUBSET embeds the test for detecting a single changepoint within Wild Binary Segmentation (Fryzlewicz, 2014) . When implementing this procedure, we recommend setting α = 2 log d, keeping the same relationship between β and K, and then tuning β so that the wild binary segmentation procedure has an appropriate false positive rate for a given n and d on data simulated with no change. An issue with SUBSET is that while the estimates ofτ tend to be fairly reasonable, when estimating multiple changes the estimates ofŜ are prone to error due to masking from other changepoints. 
This is especially true for variates for which there may be a particularly strong change at a nearby time point. For sparse changes, we propose using a post-processing step where we individually analyse data from each variate conditional on the set of estimated changes,τ = (τ 1 , . . . ,τm). When analysing a single variate, we only allow changes to occur within the setτ . We detect the changes by minimising the univariate version of our penalised cost, that is introducing a change in a given variate if it reduces the cost by at least α. Formally, for variate i, we find arg min This can be done efficiently using dynamic programming. See, for example, Section 2 of Tickle et al. (2020) for details. Pseudo-code for the full SUBSET algorithm is provided in Section C of the Supplementary Material. We examine the properties of the SUBSET method against the CUSUM aggregation procedures discussed in Section 4.3. In addition, we compare these methods against Inspect (Wang and Samworth, 2018), a recent leading approach for detecting changes in mean under Gaussian noise for high-dimensional series. To implement Inspect, we use code from the InspectChangepoint package (Wang and Samworth, 2016) . All simulations were run in R using a Linux OS on a 2.3GHz Intel Xeon CPU. We examine multivariate series with pairwise independent Gaussian noise with variance 1 and count data generated according to a negative binomial likelihood model under various different dispersion parameters. For all scenarios considered, 200 repetitions were simulated. The threshold penalty for Inspect and the β values for SUBSET and the CUSUM-based methods were computed using simulations from the null model, such that the false alarm rate was fixed at 5%. For comparability with SUBSET, the α value for Bin-Weight was taken to be √ 2 log n. For further justification of this choice, see Wang and Samworth (2018) . To check the power of the methods in the single change setting we increase ∆, the absolute change in mean -for each variate in which a change occurs -from 0.01 to 1.00 in increments of 0.01, and record the proportion of tests which yield a missed change in each case. We do this for n = d = 1000 and for densities of change corresponding to 0.5%, 1%, 5%, 10%, 50% and 100% of the variates affected by the change. Figure 3 shows the result of this simulation when the location of the change is at τ = 50, for each of the densities of change, and for each of the methods under investigation. Qualitatively similar results were observed for other settings; see, for example, Section B.4 of Tickle (2020) . The performance of different methods can be seen to depend on the proportion of series that change. For changes which affect a high proportion of variates, the Mean method is best, though SUBSET is very competitive. The other methods perform substantially worse, which is in keeping with the theory for these methods, which suggests that they are powerful for sparse changes but lose power for dense changes. By contrast, when the proportion of series that change is less than 1%, the Mean method performs poorly. However SUBSET retains high power that is similar to or better than the competing methods. This is in accordance with Theorem 4.2 which suggested that SUBSET would have good properties across both sparse and dense change scenarios. We remark that it is surprising to see Inspect show much lower power than Max, SUBSET and Bin-Weight for the more sparse changes. 
We believe this is due to the default choice of tuning parameter used when performing the sparse projection of the data within the Inspect procedure. We found that increasing this tuning parameter, which leads to sparser projections, improves the performance of Inspect in these cases; note, however, that doing this will reduce the power for the cases where most series change. We next compare the average location errors of the methods for three scenarios, corresponding to a very sparse setting (0.5% density of the change, and a change magnitude of ∆ = 0.33), a sparse setting (5% density and ∆ = 0.33) and a dense setting (50% density and ∆ = 0.1). We again consider n = d = 1000, as in Figure 3 , however we now have τ = 382. The results are shown in Figure 4 . For the two sparsest scenarios, we see that Inspect and SUBSET perform the best, with peaks correctly centred at the true changepoint. For the densest scenario, SUBSET demonstrates the best performance, with Mean a close competitor. The ability of SUBSET to estimate which subset of variates undergoes a change is also of interest. We examine the performance of SUBSET in determining the affected set in a setting with a small number of time points (n = 400) and increasing number of variates (d = 200, 400, 800, 1600, 3200 and 6400). Given that the estimated affected set at a change is only returned by SUBSET if the procedure determines that putative changepoint is sparse, we take a very sparse setting, |S|/d = 0.005. The results of this are shown in Figure 5 , which indicates that SUBSET correctly identifies the affected set in sparse settings for sufficiently large ∆. Figure 5 also indicates that SUBSET very rarely includes unaffected variates in the estimated affected set, with far fewer than 1% of unaffected variates being incorrectly labelled in the worst cases, typically for small ∆. We now compare methods for detecting multiple changepoints. We examine five scenarios, which we label as A, B, C, D and E here, each with three changepoints present. In each case, the changepoints may be found at times 600, 783 and 926. The only difference between scenarios is the size of each affected set of variates at each change. Thus, the scenarios imply different affected sets depending on the value of d. The scenarios are summarised below for d = 1000. Note that we once again fix σ 2 = 1 in all cases. • A: All three changes affect all variates. • B: The first and third changes affect all variates; the second change affects 0.5% of variates. • C: The first and third changes affect 0.5% of variates; the second change affects all variates. • D: All changes affect 1% of variates. • E: The first, second and third changes affect 0.5%, 1% and 5% of variates respectively. Here, we restrict ourselves to examining the power of the methods. Note that we herein define a 'missed change' as being a true changepoint for which the methods do not place an estimated change within log n points, while a 'false alarm' is classed as an estimated changepoint for which no true changepoint is within log n time points. Note that the choice of a log-tolerance is motivated by classical theory on the localisation of changepoints in the univariate setting (Wang et al., 2020) . Each method was run within wild binary segmentation based on running each test on 1000 random intervals (except for Inspect, for which only 100 intervals were simulated due to the higher computational cost). 
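A simplified sketch of the wild binary segmentation wrapper just described is given below, reusing `subset_stat` from the earlier sketch. It draws M random intervals, applies the single-change test to each, keeps the strongest positive candidate, and recurses on either side of it; the affected-set bookkeeping and post-processing of the full SUBSET procedure are omitted, and the interval-drawing scheme shown is one simple variant rather than the authors' exact implementation.

```r
# Simplified wild binary segmentation wrapper around the single-change statistic
# (sketch only: omits affected-set estimation and the post-processing step;
#  for simplicity the same alpha, beta, K are used on every interval).
subset_wbs <- function(y, alpha, beta, K, M = 100, lo = 1, hi = ncol(y)) {
  if (hi - lo < 2) return(integer(0))
  # draw M random sub-intervals of [lo, hi], and always include the full interval
  a <- sample(lo:hi, M, replace = TRUE)
  b <- sample(lo:hi, M, replace = TRUE)
  ints <- rbind(cbind(pmin(a, b), pmax(a, b)), c(lo, hi))
  best_stat <- 0; best_tau <- NA_integer_
  for (r in seq_len(nrow(ints))) {
    l <- ints[r, 1]; u <- ints[r, 2]
    if (u - l < 2) next
    S <- subset_stat(y[, l:u, drop = FALSE], alpha, beta, K)
    if (max(S) > best_stat) { best_stat <- max(S); best_tau <- l - 1 + which.max(S) }
  }
  if (is.na(best_tau)) return(integer(0))
  sort(c(best_tau,
         subset_wbs(y, alpha, beta, K, M, lo, best_tau),
         subset_wbs(y, alpha, beta, K, M, best_tau + 1, hi)))
}

# Example (using y, alpha, beta and K from the earlier sketches):
# cps <- subset_wbs(y, alpha, beta, K, M = 200)
```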
The threshold used to detect changes was chosen such that for data with no change, each method had a false positive rate of 5%. Table 1 shows the average number of changes -across 200 repetitions (100 for Inspect, again for computational reasons) -missed by each of the methods in each of the five scenarios for n = d = 1000 when ∆ = 1 for all variates which undergo a change at any changepoint. As can be seen from Table 1 , the best performing method in many scenarios was SUBSET, giving a low number of missed changes and false alarms in each case. In other words SUBSET performs well in both sparse and dense settings, even in cases with multiple changepoints. Only Inspect is competitive, with a slightly lower average number of missed changes in scenarios B and D, at the expense of a higher false alarm rate. In order to suitably assess the performance of SUBSET in the relatively low-dimensional setting of the Global Terrorism Database, we conclude this section by examining a smaller example. Given that our treatment of the database involves converting the data to a 12-variate system, the following simulations involve d = 12. We define scenarios A' -D' as analogues to scenarios A -D, in which • A': All three changes affect all variates. • B': The first and third changes affect all variates; the second change affects the first and seventh variates. • C': The first and third changes affect only the first and seventh variates; the second change affects all variates. • D': All changes affect only the first and seventh variates. In addition, we include a "surge" variant of each of these scenarios by adding two additional changepoints, at τ = 280 and τ = 320, which affect only the third variate. These two changes collectively form an epidemic change, so that the third variate returns to its original signal value. Table 2 shows the average number of missed changes and the average number of false alarms under 200 repetitions for each of the methods discussed, for ∆ i = 1 at each changepoint -with ∆ 3 = 5 at the surge changepoints, if present -keeping σ 2 = 1 throughout. Once again, n = 1000, and the definitions of missed change and false alarm are as before, as are the constructions of the penalty values for each procedure. We observe from Table 2 that all methods (with the possible exception of Max) do fractionally worse in the presence of the "surge" segment. However performance remains roughly consistent. In the case of SUBSET, we see that both the average number of missed changes and the average number of false alarms incurred remain very low, with only Inspect being a consistent competitor across all scenarios. Finally, motivated by problem of detecting changes in the global terrorism data, we consider detecting changes in count data. To allow for over-dispersion relative to a Poisson model, we assume that each data point is the realisation of a negative binomial random variables. Formally, we have for a common unknown over-dispersion parameter, r i and some sequence, (p i,k ) m+1 k=1 , of unknown success probabilities. At the time of writing, we believe that our method is the only one which naturally extends to such a setting, as we can re-define the likelihood ratio test statistic that our procedure is based on so that it relates to the negative binomial model we are fitting. 
In particular, we use the likelihood ratio test statistic for a change in success probability, assuming a common over-dispersion parameter, with the over-dispersion parameter estimated by a methods of moments estimator (Savani and Zhigljavsky, 2006) . This test is a natural one for detecting changes in the mean of the data. Note that we use a method of moments estimator, rather than the maximum likelihood estimator, given the difficulty of computing the latter to appropriate precision in efficient time under a negative binomial model. We evaluate the performance of SUBSET for scenarios similar to the low-dimensional examples discussed in Section 5.2. In the examples here, we take a change in probability of 0.1 at each variate which is affected at each changepoint. We take three different over-dispersion parameters of r = 3, 20 and 100, to give a total of twenty-four different experiments, with half of the experiments including a surge at the same location as in the Gaussian example, in which there is an epidemic change of size p = 0.1 for the third variate only. The results are shown in Table 3 . We see from Table 3 that SUBSET consistently performs well in the small d settings, with performance improving for larger values of the over-dispersion parameter r and fewer sparse changes present. Note that in the low r setting, SUBSET misses a significant proportion of sparse changes if most of the changes are sparse. This suggests that, for those periods of time and geographical regions where the number of terrorist incidents is low, there is a non-trivial probability that SUBSET will fail to detect changes in the latent probability of a terrorist attack. We now return to the Global Terrorism Database, and apply SUBSET to detect changes in the recorded incidence of terrorism events. Of particular interest is whether our method can detect geographically-localised changes, such as those that have previously been noted in the discussion in Section 2, as well as changes that affect all series, for example due to changes in how events are recorded in the database. One approach to modelling the terrorism count data might be to adopt a Poisson likelihood, such that a changepoint corresponds to an alteration in the rate parameter. However, we found that the data poorly conformed to a Poisson, due to a high degree of over-dispersion, and the use of a Poisson-based segment cost function led to SUBSET placing a very large number of changepoints. We therefore model the data for each series as realisations from a negative binomial with changing 'success' probability parameter, and apply SUBSET as per the study in Section 5.3. Figure 6 displays the output of our analysis for the Middle East and North Africa, North America and Western Europe regions, highlighting the dates of the changes which affect these regions. For a fuller picture, with all twelve regions displayed, please see Section D.1 of the Supplementary Material. Some notable features are apparent. For example, one of the very few dense changes located by SUBSET (in January 1998) corresponds to an alteration in the data collection method for all regions. As noted in Section 2, between 1970 and 1997, the Pinkerton Global Intelligence Services collated the information for the database from international reports in effectively 'real-time'. From the beginning of 1998, events were only added to the database retrospectively by the Study of Terrorism and Responses to Terrorism and the Center for Terrorism and Intelligence Studies. 
In addition, the definition of a terrorist event was refined into a set of six criteria, with an event having to satisfy at least five of the criteria to be included in the database. The overall effect, however, was an expansion in the definition of what constitutes a terrorist action. In this context, a dense changepoint is unsurprising. We remark that the Middle East and North Africa region is currently one of the few regions which is not labelled as affected by the dense change at the beginning of 1998. This is due to the post-processing step within SUBSET. The other changepoints which affect more than one or two of the series in our analysis are concentrated at the end of the 1970s. This again conforms to the general picture discussed by LaFree and Dugan (2007) and LaFree (2010) and highlighted in Section 2. Eight of the twelve regions are affected by changepoints between 1975 and 1978, with Central America and the Caribbean and Western Europe particularly hard-hit by the increase in terrorism during this time period. In the case of Western Europe, as discussed in Section 2, the change seen in February 1975 appears to be due to a change of British government policy in Northern Ireland. One possible policy (which aligns exactly with the changepoint) is the truce between the Provisional IRA and the British Army, which led to an increase in sectarian violence and subsequent retaliations; see, for example, Craig (2014) and White (2010) . We remark that the dense changepoint in 1998 also aligns closely to the Good Friday Agreement. Meanwhile, the changes in Central America during the same time period seemingly can be explained by the appearance of revolutionary groups in the late 1970s. The activities of these groups waned drastically as countries in the region democratised into the 1990s, a process often referred to in the Central American context as Democratic Consolidation; see, for example, Berntzen (1993) . Otherwise, as expected, most of the estimated changes are sparse. This corresponds to the commentary found in, for example, LaFree et al. (2014) , which asserts that most causes of terrorism remain localised. For instance, the change in the Middle East and North Africa in early 2013 appears to correspond to the beginning of the so-called 'Arab Winter' (King, 2020; Mihaylov, 2017) , a term which is used to describe the still ongoing re-emergence of significant levels of authoritarianism and extremism following the Arab Spring. Other changes of interest in the series include those placed in September 1978 and March 2005. We note that the former date aligns with the signing of the Camp David Accords. While the Accords did much to stabilise relations between Israel and Egypt, tensions were also engendered between Egypt and other Arab nations, as well as between the Egyptian government and their people (Bani-Salamah et al., 2012; Quandt, 1986 Quandt, , 1988 . Indeed, President Anwar Sadat was himself assassinated in 1981. The latter date detected by SUBSET is around one year before the conclusion of the insurgency in Iraq which degenerated into sectarian violence in February 2006. Our analysis of the Global Terrorism Database fundamentally shows that the assertion that terrorist activity remains localised is broadly correct. 
By distinguishing between sparse and dense changepoints, SUBSET is able to confirm that most "dense" changepoints in the context of global terror levels in fact match up to changes in data collection procedures, as well as the fluidity in the definition of terrorism itself. By contrast, the sparse changes found by SUBSET appear to align with political events of particular significance in regions of greater tension. A computationally efficient, high-dimensional multiple changepoint procedure with application to global terrorism incidence In this section, we establish several lemmas required to prove the central results of the main article (please see Section B for the main proofs). Our general approach with this section is to establish the stated results in either the sparse or the dense setting, and then combine these results appropriately in Section B. Throughout, we repeatedly use the following two lemmas: Lemma A.1 (Laurent and Massart (2000) ). Suppose G ∼ χ 2 k . Then for any x > 0 (2001)). Suppose H ∼ χ 2 k (ν). Then for any y > 0 and P H ≥ k + ν + 2 (k + 2ν)y + 2y ≤ exp(−y). We now give two results on the Type I error of the SUBSET procedure. Lemma A.3 gives a bound on the Type I error in the sparse setting, under particular choices for the penalties α and β, and Lemma A.4 gives an equivalent result in the dense setting, under an additional choice for the dense penalty K. Lemma A.3. Suppose we are in the same setting as for Theorem 4.1 of the main article. Let D i,t , α and β be as defined in Section 4 of the main article. Define and Lemma A.4. Suppose again that we are in the same setting as for Theorem 4.1 of the main article. Let D i,t and K be defined as in Section 4 of the main article. Define S 2,t = d i=1 D i,t − K. Setting β = (J + ) log n, α = 2 log d and K = d + √ 2βd + β gives that . We define for convenience q α = 1 − p α = We seek the Cramer transform, ψ * Aτ (r), of A τ , such that for λ < 1 2 the supremum is achieved close to λ = 1 2pα − 1 pα dqα 2r , particularly for large d, and we can bound the Cramer transform by the value of the argument at λ . Thus we have (4) and (5). can be shown using Jensen's Inequality), to assert that q α ≤ e −α/2 (1 + α) −1/2 , and that therefore performing a Bonferroni correction for the position of τ in the data then gives the stated result. Proof of Lemma A.4: In the scenario where there is no true change, the difference in cost between selecting the point τ as a change (with affected subset S = {1, . . . , d}) and simply finding the (correct) null model is Diff = RSS (y 1:n ; ∅) − RSS (y 1:n ; τ ; S) − K where here we use the notation RSS(z; ξ; T ) to denote the residual sum of squares of the vector z enforcing a changepoint at time ξ with affected set T . Note that W ∼ χ 2 d . By Lemma A.1, to establish the result we require K and x such that giving x = (J + ) /2 log n = β/2, and consequently K = d + √ 2βd + β as required. We later use these lemmas to establish Theorem 4.1. Before this, we give further results which are needed in establishing the other result of the main article. Lemma A.5. Assume that we are in the same setting as for Lemma A.3, except now we have that µ i,1 = µ i,2 whenever i ∈ S ⊆ {1, . . . , d}. For i ∈ S, let ∆ i := |µ i,2 − µ i,1 |. Then for δ > 0 and a = max {n, d} a sparse changepoint will be detected by SUBSET with probability greater than 1 − (a) −δ , providing that where here we have θ = τ n is fixed strictly between 0 and 1. Lemma A.6. 
Assume that we are in the same setting as for Lemma A.5, except with the dense penalty regime. Then, again with probability greater than 1−a −δ , for 2 > δ > 0 and a = max{n, d}, providing that a changepoint will be detected in the dense setting. Proof of Lemma A.5: Suppose there is a true change at location τ which affects a non-empty, sparse subset S ⊂ {1, . . . , d} of variates, such that the magnitude of change in variate i is ∆ i . We compare the cost of fitting no change in such a scenario against the cost of fitting the truth; i.e. let where D i,τ is as defined in the main article. Note that Therefore, by Lemma A.2, letting γ = nθ(1 − θ) i∈S (∆ i ) 2 P Diff + β + |S|α ≥ |S| + γ − 2 (|S| + 2γ) y > 1 − exp(−y). Note that if Diff > 0 then a changepoint will be detected. Therefore we require that γ ≥ 4y + β + |S|(α − 1) + 4y ((2α − 1)|S| + 2β + 4y). We may set y = δ log a, for a = max{n, d}, to give that P (Diff > 0) > 1 − a −δ , providing that i∈S (∆ i ) 2 ≥ 4δ log a + β + |S|(α − 1) + 2 δ log a ((2α − 1)|S| + 2β + 4δ log a) nθ (1 − θ) , as required. Proof of Lemma A.6: When comparing a fit at the true location τ = θn under a total penalty of K to the null fit, the difference in cost (in favour of the the non-null fit) is distributed as a non-central chi-squared distribution with d degrees of freedom and non-centrality parameter By Lemma A.2 and the definition of K, we therefore see that setting ν − (6) as x = β/2, and setting y = δ log a, (6) becomes as required. With these lemmas, we are now in a position to prove the results of the main article. In this section, we combine the preliminary results of Section A to give proofs of the results stated in the main article. Proof of Theorem 4.1: From Lemma A.3, letting g (n, d) = d and C = √ J + , some > 0, gives that α = 2 log d and √ β = 2d where here As we have Γ(s,x) Γ(s) ≤ e −x 1 + x s s−1 for 0 < s < 1, we have that 1 2 so that, for example, by taking d = 2, we may bound exp (1+ϑ) 2 4ϑ 1 √ 1+2 log d above by an absolute constant ∀d ≥ 2. For max t S 2,t , we use Lemma A.4, and the result for SUBSET in both settings follows as S t = max t {S 1,t , S 2,t }. Proof of Theorem 4.2: In the sparse setting, we may directly apply Lemma A.5, while in the dense setting, we may directly apply Lemma A.6. Note that the condition in Lemma A.5 resolves to give the required statement by setting K S = β + |S|α. Algorithm 1 gives the pseudo-code of the SUBSET procedure without the post-processing step. As discussed in Section 4.4 of the main article, a post-processing step is required in the SUBSET procedure to ensure that masking between different changepoints present in the data do not cause misspecification in the estimates of the affected sets at each changepoint. We detail this postprocessing procedure in Algorithm 2. Data: A multivariate dataset, y 1:n ; an α and C (.) as for Algorithm 1 of the main article; a set of candidates returned by Algorithm 1 of the main article, 0 = ξ 0 < ξ 1 < . . . < ξ q < ξ q+1 = n. Result: An estimated set of changepointsτ 1 , . . . ,τm and corresponding estimated affected setsŜ 1 , . . . ,Ŝm. Step 0: SetŜ 1 = . . . =Ŝ q = ∅,τ = N U LL; for i ∈ {1, . . . , d} do F = (−α, 0, . . . , 0); for j ∈ {1, . . . , q + 1} do Algorithm 2: Post-processing step for the SUBSET procedure. Note that this procedure, which closely parallels the Optimal Partitioning of Jackson et al. 
(2005) , Data: A multivariate dataset, (y i,j ) i=1,...,d,j=1,...,n ; variate penalty, α; changepoint penalty function, β; threshold penalty function, K; segment cost function, C (.); an interval number, M ; a sort function with respect to vector v, ρ v (.). Result: An estimated set of changepointsτ 1 , . . . ,τm and corresponding estimated affected setŝ S 1 , . . . ,Ŝm. Step 0: Set l 0 = 1, u 0 = n,τ = N U LL,Ŝ = N U LL. for j ∈ {1, . . . , M } do r j ∼ U {1, . . . , n}, s j ∼ U {1, . . . , n}, (l j , u j ) = (min{r j , s j }, max{r j , s j }) end Step 1: for j ∈ {0, . . . , M } do if u j − l j > 1, l j ≥ l 0 , u j ≤ u 0 then for t ∈ {l j , . . . , u j } do S 1,t = d i=1 max C y i,lj :uj − C y i,lj :t − C y i,(t+1):uj − α, 0 S 2,t = d i=1 C y i,lj :uj − C y i,lj :t − C y i,(t+1):uj S t = max {S 1,t − β, S 2,t − K} end if max t S t > 0 then q j = arg max S t , T q j = max S t if S q j = S 1,q j − β then T q j = i : C y i,lj :uj − C y i,lj :q j − C y i,(q j +1):uj − α > 0 else T q j = {1, . . . , d} end else q j , T q j , T q j = (N U LL, 0, ∅) end else q j , T j q , T q j = (N U LL, 0, ∅) end end Step 2: Set q = q 0 , q 1 , . . . , q M if ||q|| 0 ≥ 1 then γ = arg max j∈{1,...,M +1} T q j , η = q γ , U = T η ,τ = (τ , η),Ŝ = Ŝ , U τ = ρτ (τ ),Ŝ = ρτ Ŝ Return to Step 1 with l 0 = l 0 and u 0 = η, and return to Step 1 with l 0 = η + 1 and u 0 = u 0 . else If there are no further 'active' (l 0 , u 0 ) intervals, then returnτ ,Ŝ. end Algorithm 1: SUBSET (without post-processing). has complexity of O q 2 d , where q is the number of candidate changepoint locations returned by SUBSET. Indeed, employing a pruning step as per the PELT procedure of Killick et al. (2012) results in an expected cost of O (qd). As shown in Tickle et al. (2020) , this can be improved further to a worst-case cost of O (qd) using parallelisation. Therefore, the worst-case computational complexity of the post-processing step is O (nd). Given that the SUBSET procedure employs a hybrid of a Wild Binary Segmentation approach, simulating M intervals at each stage, the worst-case computational cost of SUBSET is not dominated by the post-processing step, and is O (M dn log n). After running SUBSET through the time series, the estimated changepoints and the corresponding affected sets of regions were computed. These are shown in Table 4 . For an alternative visualisation, with the estimated changes for each region superimposed over the raw count data, see Figure 7 . By comparison, Figure 8 shows the results of applying a univariate method (in this case the minimisation of the same penalised univariate negative binomial cost function using dynamic programming) to each series individually. Several salient features of the dataset are revealed by this analysis. Firstly, we note that there are many similarities between the changes found by the univariate method and SUBSET: for several of the series (for example, Western Europe), the same number of changepoints are found, with broadly the same change locations. However, in general, we see that SUBSET is more parsimonious. In addition, by its nature, changepoints which occur in different series at the same time are more readily identified by SUBSET. For example, the most dense changepoint (following post-processing) located using SUBSET is that of January 1998, a month corresponding to a change in the data collection methods for the GTD for the "GTD2" phase. 
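The univariate dynamic programme referred to above, used both for the per-region fits of Figure 8 and, when restricted to the candidate set returned by SUBSET, for the post-processing step, can be sketched as follows. This is a minimal illustration, not the authors' code: the negative binomial segment cost fixes a per-series over-dispersion parameter via a standard method-of-moments estimate (the specific estimator and its fall-back value are assumptions), and the search is a simple quadratic-time optimal partitioning over the supplied candidate changepoints with per-change penalty alpha.

```r
# Sketch of the univariate penalised-cost dynamic programme restricted to a
# candidate changepoint set (as in the SUBSET post-processing step).
nb_cost <- function(x, r) {
  mu <- max(mean(x), 1e-8)                   # MLE of the segment mean for fixed size r
  -2 * sum(dnbinom(x, size = r, mu = mu, log = TRUE))
}

mom_size <- function(x) {                    # method-of-moments over-dispersion (assumed form)
  m <- mean(x); v <- var(x)
  if (v > m) m^2 / (v - m) else 1e6          # fall back to ~Poisson if no over-dispersion
}

fit_univariate <- function(x, cand, alpha) {
  # x: one series of counts; cand: sorted candidate changepoints (times in 1..length(x)-1)
  r <- mom_size(x)
  bdry <- c(0, cand, length(x))              # segment boundaries, including 0 and n
  q <- length(bdry)
  F <- c(-alpha, rep(Inf, q - 1))            # F[1] corresponds to boundary 0
  prev <- integer(q)
  for (j in 2:q) {
    for (k in 1:(j - 1)) {
      val <- F[k] + nb_cost(x[(bdry[k] + 1):bdry[j]], r) + alpha
      if (val < F[j]) { F[j] <- val; prev[j] <- k }
    }
  }
  # trace back the accepted changepoints for this series
  cps <- integer(0); j <- q
  while (prev[j] > 1) { cps <- c(bdry[prev[j]], cps); j <- prev[j] }
  cps
}

# e.g. for region i: fit_univariate(y[i, ], cand = tau_hat, alpha = 2 * log(d)),
# where tau_hat is the vector of candidate changepoints returned by the SUBSET search.
```

In the terrorism application this per-series refinement is what determines the affected sets reported in Table 4.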
Other points of interest found by SUBSET include several "staggered" changepoints at the end of the 1980s and the beginning of the 1990s, which appear to correspond to the end of the Cold War. These findings are intuitive in the sense that many countries formerly in the Soviet sphere began publishing incidents in media outlets available to the investigators compiling the GTD. More recent changepoints seem to align to significant events in, for example, the Arab Spring uprising and the conflict in Ukraine. We now seek to verify the robustness of our findings after applying SUBSET to the Global Terrorism Database. The most fundamental assumption made by the SUBSET procedure in undertaking this analysis is that each series corresponding to a given global region has broadly stationary residuals, with no or low correlation with other regions, up to the location of the changepoints. Using the changepoint model returned to us by SUBSET in Table 4, we calculate the Pearson residuals for each geographical region. For each pair of regions we then calculated the correlation between these residuals, with the results shown in Figure 9. The mean correlation between a pair of regions is 0.063 to three decimal places.

References
Detecting multiple generalized change-points by isolating single ones
The Camp David Accords: Lessons and Facts
maps: Draw Geographical Maps
Standard testing procedure for quantifying breathing gas carbon dioxide partial pressure for extravehicular activity and launch, entry, survival pressure suits
Democratic consolidation in Central America: A qualitative comparative approach
An alternative point of view on Lepski's method
Nonparametric change-point estimation
A data-driven approach for estimating the changepoints and impact of major events on disease risk
Change-point detection in panel data via double cusum statistic
Multiple-change-point detection for high dimensional time series via sparsified binary segmentation
Monitoring the peace?: Northern Ireland's 1975 Ceasefire Incident Centres and the Politicisation of Sinn Féin
Logconcdead: An R package for maximum likelihood estimation of a multivariate log-concave density
Inferring change points in the spread of COVID-19 reveals the effectiveness of interventions
Terrorism activity, investor sentiment, and stock returns
A MOSUM procedure for the estimation of multiple random change points
High-dimensional change-point detection under sparse alternatives
Phase I monitoring of social network with baseline periods using Poisson regression
Wild binary segmentation for multiple change-point detection
Multivariate methods for monitoring structural change
High-dimensional changepoint detection via a geometrically inspired mapping
BayesProject: Fast computation of a projection direction for multivariate changepoint detection
Inference about the change-point from cumulative sum tests
Use of cumulative sums of squares for retrospective detection of changes of variance
An algorithm for optimal partitioning of data on an interval
The global terrorism database (GTD) [data file]. National Consortium for the Study of Terrorism and Responses to Terrorism (START), University of Maryland
Fast nonconvex deconvolution of calcium imaging data
Optimal detection of changepoints with a linear computational cost
The Arab Winter: Democratic Consolidation, Civil War, and Radical Islamists
The Global Terrorism Database (GTD): Accomplishments and challenges
Introducing the Global Terrorism Database
Putting Terrorism in Context: Lessons from the global terrorism database
Adaptive estimation of a quadratic functional by model selection
Impacts of Terrorism in the United Kingdom and Europe on Tourism in the United Kingdom
Why the Arab Spring turned into Arab Winter: Understanding the Middle East Crises through Culture, Religion, and Literature
Changepoint detection for acoustic sensing signals
Terrorism and its impact on the cost of debt
Camp David and peacemaking in the Middle East
The Middle East: Ten Years After Camp David. Brookings Institution
Hidden Markov Models for the activity profile of terrorist groups
Change point detection in social networks using a multivariate exponentially weighted moving average chart
Terrorist attack and target diversity: Changepoints and their drivers
Efficient estimation of parameters of the negative binomial distribution
Consistent and powerful graph-based change-point test for high-dimensional data
Changepoint detection for data intensive settings
Parallelization of a common changepoint detection method
Univariate mean change point detection: Penalization, cusum and optimality
InspectChangepoint: High-Dimensional Changepoint Estimation via Sparse Projection
High dimensional change point estimation via sparse projection
Some principles for surveillance adopted for multivariate processes with a common change point
The 1975 British-Provisional IRA truce in perspective

Acknowledgements
This paper is based on work completed while Tickle was part of the EPSRC funded STOR-i Centre for Doctoral Training (EP/L015692/1). Eckley and Fearnhead gratefully acknowledge the financial support of EPSRC grant EP/N031938/1. The authors also acknowledge British Telecommunications plc (BT) for financial support, and are grateful to Kjeld Jensen, Dave Yearling and Guillem Rigaill for helpful discussions. Finally, the authors would like to thank the editors and two anonymous reviewers for several helpful comments and feedback.