Ren, Benny; Barnett, Ian. Autoregressive Mixture Models for Serial Correlation Clustering of Time Series Data. 2020-06-30.

Clustering time series into similar groups can improve models by combining information across like time series. While there is a well-developed body of literature on clustering of time series, these approaches tend to generate clusters independently of model training, which can lead to poor model fit. We propose a novel distributed approach that simultaneously clusters and fits autoregression models for groups of similar individuals. We apply a Wishart mixture model so as to cluster individuals while modeling the corresponding autocovariance matrices at the same time. The fitted Wishart scale matrices map to cluster-level autoregressive coefficients through the Yule-Walker equations, fitting robust, parsimonious autoregressive mixture models. This approach is able to discern differences in the underlying autocorrelation variation of time series in settings with large heterogeneous datasets. We prove consistency of our cluster membership estimator and asymptotic distributions of coefficients, and compare our approach against competing methods through simulation as well as by fitting a COVID-19 forecast model.

Modern technologies have accelerated the pace at which data is collected, leading to new frontiers in quantitative research. For example, there is a proliferation of wearable devices and smartphones with sensors that continuously capture large multi-modal time series data at an individual level. Clustering, an important concept in data mining, often elucidates latent characteristics of the study population (Fokianos and Promponas, 2012).
However, these large and often sensitive datasets are well suited for distributed analysis with considerations for privacy (Allard et al., 2016; Hong et al., 2013; Yang et al., 2019; Jordan et al., 2018). This problem can be addressed by combining low-dimensional representations of similar individuals within a sample of heterogeneous time series data. Time series clustering is a well-studied topic; see Liao (2005) and Maharaj et al. (2019) for comprehensive reviews. Time series clustering can be divided into two classes: hard or crisp clustering, where individuals are assigned to a single group, and soft or fuzzy clustering, where individuals are assigned to multiple groups with membership weights. Hard clustering commonly follows the procedure of deriving a distance between each pair of time series and performing a hierarchical cluster analysis (Montero et al., 2014). Alternatives to hierarchical cluster analysis include community network detection and quasi U-statistics (Ferreira and Zhao, 2016; Valk and Pinheiro, 2012). Soft clustering includes centroid-based techniques and mixture models and can identify useful mixed memberships, which is often preferred over hard clustering in situations with heterogeneous data (D'Urso and Maharaj, 2009; Genolini et al., 2015; Wang et al., 2015). Time series clustering can concurrently be divided into three types: observation-, model-, and feature-based methods. Observation-based clustering uses comparisons between complete time series, such as the many variations of dynamic time warping (DTW), which may be difficult in a large-data setting (Berndt and Clifford, 1994; Paparrizos and Gravano, 2015; Cuturi and Blondel, 2017). Model-based clustering assumes underlying models to derive similarities between time series (Coke and Tsao, 2010; Piccolo, 1990; Wang and Tsay, 2019; Xiong and Yeung, 2002; Gao et al., 2020).
Model-based clustering is powerful but requires correct model specification and can be computationally expensive on large datasets. Feature-based clustering uses statistics representing each individual time series for clustering, such as the autocorrelation approaches of Galeano and Peña (2001) and D'Urso and Maharaj (2009). Another popular class of feature-based methods revolves around spectral densities (Euán et al., 2018b; Chen et al., 2020). Feature-based methods work well with large datasets by efficiently sharing relevant statistics across individuals, thereby bypassing computations that require all data points. However, an important discussion in time series clustering revolves around the role of heterogeneity or noise in the study population, often exhibited through different sample sizes and variances across individuals. Normalization and preprocessing of features are often used in clustering but are sensitive in sparse data settings, and heterogeneous characteristics should instead be incorporated into the clustering algorithm itself. Of note, existing methods perform model fitting independently from clustering, which is a lost opportunity and can lead to poorer model fit. We propose a new method that uses a Wishart mixture model (WMM) to address this problem and improve model fit by simultaneously modeling autocovariances along with clustering. The Wishart distribution has many applications in stochastic processes and is closely related to the Gaussian distribution (Gouriéroux et al., 2009; Wilson and Ghahramani, 2010). Under mild conditions, stationary time series also have arbitrarily close causal autoregressive (AR) approximations (see Corollary 4.4.2 of Brockwell and Davis (1991) for more details), lending themselves to techniques that are based in Gaussianity (Gupta et al., 2013; Broersen, 2000).
The Wishart distribution conveniently evaluates scatter matrices (an alternative form of the autocovariance matrix) by their proportionality, which alleviates the need to normalize data and incorporates sample size as the degrees of freedom parameter. Individuals in a population may exhibit similar autocovariances in their stationary distributions, or under weak (second-order) stationarity conditions, which can be exploited for clustering. The Wishart mixture model, defined in Hidot and Saint-Jean (2010), can be used to cluster individuals by their scatter matrices while simultaneously estimating group scale matrices, making WMM clustering applicable to a wide range of stationary time series. There is a natural connection between Wishart distributions and AR processes, which are well-behaved Gaussian processes under certain assumptions. Yule-Walker (YW) coefficients are also conveniently derived from the autocovariance matrix, or any proportional matrix such as the scatter matrix (Yule, 1927; Walker, 1931). YW estimation is also consistent for causal AR processes (Brockwell and Davis, 1991), and as a result the WMM works well with YW estimation because it clusters matrices by their proportionality while accounting for heterogeneity in the degrees of freedom and the variance of the innovations during clustering. We can then use the respective estimated group scale matrices to consistently estimate group-specific AR coefficients using the YW equations. This derivation of AR estimates has the advantage of using pooled information across individuals, resulting in a robust and computationally inexpensive estimating procedure, making our method well suited to large datasets. As method of moments estimators, when YW is combined with the mixture approach of the WMM we obtain a mixture of marginal models with familiar asymptotic distributions (Liang and Zeger, 1986; Hansen, 1982; Rosen et al., 2000).
At its core, our WMM algorithm is a feature-based method to cluster stationary time series by their second-order moments but, under the correct modeling assumption, has the additional benefit of estimating an autoregressive mixture model (ARMM). In Section 2, we derive the WMM under a variety of parametric assumptions and detail an EM algorithm to estimate model parameters in each case. Next, we estimate the ARMM using the results from the WMM and outline a model selection procedure as well as competing methods. In Section 3, we compare the WMM with competing approaches through simulation studies and an application to COVID-19 case rate data. Ultimately we find the WMM to be a powerful approach for clustering time series data.

Our data consist of $I$ independent individuals, each with a time series from a zero-mean causal AR process, with heterogeneous variances to be incorporated into clustering. We use the Wishart distribution to evaluate the scatter matrix $S_i = \sum_{t=1}^{n_i} y_i^{(t)} y_i^{(t)T}$, a square $K$-dimensional non-singular matrix. Replacing the elements of $S_i$ with their autocovariance MLEs results in a Toeplitz, bi-symmetric structure such that $[S_i]_{rc} = n_i \hat\gamma_i(|r-c|)$, where $r$ and $c$ index the rows and columns of $S_i$; we therefore only need to calculate $K$ autocovariance statistics for each individual. In general, our method relies only on second-order conditions, matching a time series with a matrix $\zeta$, making it applicable to a broad array of stationary processes. The corresponding group AR model our algorithm identifies is given by
$$y_i^{(t)} = \sum_{k=1}^{K-1} \phi_g^{(k)} y_i^{(t-k)} + \varepsilon_i^{(t)}, \qquad \varepsilon_i^{(t)} \sim N(0, \sigma_i^2), \qquad (1)$$
where $\phi_g^{(k)}$ and $\zeta_g$ are the parameters shared by the cluster/group, $g$ denotes the group index, and $k$ is the lag index. The coefficients $\Phi_g = (\phi_g^{(1)}, \ldots, \phi_g^{(K-1)})^T$ are mapped to the matrix $\zeta_g$, and $\sigma_i^2$ allows groups to be defined by combinations of different coefficients and innovation variances. Using the YW equations, coefficients can be estimated from any matrix proportional to the autocovariance matrix and, by extension, anything proportional to $\zeta_g$.
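To make the construction above concrete, here is a minimal plain-Python sketch (not the authors' implementation; function names are our own) of the $K$ autocovariance statistics and the Toeplitz scatter matrix $[S_i]_{rc} = n_i\hat\gamma_i(|r-c|)$:

```python
def autocovariances(y, K):
    """Sample autocovariances gamma_hat(0), ..., gamma_hat(K-1) of a series y."""
    n = len(y)
    ybar = sum(y) / n
    yc = [v - ybar for v in y]  # mean-center, as in the zero-mean assumption
    return [sum(yc[t] * yc[t + k] for t in range(n - k)) / n for k in range(K)]

def scatter_matrix(y, K):
    """Toeplitz, bi-symmetric scatter matrix with [S]_{rc} = n * gamma_hat(|r - c|)."""
    n = len(y)
    g = autocovariances(y, K)
    return [[n * g[abs(r - c)] for c in range(K)] for r in range(K)]
```

Only the $K$ values $\hat\gamma_i(0), \ldots, \hat\gamma_i(K-1)$ need to be shared per individual, which is what makes the method cheap to distribute.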
Because YW and AR models are well-studied topics, there are many tools at our disposal to complement our analysis. For example, assuming a causal and stationary AR process, Theorem 8.1.1 of Brockwell and Davis (1991) gives consistency and asymptotic normality of the YW estimates (restated as Theorem 3.1 below). In addition, Qiu et al. (2013) and Shao and Yang (2011) proposed detrending procedures which retain the asymptotic properties of YW; such a procedure can be applied to each individual time series before fitting our WMM.

First we define the Wishart mixture model for $G$ total groups or clusters, where $g \in \{1, 2, \ldots, G\}$: the missing group indicators $Z_{ig}$ follow a multinomial distribution with mixing probabilities $\pi_1, \pi_2, \ldots, \pi_G$, and $S_i$ follows a Wishart distribution with group scale matrix $\Sigma_g = \kappa_g \zeta_g$, where $\kappa_g$ is a positive scalar, and $n_i$ degrees of freedom. The Wishart density $h(S_i \mid \Sigma_g, n_i)$ is characterized by evaluating $\Sigma_g^{-1} S_i$, which relates matrices by their proportionality. For example, if $\frac{1}{n_i} S_i \xrightarrow{P} \sigma_i^2 \zeta_g$, then $\Sigma_g^{-1} S_i \approx n_i \kappa_g^{-1} \sigma_i^2 I$ and the Wishart density is asymptotically driven by $n_i$. The complete data likelihood for $\Theta = \{\pi_1, \ldots, \pi_G, \Sigma_1, \ldots, \Sigma_G\}$ is
$$L(\Theta) = \prod_{i=1}^{I} \prod_{g=1}^{G} \left[\pi_g \, h(S_i \mid \Sigma_g, n_i)\right]^{z_{ig}}.$$
Using the EM algorithm, we estimate $\pi_g$ and $\Sigma_g$, and impute the $z_{ig}$ (Dempster et al., 1977; Hidot and Saint-Jean, 2010). In the expectation step, with current estimate $\hat\Theta^{(t)}$ defining the function $Q(\Theta, \hat\Theta^{(t)})$, the membership probabilities are
$$\hat z_{ig} = \frac{\hat\pi_g \, h(S_i \mid \hat\Sigma_g, n_i)}{\sum_{j=1}^{G} \hat\pi_j \, h(S_i \mid \hat\Sigma_j, n_i)}, \qquad (3)$$
after conditioning and noting that the normalizing constant $c(S_i \mid n_i)$ is common to all densities and cancels. Here $n_i$ and $\sigma_i^2$ control the extent of influence an individual's noise has on clustering by assessing membership through $h(S_i \mid \Sigma_g, n_i)$ at different $g$'s. A small $n_i$, often associated with a noisy estimate of $S_i$, results in a flat density function, leading to mixed soft clustering assignments through equation (3).
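The E-step in (3) only needs the part of the Wishart log-density that depends on $\Sigma_g$, since $c(S_i \mid n_i)$ cancels. A minimal sketch for $K = 2$ (illustrative helper names, not the authors' code), computing responsibilities on the log scale for numerical stability:

```python
import math

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def inv2(M):
    d = det2(M)
    return [[M[1][1] / d, -M[0][1] / d], [-M[1][0] / d, M[0][0] / d]]

def trace_prod2(A, B):
    """tr(A @ B) for 2x2 matrices."""
    return sum(A[r][c] * B[c][r] for r in range(2) for c in range(2))

def log_wishart_kernel(S, Sigma, n):
    """Sigma-dependent part of log h(S | Sigma, n); c(S | n) cancels in z_hat."""
    return -0.5 * n * math.log(det2(Sigma)) - 0.5 * trace_prod2(inv2(Sigma), S)

def responsibilities(S, n, pis, Sigmas):
    """E-step: z_hat_g proportional to pi_g * h(S | Sigma_g, n)."""
    logs = [math.log(p) + log_wishart_kernel(S, Sig, n)
            for p, Sig in zip(pis, Sigmas)]
    m = max(logs)                       # log-sum-exp trick for stability
    w = [math.exp(v - m) for v in logs]
    tot = sum(w)
    return [x / tot for x in w]
```

A scatter matrix whose off-diagonal sign matches a group's scale matrix receives the larger responsibility, exactly the proportionality comparison described in the text.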
This allows the data from noisy individuals to be dispersed throughout each group's estimation rather than absorbed by any single group, contrary to hard clustering techniques. In the maximization step, maximizing $Q(\Theta, \hat\Theta^{(t)})$ yields the update for the mixing probability,
$$\hat\pi_g = \frac{1}{I} \sum_{i=1}^{I} \hat z_{ig}. \qquad (4)$$
Setting the score function for $\Sigma_g$ to zero yields the update
$$\hat\Sigma_g = \frac{\sum_{i=1}^{I} \hat z_{ig} S_i}{\sum_{i=1}^{I} \hat z_{ig} n_i}. \qquad (5)$$
The estimation of $\hat\Sigma_g$, which maps to the group AR coefficients, is done simultaneously with clustering, allowing all individuals to be leveraged for the estimation of each group model under the soft clustering assignments. The $\kappa_g$ are the group means of the variances: if variance is constant within the group, then $\sigma_i^2 = \kappa_g$ and $\Sigma_g^{-1} S_i \approx n_i I$. This is the ideal scenario for clustering, where the Wishart density is only a function of $n_i$. If variance is not constant, then $\Sigma_g^{-1} S_i \approx n_i \kappa_g^{-1} \sigma_i^2 I$, where the ratio of the individual and group mean variance adjusts the Wishart density. Together, proportionality, $\sigma_i^2$, and $n_i$ control the peakedness of the Wishart density and determine each individual's group membership. For example, a high value of $\sigma_i^2$ makes achieving proportionality difficult, requiring a large $n_i$. Repeating the EM updates (3), (4), and (5) until convergence yields the estimate of $\Theta$. The $\hat\Sigma_g$ are proportional to $\zeta_g$, making them valid statistics for YW estimation. The final group indicator is imputed as the index that maximizes $\hat z_{ig}$, the maximum a posteriori (MAP) rule: $z_{ig^*} = 1$ where $g^* = \arg\max_g \hat z_{ig}$.

The vectors $y_i^{(t)}$ entering the scatter matrix are serially correlated, whereas the Wishart distribution assumes independent observations; we can account for this correlation by using an effective degrees of freedom. Naturally, we may choose $n_i$ as the effective degrees of freedom as it aligns with moment matching (Pivaro et al., 2017). Alternatively, we propose to account for correlation within the scatter matrix by scaling the sample size by a positive factor, $\lambda n_i$ (Afyouni et al., 2019; Quenouille, 1947; Bartlett, 1946).
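The maximization-step updates (4) and (5) can be sketched as follows, assuming responsibilities from the E-step; all names here are illustrative:

```python
def m_step(z_hat, scatters, ns):
    """M-step updates (4) and (5): mixing probabilities and group scale matrices.

    z_hat[i][g] are responsibilities, scatters[i] is S_i (KxK nested lists),
    and ns[i] is the degrees of freedom n_i.
    """
    I = len(scatters)
    G = len(z_hat[0])
    K = len(scatters[0])
    # (4): pi_g is the average responsibility for group g
    pis = [sum(z_hat[i][g] for i in range(I)) / I for g in range(G)]
    # (5): Sigma_g is a responsibility- and n_i-weighted grand mean of scatters
    Sigmas = []
    for g in range(G):
        denom = sum(z_hat[i][g] * ns[i] for i in range(I))
        Sig = [[sum(z_hat[i][g] * scatters[i][r][c] for i in range(I)) / denom
                for c in range(K)] for r in range(K)]
        Sigmas.append(Sig)
    return pis, Sigmas
```

Note how (5) weights each individual by $\hat z_{ig} n_i$, which is exactly why long time series anchor the group scale matrices in the simulations of Section 3.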
In addition, computing Wishart densities can be numerically unstable when $n_i$ is large, as the distribution function becomes very peaked, making convergence highly sensitive to initial parameter values. We address numerical instability and correlation with a modified version of the above EM algorithm in which we estimate $\lambda$. We propose an extension of the EM algorithm that adjusts degrees of freedom at the cluster level by scaling $n_i$ with a group adjustment term $\lambda_g$, where $\Lambda = \{\lambda_1, \lambda_2, \ldots, \lambda_G\}$. The update of $\lambda_g$ is calculated by solving its score equation. Equation (3) is updated as
$$\hat z_{ig} = \frac{\hat\pi_g \, h(S_i \mid \hat\Sigma_g, \hat\lambda_g n_i)}{\sum_{j=1}^{G} \hat\pi_j \, h(S_i \mid \hat\Sigma_j, \hat\lambda_j n_i)}, \qquad (6)$$
and (5) becomes
$$\hat\Sigma_g = \frac{\sum_{i=1}^{I} \hat z_{ig} S_i}{\hat\lambda_g \sum_{i=1}^{I} \hat z_{ig} n_i}. \qquad (7)$$
We numerically update $\hat\lambda_g$ by solving its score equation (8) subject to $\lambda_g \in \left(\frac{K-1}{\min_i(n_i)}, U\right]$, where the upper bound $U$ prevents the numerical instability incurred by a large degrees of freedom. This constrained problem can be solved with an algorithm such as the bound-constrained Broyden-Fletcher-Goldfarb-Shanno algorithm (Byrd et al., 1995). For this additional algorithm, repeat, in order, (6), (4), (7), and (8) until convergence. These versions of the proposed EM algorithm make different assumptions about the Wishart distribution, namely in how the degrees of freedom parameter is handled, and as such we treat both approaches as competing methods.

The YW equations provide an alternative representation of the WMM as a mixture of marginal models (Rosen et al., 2000). The group estimate $\hat\Sigma_g \approx \kappa_g \zeta_g$ is a proportional estimator of the autocovariance matrix. We can use $\hat\Sigma_g$ in the YW estimator for AR coefficients, as that estimator only requires a matrix proportional to the autocovariance matrix. The matrix can be blocked as
$$\hat\Sigma_g = \begin{pmatrix} \hat q_g & \hat u_g^T \\ \hat u_g & \hat Q_g \end{pmatrix},$$
where $q_g$ is a scalar, $u_g$ is a $(K-1)$-vector, and $Q_g$ is a $(K-1) \times (K-1)$ matrix.
The YW system for estimating the AR coefficients is the method of moments estimator $\hat\Phi_g = \hat Q_g^{-1} \hat u_g$. The denominator of $\hat\Sigma_g$, $\sum_{i=1}^{I} n_i \hat z_{ig}$, cancels when calculating $\hat\Phi_g$, resulting in a weighted sum of moments from different individuals, akin to a weighted generalized estimating equation (GEE) or generalized method of moments approach (Liang and Zeger, 1986; Hansen, 1982). YW estimation yields the group AR models given in equation (1). The matrix $\hat\Sigma_g$ maps to the autoregressive parameters $\hat\Phi_g$ and is estimated using data across individuals, creating a robust group model. Incorporating individual $i$'s group indicators $z_{ig}$ from the WMM, we arrive at the final ARMM
$$y_i^{(t)} = \sum_{g=1}^{G} z_{ig} \sum_{k=1}^{K-1} \phi_g^{(k)} y_i^{(t-k)} + \varepsilon_i^{(t)},$$
where the $z_{ig}$ govern AR model membership for individual $i$. Our procedure thus has three phases: in the first phase, the time series are mapped to scatter matrices $S_i$ and processed by the WMM; in the second phase, $\hat\Sigma_g$ is used to estimate the AR coefficients $\hat\Phi_g$ via Yule-Walker estimation; in the third phase, the group indicators $z_{ig}$ are combined with the group AR models to create the ARMM.

Selecting the AR lag a priori is an open question; it can be based on domain knowledge or autocorrelation plots. After estimation, we can also use a penalization criterion such as AIC or BIC to select the lag (Akaike, 1974; Schwarz et al., 1978; Hurvich and Tsai, 1989). However, we recommend selecting a conservative number of lags, greater than the true value of $K$, because the additional coefficient estimates tend to be close to zero and may also be evaluated using their asymptotic distributions. Fitting the WMM improves with the number of lags $K$, as our algorithm only depends on the estimation of the stationary autocovariance matrix. First, we may select $G$ based on a selection criterion and then address the selection of $K$ for each group (see Chapter 6.5 of Madsen (2007)). The number of groups $G$ can be selected based on AIC or BIC.
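The blocked Yule-Walker map $\hat\Phi_g = \hat Q_g^{-1}\hat u_g$ can be illustrated for an AR(2) fit ($K = 3$), where the $2 \times 2$ system solves in closed form; this is a sketch under our own naming, and the scale invariance noted in the text (any matrix proportional to the autocovariance matrix gives the same coefficients) is easy to check:

```python
def yule_walker_from_scale(Sigma):
    """Blocked YW estimator Phi = Q^{-1} u for an AR(2) (K = 3) scale matrix.

    Works for any matrix proportional to the autocovariance matrix, since the
    proportionality constant cancels in Q^{-1} u.
    """
    u = [Sigma[1][0], Sigma[2][0]]                       # block u_g
    Q = [[Sigma[1][1], Sigma[1][2]],                     # block Q_g
         [Sigma[2][1], Sigma[2][2]]]
    d = Q[0][0] * Q[1][1] - Q[0][1] * Q[1][0]            # det(Q_g)
    phi1 = (Q[1][1] * u[0] - Q[0][1] * u[1]) / d
    phi2 = (-Q[1][0] * u[0] + Q[0][0] * u[1]) / d
    return [phi1, phi2]
```

For example, an AR(2) with $\phi^{(1)} = 0.5$, $\phi^{(2)} = 0.2$ has autocorrelations $\rho(1) = 0.625$ and $\rho(2) = 0.5125$ by the YW recursions, and the Toeplitz matrix built from $(1, 0.625, 0.5125)$ recovers exactly those coefficients at any scale.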
Using the MAP estimate of $z_{ig}$, the AIC and BIC contribution for an individual is $n_i \log \hat\sigma_{ig^*}^2$, where we use the YW estimate of the innovation variance evaluated at the individual's assigned group. This estimate has the benefit of reusing previously calculated statistics and does not require any new computations. The AIC and BIC for the number of groups are
$$\mathrm{AIC} = \sum_{i=1}^{I} n_i \log \hat\sigma_{ig^*}^2 + 2r, \qquad (9)$$
$$\mathrm{BIC} = \sum_{i=1}^{I} n_i \log \hat\sigma_{ig^*}^2 + r \log\Big(\sum_{i=1}^{I} n_i\Big), \qquad (10)$$
where $r = G \cdot K - 1$.

In our analyses, we consider alternative clustering methods suitable for a large-data setting. We explore cluster analysis methods based on retrieving individual-level statistics relevant to AR and stationary processes. We use cluster analyses that calculate pairwise distances between all individuals for hierarchical clustering, using several relevant distances. First, we consider the distance defined by Galeano and Peña (2001) between two time series derived using autocorrelations,
$$d_{ACF}(y_i, y_j) = \sqrt{\sum_{k=1}^{K-1} \left(\hat\rho_i(k) - \hat\rho_j(k)\right)^2}.$$
We also define $d_{PACF}(y_i, y_j)$ as $d_{ACF}(y_i, y_j)$ with the autocorrelations replaced by partial autocorrelations, as our second distance measure. Finally, for our third distance measure we use the Piccolo distance, defined analogously on the fitted AR coefficients (Piccolo, 1990). We also propose a soft clustering analog to the Piccolo distance where the AR coefficients are averaged across individuals to obtain the model average. We use a Gaussian mixture model (GMM) and treat individual $i$'s AR coefficient maximum likelihood estimates, $\hat\Phi_i = (\hat\phi_i^{(1)}, \ldots, \hat\phi_i^{(K-1)})^T$, as observations from a multivariate normal distribution. The GMM complete data likelihood is
$$L(\pi_1, \ldots, \pi_G, \Phi_1, \ldots, \Phi_G, \Omega_1, \ldots, \Omega_G) = \prod_{i=1}^{I} \prod_{g=1}^{G} \left[\pi_g \, N\!\left(\hat\Phi_i \mid \Phi_g, \Omega_g\right)\right]^{z_{ig}}.$$
We calculate $\hat\pi_g$, $\hat\Phi_g$, and $\hat z_{ig}$ using the GMM, rather than the WMM, via the R package mclust (Fraley et al., 2014). Our WMM method already serves as a soft clustering analog of the autocorrelation distance. Finally, we also evaluate the Hierarchical Spectral Merger (HSM) algorithm (R package HSMClust), another method for stationary time series (Euán et al., 2018a).
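The autocorrelation distance $d_{ACF}$ can be sketched in a few lines (illustrative names, not the TSclust implementation). Because autocorrelations are scale-invariant, rescaling a series leaves the distance unchanged:

```python
import math

def acf(y, K):
    """Sample autocorrelations rho_hat(1), ..., rho_hat(K-1)."""
    n = len(y)
    ybar = sum(y) / n
    yc = [v - ybar for v in y]
    g0 = sum(v * v for v in yc) / n                    # gamma_hat(0)
    return [sum(yc[t] * yc[t + k] for t in range(n - k)) / n / g0
            for k in range(1, K)]

def d_acf(yi, yj, K):
    """Euclidean distance between the two autocorrelation vectors."""
    ri, rj = acf(yi, K), acf(yj, K)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ri, rj)))
```

The pairwise matrix of these distances then feeds a standard hierarchical clustering routine, as described in the text.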
The HSM is based on the total variation distance between normalized spectral densities, where the normalization is $f(\omega) = \hat f(\omega) / \hat\gamma(0)$. Spectral densities are closely related to the causal form of ARMA processes, making the HSM and our WMM closely related in their mathematical reasoning.

The asymptotic behavior of YW estimation for an individual time series is well studied. Writing the time series in matrix form, $y_i = X_i \Phi_i + \varepsilon_i$, where $X_i$ is the proper arrangement of the lagged values of $y_i$, we have the following consistency results.

Theorem 3.1. (Theorem 8.1.1 from Brockwell and Davis (1991)) Assuming $y_i$ is a causal and stationary AR($K-1$) process, as $n_i \to \infty$,
$$\hat\sigma_i^2 \xrightarrow{P} \sigma_i^2 \quad \text{and} \quad \sqrt{n_i}\left(\hat\Phi_i - \Phi_i\right) \xrightarrow{d} N\!\left(0, \sigma_i^2 \Omega_i^{-1}\right),$$
where $\Omega_i$ is the true $(K-1)$-dimensional autocovariance matrix.

It is well known that EM converges to a local maximum; typically multiple runs from random initializations are executed to determine the MLE. We have found that, under certain conditions, the WMM is a consistent estimator as $n_i$ increases, i.e., as more observations per individual are obtained:

Theorem 3.2. Assume $y_i$ is a causal and stationary AR($K-1$) process and that the number of groups $G$ is correctly specified, such that $\sigma_i^2 = \kappa_g$ and $\frac{1}{n_i} S_i \xrightarrow{P} \kappa_g \zeta_g$. As $n_i \to \infty$ for all $i$, then $\hat z_{ig} \xrightarrow{P} z_{ig}$.

The $\hat z_{ig}$ estimates consist of sums of ratios of two Wishart kernels, where group memberships are driven by the separation of $\zeta_j$ and $\zeta_g$. However, when we allow for a large number of groups $G$, individuals belonging to the same AR model will also be clustered by their innovation variance such that $\sigma_i^2 = \kappa_g$. In practice, we use a penalization criterion such as BIC to limit the number of groups $G$. As a result, WMM clustering is primarily driven by differences between AR coefficients, but can also take large differences in innovation variances into consideration. In certain situations, such as forecasting, it is advantageous to group individuals by both AR coefficients and uncertainty, in order to avoid mixing large and small variances into a group-level forecast model.
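The total variation distance between normalized spectral estimates that drives HSM can be sketched with a direct-DFT periodogram; this is a toy stand-in for HSMClust's spectral estimator, with our own function names:

```python
import cmath

def periodogram(y):
    """Raw periodogram at Fourier frequencies via a direct DFT (short series)."""
    n = len(y)
    ybar = sum(y) / n
    yc = [v - ybar for v in y]
    spec = []
    for k in range(1, n // 2 + 1):
        d = sum(yc[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        spec.append(abs(d) ** 2 / n)
    return spec

def tv_distance(yi, yj):
    """Total variation distance between normalized spectral estimates.

    Requires equal-length series, matching HSMClust's equal-length restriction.
    """
    fi, fj = periodogram(yi), periodogram(yj)
    si, sj = sum(fi), sum(fj)
    return 0.5 * sum(abs(a / si - b / sj) for a, b in zip(fi, fj))
```

Series whose power concentrates at different frequencies (e.g., a fast oscillation versus a slow one) are pushed toward the maximal distance of 1, while a series is at distance 0 from itself.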
One can also disregard the innovation variances during clustering by using a normalized scatter matrix $[S_i]_{rc} = n_i \hat\gamma_i(|r-c|)/\hat\gamma_i(0) = n_i \hat\rho_i(|r-c|)$ based on autocorrelations. The autocorrelations $\hat\rho_i(|r-c|)$ are also consistently estimated. As a result, we consistently estimate groups with the same AR coefficients but do not take heterogeneous innovation variances into account in the clustering algorithm.

The WMM group matrices are a means of summing weighted moments from different time series, akin to generalized estimating equations and mixtures of marginal models, leading to a straightforward asymptotic distribution involving a sandwich estimator (Rosen et al., 2000):

Theorem 3.3. The Yule-Walker estimates $\hat\Phi_g$ derived from the Wishart mixture model are asymptotically normal, with a sandwich covariance built from the $\hat z_{ig}$-weighted individual moment contributions. The design matrices are set to $[X_i^T X_i]_{rc} = n_i \hat\gamma_i(|r-c|)$, replacing each element with its MLE, and we use the YW estimate for the variance evaluated at group $g$, $\hat\sigma_{ig}^2 = \hat\gamma_i(0)\left(1 - \hat u_g^T \hat Q_g^{-1} \hat u_g / \hat q_g\right)$. In addition, the $\hat z_{ig}$, the continuous probability values defined in (3), are substituted into the asymptotic distribution. The proofs are left to Appendices 5 and 6.

To evaluate the performance of competing methods we conducted two simulation studies. We denote the algorithm based on equations (3)-(5) as EM1 and also evaluate its degrees-of-freedom-adjusted extension, comparing them with the autocorrelation (ACF), partial autocorrelation (PACF), Piccolo, GMM, and Hierarchical Spectral Merger (HSM) approaches. All autocorrelation- and AR-coefficient-based methods were performed using two lags, and all competing methods were based on two pre-specified groups. We simulated $I = 200$ individuals with $G = 2$ groups and 100 individuals in each group. We simulated a number of cases under different ARMA parameterizations and different values of $n_i$ and $\sigma_i^2$, with the ARMA($p$, $q$) innovations given as $\varepsilon_i^{(t)} \sim N(0, \sigma_i^2)$. In our simulations, after we obtain the $z_{ig}$ MAP estimates we compare them to the truth and calculate the accuracy of recovering the true group memberships as the number of correctly identified individuals divided by the total number of individuals.
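When scoring recovered memberships against the truth, cluster labels are arbitrary up to permutation, so the accuracy just described is computed after the best relabeling of the estimated clusters. A small sketch (our own helper, feasible for small $G$):

```python
from itertools import permutations

def cluster_accuracy(true_labels, est_labels, G):
    """Best agreement over all relabelings of the estimated cluster indices."""
    n = len(true_labels)
    best = 0
    for perm in permutations(range(G)):
        hits = sum(1 for t, e in zip(true_labels, est_labels) if t == perm[e])
        best = max(best, hits)
    return best / n
```

For example, an estimate that swaps the two group labels but partitions the individuals perfectly still scores an accuracy of 1.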
We repeated the simulation 1000 times for each case in order to calculate mean accuracy and its standard error (SE). Note that for HSM, the HSMClust R package requires equal-length time series and so could not be applied to Cases 3 and 6. From our simulations, we found that the WMM method generally outperforms or is comparable to other methods while having lower standard errors. This is partly a result of being able to use cluster information during model fitting, due to their simultaneous fit, as opposed to clustering sequentially after model fitting. HSM clustering is sensitive given small sample sizes but is accurate as $n_i$ increases. ACF is comparable to WMM under the AR parameterizations (Cases 1-4) but inferior to WMM under the moving average (MA) parameterizations (Cases 5-6). WMM is also a leading method when there are heterogeneous innovation variances (Case 4), indicating that AR coefficients drive clustering and that a very large difference in innovation variance must be present in order to impact clustering. Under the MA parameterization, the WMM outperforms all competing methods, while AR-coefficient-based methods suffer from model misspecification; WMM clustering does not require any ARMA modeling assumptions and works well for different types of stationary time series. Finally, the WMM works well under imbalanced sample sizes (Cases 3 and 6). Time series with large $n_i$ are better able to capture the underlying process and anchor the estimation of $\Sigma_g$ in the EM algorithm: equation (5) estimates $\Sigma_g$ as a grand mean of all scatter matrices, giving more weight to time series with large $n_i$.

Numerous studies have proposed forecasting the spread of COVID-19 using the autoregressive integrated moving average (ARIMA) model (Benvenuto et al., 2020; Ceylan, 2020; Alzahrani et al., 2020).
In order to examine a stationary segment of the data, we study case counts from the second winter (October 1, 2020-February 28, 2021) of the COVID-19 pandemic. Daily new COVID-19 cases for 67 Pennsylvania (PA) counties were obtained from The New York Times GitHub repository: https://github.com/nytimes/covid-19-data (The New York Times, 2021). Case rates were calculated by dividing daily new cases by county population and were mean centered to create $I = 67$ time series with $n_i = 151$. Counties of PA differ greatly by their population offset, i.e., innovation variance, making the WMM well suited to this task. We evaluate $G = 1, \ldots, 10$ clusters using EM1 and the BIC defined in (10) for selecting $G$. COVID-19 reporting is known to be influenced by day of the week, so we elect to use $K - 1 = 7$ coefficients. We initialize our WMM using the HSM clustering results, and BIC selected 5 groups. In addition, we compare the clustering of the WMM and HSM algorithms, as both are designed for stationary time series. Distance-based methods tend to identify outliers first, such as sparsely or densely populated counties, e.g., Philadelphia. When using HSM, Philadelphia was first separated into its own group, followed by the rural counties. Under HSM, the majority of PA counties belong to the same group and the results are not meaningful, with many clusters occupied by singletons. This is because distance-based algorithms often separate outliers at the start, while mixture-model-based methods are more robust because they seek to estimate group parameters from the outset, which leads to more balanced groups. Under the WMM, Group 1 ($g = 1$) comprises suburbs and populous counties, with the exception of Philadelphia. The reason is that Philadelphia uniquely does not report cases on weekends, leading to weaker apparent autocorrelations and clustering Philadelphia instead with the rural counties ($g = 2$).
Sparsely populated counties tend to have low autocorrelations, while autocorrelation tends to be positively correlated with county population; populous counties have regimented testing protocols leading to higher autocorrelation. Our method simultaneously clustered and estimated AR models for each group. Combined with the asymptotic distribution for evaluating our coefficients, our simplified procedure is a fast and intuitive method for studying heterogeneous time series data. Our AR models serve as parsimonious forecast models that borrow information across different counties separated into meaningful clusters.

We proposed a computationally efficient method to cluster stationary time series and estimate their group AR models. Our method incorporates different innovation variances and sample sizes in the estimation, making it suitable for heterogeneous datasets. Under mild conditions our AR model and group label estimates are consistent and have asymptotic distributions that account for heterogeneous variances. From simulations, we found that our WMM approach outperforms most competing methods. Furthermore, our WMM approach improves as sample size increases, even in datasets with imbalanced time series lengths. Our WMM and ARMM show promise as clustering and forecasting models for COVID-19 cases in PA counties. We found that group assignments mostly align with county population, and that the PA COVID-19 time series primarily consist of three main groups with different levels of autocorrelation and distinct AR models. As a future analysis, we may incorporate the detrending procedure of Qiu et al. (2013) in order to study non-stationary time series. In conclusion, our WMM method is well equipped to handle noisy and large datasets by efficiently combining clustering with model fitting in a mixture model framework.

APPENDIX 5: Proof of Theorem 3.2

Since we evaluate the estimator as all $n_i \to \infty$, for ease of notation assume $n_i = n$ for all $i$ and let $n \to \infty$.
In addition, assume that $y_i$ has mean zero. By the asymptotic behavior of the Yule-Walker estimates for causal AR processes (Theorem 3.1), the scatter matrices, as a sequence in $n$, converge in probability to their correct group autocovariance matrices. Supposing the labels $z_{ig}$ are correct for group $g$, then
$$\Sigma_g = \kappa_g \zeta_g = \frac{\sum_{i=1}^{I} z_{ig} \sigma_i^2}{\sum_{i=1}^{I} z_{ig}} \, \zeta_g.$$
The updates for the group indicators, $\hat z_{ig} = \Pr(Z_{ig} = 1 \mid S_i, \hat\Theta)$, are given as
$$\hat z_{ig} = \frac{\hat\pi_g \, h(S_i \mid \hat\Sigma_g, n)}{\sum_{j=1}^{G} \hat\pi_j \, h(S_i \mid \hat\Sigma_j, n)}. \qquad (11)$$
The pdf of the Wishart distribution can be written as
$$h(S_i \mid \Sigma_g, n) = M_1 \, \frac{|S_i|^{(n-K-1)/2} \exp\!\left(-\tfrac{1}{2}\mathrm{tr}(\Sigma_g^{-1} S_i)\right)}{|\Sigma_g|^{n/2}},$$
where $M_1 = \left(2^{nK/2}\,\Gamma_K(n/2)\right)^{-1}$ is constant with respect to $\Sigma_g$ but tends to 0 as $n$ gets large. However, $M_1$ is factored out and cancels in the numerator and denominator when evaluating $\hat z_{ig}$ in (11). When the individual is in the correct group, $\Sigma_g^{-1} S_i = nI$, and
$$A = \left(\frac{n}{2}\right)^{Kn/2}, \qquad B = \exp\left(-\frac{nK}{2}\right), \qquad AB = \left(\frac{n}{2e}\right)^{nK/2}.$$
Evaluating the lower bound for $C$, the remaining gamma-function term, at $k = 1$ gives a lower bound for $ABC$. Using the Laplace method or Stirling's formula for the gamma function, when $S_i/n \xrightarrow{P} \kappa_g \zeta_g$ we have $ABC$ of polynomial order $O(n^{K/2})$.

When $S_i/n \xrightarrow{P} \sigma_i^2 \zeta_g$ but the individual is evaluated at the incorrect group $j$, then $\Sigma_j^{-1} S_i = n \kappa_j^{-1} \sigma_i^2 \zeta_j^{-1} \zeta_g = n \kappa_j^{-1} \sigma_i^2 W_{jg}$, where the $m_k(W_{jg})$ are the eigenvalues of $W_{jg} = \zeta_j^{-1} \zeta_g$. The upper bound of $ABC$ is now of order
$$O\!\left(n^{M_2} \left(\prod_{k=1}^{K} \frac{\kappa_j^{-1}\sigma_i^2 \, m_k(W_{jg})}{\exp\!\left(\kappa_j^{-1}\sigma_i^2 \, m_k(W_{jg}) - 1\right)}\right)^{n/2}\right),$$
where $M_2$ is a constant with respect to $n$. Finally, $\frac{x}{\exp(x-1)} \le 1$, with equality only when $x = 1$; in our case $\kappa_j^{-1}\sigma_i^2 m_k(W_{jg}) \ne 1$ for every $k$ because $\kappa_j^{-1}\sigma_i^2 W_{jg} \ne I$. Therefore
$$0 < r_{jg} = \prod_{k=1}^{K} \frac{\kappa_j^{-1}\sigma_i^2 \, m_k(W_{jg})}{\exp\!\left(\kappa_j^{-1}\sigma_i^2 \, m_k(W_{jg}) - 1\right)} < 1.$$
If instead $S_i/n \xrightarrow{P} \sigma_i^2 \zeta_j$, so that the two groups share the same $\zeta$, we still have $0 < \prod_{k=1}^{K} \frac{\kappa_j^{-1}\sigma_i^2}{\exp(\kappa_j^{-1}\sigma_i^2 - 1)} < 1$ whenever $\sigma_i^2 \ne \kappa_j$; in other words, an individual is asymptotically assigned to a cluster only when it matches both the AR coefficients and the innovation variance.
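For reference, the Stirling step invoked above uses the standard asymptotic expansion of the gamma function:

```latex
\ln \Gamma(z) \;=\; \left(z - \tfrac{1}{2}\right)\ln z \;-\; z \;+\; \tfrac{1}{2}\ln(2\pi) \;+\; O\!\left(z^{-1}\right), \qquad z \to \infty,
```

so that $\Gamma(n/2)$ grows like $(n/2)^{(n-1)/2} e^{-n/2}$ up to a constant, which is the rate against which the $(n/2e)^{nK/2}$ factor in $AB$ is balanced.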
When the individual is in the correct group, $S_i/n \xrightarrow{P} \kappa_g \zeta_g$ and $\sigma_i^2 = \kappa_g$, so $\hat z_{ig} = \Pr(Z_{ig} = 1 \mid S_i, \hat\Theta)$ satisfies
$$\hat z_{ig} \ge \frac{\pi_g \, O(n^{K/2})}{\pi_g \, O(n^{K/2}) + \sum_{j \ne g} \pi_j \, O\!\left(n^{M_2} r_{jg}^{n/2}\right)},$$
where $0 < r_{jg} < 1$. Consistency of $\hat z_{ig}$ then follows: $\lim_{n\to\infty} \Pr(|\hat z_{ig} - z_{ig}| > \varepsilon) = 0$. Under strict concavity, the iterative convergence of the EM algorithm reaches the maximum, and our labels given in (11) converge in probability to the indicator of the correct group. We now have $\hat z_{ig} \xrightarrow{P} z_{ig}$ and $S_i/n \xrightarrow{P} \kappa_g \zeta_g$, so $\hat\Sigma_g$ yields a consistent AR coefficient estimator through the Yule-Walker equations, because the estimator relies only on proportionality to $\zeta_g$. We have $\hat\Phi_g = \hat Q_g^{-1} \hat u_g$, and $\hat\Phi_g \xrightarrow{P} \Phi_g$.

REFERENCES

Effective degrees of freedom of the Pearson's correlation coefficient under autocorrelation
A new look at the statistical model identification
A new privacy-preserving solution for clustering massively distributed personal times-series
Forecasting the spread of the COVID-19 pandemic in Saudi Arabia using ARIMA prediction model under current public health interventions
On the theoretical specification and sampling properties of autocorrelated time-series
Application of the ARIMA model on the COVID-2019 epidemic dataset
Using dynamic time warping to find patterns in time series
Properties of the singular, inverse and generalized inverse partitioned Wishart distributions
Time series: theory and methods
Facts and fiction in spectral analysis
A limited memory algorithm for bound constrained optimization
Estimation of COVID-19 prevalence in Italy
Collective spectral density estimation and clustering for spatially-correlated data
Random effects mixture models for clustering electrical load series
Soft-DTW: a differentiable loss function for time-series
Maximum likelihood from incomplete data via the EM algorithm
Autocorrelation-based fuzzy clustering of time series
The hierarchical spectral merger algorithm: a new time series clustering procedure
Spectral synchronicity in brain signals
Time series clustering via community detection in networks
Biological applications of time series frequency domain clustering
mclust: Normal mixture modeling for model-based clustering, classification, and density estimation
Multivariate analysis in vector time series
Regularized matrix data clustering and its application to image analysis
kml and kml3d: R packages to cluster longitudinal data
The Wishart autoregressive process of multivariate stochastic volatility
On the convergence of the spectrum of finite order approximations of stationary time series
Large sample properties of generalized method of moments estimators
An Expectation-Maximization algorithm for the Wishart mixture model: application to movement clustering
A survey on privacy preserving time series data mining
Regression and time series model selection in small samples
Communication-efficient distributed statistical inference
Longitudinal data analysis using generalized linear models
Clustering of time series data: a survey
Time series analysis
Time series clustering and classification
TSclust: an R package for time series clustering
k-shape: efficient and accurate clustering of time series
A distance measure for classifying ARIMA models
On the exact and approximate eigenvalue distribution for sum of Wishart matrices
Efficient inference for autoregressive coefficients in the presence of trends
A large-sample test for the goodness of fit of autoregressive schemes
Mixtures of marginal models
Estimating the dimension of a model
Autoregressive coefficient estimation in nonparametric analysis
Coronavirus (Covid-19) Data in the United States
Time-series clustering via quasi U-statistics
On periodicity in series of related terms
Large-scale time series clustering based on fuzzy granulation and collaboration
Clustering multiple time series with structural breaks
Generalised Wishart processes
Mixtures of ARMA models for model-based time series clustering
Federated machine learning: concept and applications
On a method of investigating periodicities in disturbed series, with special reference to Wolfer's sunspot numbers