Combining Multiple Testing with Multivariate Singular Spectrum Analysis

Maryam Movahedifar and Thorsten Dickhaus

2022-03-11

Appropriate preprocessing is a fundamental prerequisite for analyzing a noisy dataset. The purpose of this paper is to apply a nonparametric preprocessing method, called Singular Spectrum Analysis (SSA), to a variety of datasets which are subsequently analyzed by means of multiple statistical hypothesis tests. SSA is a nonparametric method which has recently been utilized in the context of many life science problems. In the present work, SSA is compared with three other state-of-the-art preprocessing methods, both parametric and nonparametric, in terms of goodness of denoising and in terms of the statistical power of the subsequent multiple test. Our findings demonstrate that (multivariate) SSA is a promising method to reduce noise, to extract the main signal from noisy data, and to detect statistically significant signal components.

Most real-life datasets are contaminated by considerable noise. Thus, noise reduction and signal extraction have always been considered important steps of data analysis in all fields of study; see, among many others, [1-3]. In this paper, different types of preprocessing methods, both parametric and nonparametric, are evaluated. In particular, we consider the Autoregressive Fractionally Integrated Moving Average (ARFIMA) model (see [4]) as well as Exponential Smoothing (ETS, see [5]) as parametric methods, and Singular Spectrum Analysis (SSA, cf. Section 2 for details) and feed-forward Neural Networks (NN, see [5, 6]) as nonparametric methods. The usage of nonparametric models is often considered when parametric model assumptions are prone to be violated (cf., for instance, [7, 8]). One example is constituted by economic time series: Economic conditions (in most cases, especially after recessions) can lead to a non-constant mean and variance, and thus to a violation of the parametric assumption of stationarity. The first purpose of the present paper is to compare the aforementioned signal preprocessing techniques in terms of their accuracy in signal extraction by means of simulated (time series) data, where ground truth is known. The second purpose is to provide a comparison on real data. In the latter case, ground truth is not known, and we compare the methods in terms of a penalized (empirical) signal-to-noise ratio. Furthermore, we compare them in terms of the number of rejected null hypotheses of a variety of multiple tests (see [9, 10] for comprehensive introductions to multiple testing) when the aforementioned preprocessing methods are applied to real data prior to carrying out the multiple test. In all our comparisons, (multivariate) SSA turns out to be a promising approach.

The rest of this paper is organized as follows. Section 2 provides a short theoretical background of basic SSA. Section 3 describes the other three considered preprocessing methods. In Section 4, we report results from a simulation study comparing the signal extraction accuracies of the considered preprocessing methods. Section 5 is devoted to a brief introduction to multiple testing. In Section 6, several real-life datasets are analyzed. Finally, Section 7 contains a discussion.
The origins of SSA can be traced back to the work of Broomhead and King (1986); see [11]. SSA, as a branch of time series analysis, is a nonparametric approach that does not require statistical assumptions such as stationarity or linearity of the time series, which are unlikely to hold in the real world. SSA is also known for its ability to deal with short time series, for which classical methods fail due to an insufficient number of observations. Furthermore, SSA is capable of both filtering and forecasting time series, and it can be carried out as a univariate or as a multivariate method. In short, the SSA technique first filters the time series to decrease the noise, and then reconstructs the less noisy series in order to forecast the time series. The major goal of SSA is to decompose the initial ordered series (e.g., a time series) into a sum of different components, each of which can be interpreted as a trend, a periodic, a quasi-periodic (perhaps amplitude-modulated), or a noise component. We refer to [12-17] for more details. As mentioned before, SSA (henceforth used to refer to the univariate version) is currently widely used, with a considerable number of applications. Compared to SSA, multivariate SSA (MSSA) is still in its initial stage, and only a handful of applications of MSSA have been investigated so far; see, for example, [18-21]. In the following subsections, concise descriptions of SSA and MSSA are presented. For more information, see [22-24].

Let $Y_N = (y_1, \ldots, y_N)$ denote a time series of length $N$. Fix an integer $L$, called the window length, where $2 \le L \le N/2$. Basic SSA then consists of the following four steps:

1) Embedding: This step transfers the one-dimensional time series $Y_N = (y_1, \ldots, y_N)$ of length $N$ into the multi-dimensional series $X^{(1)}, \ldots, X^{(K)}$, where $X^{(i)} = (y_i, \ldots, y_{i+L-1})^T \in \mathbb{R}^L$ and $K = N - L + 1$. The vectors $X^{(i)}$ are called $L$-lagged vectors (or, simply, lagged vectors). The embedding step involves only one parameter, namely the window length $L$. The purpose of this step is to form the trajectory matrix $\mathbf{X} = [X^{(1)}, \ldots, X^{(K)}] \in \mathbb{R}^{L \times K}$.

2) Singular Value Decomposition (SVD): In this step, the trajectory matrix $\mathbf{X}$ is decomposed into a sum of rank-one elementary matrices. Let $\lambda_1, \ldots, \lambda_L$ denote the eigenvalues of $\mathbf{X}\mathbf{X}^T$, ordered in decreasing magnitude such that $\lambda_1 \ge \cdots \ge \lambda_L \ge 0$, and let $U_1, \ldots, U_L$ denote the eigenvectors of the matrix $\mathbf{X}\mathbf{X}^T$ corresponding to these eigenvalues. Letting $d = \max\{i : \lambda_i > 0\} = \operatorname{rank}(\mathbf{X})$, the SVD of the trajectory matrix can be written as $\mathbf{X} = \mathbf{X}_1 + \cdots + \mathbf{X}_d$, where $\mathbf{X}_i = \sqrt{\lambda_i}\, U_i V_i^T$ and $V_i = \mathbf{X}^T U_i / \sqrt{\lambda_i}$ for $i = 1, \ldots, d$.

3) Grouping: In this step, the set of indices $\{1, \ldots, d\}$ is partitioned into $\gamma$ disjoint subsets $I_1, \ldots, I_\gamma$. Let $I = \{i_1, \ldots, i_p\}$. Then, the matrix $\mathbf{X}_I$ corresponding to the group $I$ is given by $\mathbf{X}_I = \mathbf{X}_{i_1} + \cdots + \mathbf{X}_{i_p}$. For example, if $I = \{2, 3, 5\}$, then $\mathbf{X}_I = \mathbf{X}_2 + \mathbf{X}_3 + \mathbf{X}_5$. Considering the SVD of $\mathbf{X}$, the split of the set of indices $\{1, \ldots, d\}$ into the disjoint subsets $I_1, \ldots, I_\gamma$ can be represented as $\mathbf{X} = \mathbf{X}_{I_1} + \cdots + \mathbf{X}_{I_\gamma}$. The choice of the groups is explained, for instance, in Section 2.1.2.3 of [23].

4) Diagonal Averaging: The aim of this step is to transform every matrix $\mathbf{X}_{I_j}$ obtained in the grouping step into a Hankel matrix, so that these can subsequently be converted back into (reconstructed) time series. In basic SSA, Hankelization is achieved via diagonal averaging of the matrix elements over the anti-diagonals; see [22].
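To make the four steps concrete, the following minimal R sketch runs basic SSA with the Rssa package on a simulated "signal plus noise" series. The window length L = 50 and the grouping of the three leading components are illustrative assumptions for this toy example, not recommendations from the paper; in practice, the grouping is chosen by inspecting the eigenvalue spectrum and the w-correlations (cf. Section 6.2).

library(Rssa)

set.seed(1)
# Toy series: sine plus exponential trend plus Gaussian white noise
y <- sin(2 * pi * (1:100) / 12) + exp(0.01 * (1:100)) + rnorm(100)

s <- ssa(y, L = 50)                              # steps 1 and 2: embedding and SVD
plot(s)                                          # eigenvalue spectrum, guiding the grouping step
r <- reconstruct(s, groups = list(signal = 1:3)) # steps 3 and 4: grouping and diagonal averaging
f.hat   <- r$signal                              # reconstructed (denoised) series
eps.hat <- y - f.hat                             # estimated noise component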
In the case that multiple time series are observed, the joint structure or the dependency among the series is also of interest, in addition to the internal structure of each individual series. This is incorporated in MSSA. The main idea of MSSA is similar to that of basic SSA, but the embedding step for constructing the trajectory matrix is different. The purpose of MSSA is to take the combined structure of a multivariate series into consideration, in order to exploit information about the joint distribution of these series. Here we consider the algorithm of MSSA for analyzing multivariate time series which is explained in Section 4.2 of [24]. A brief description of the MSSA method is as follows: Suppose a multivariate time series is given as a collection $\{Y^{(p)} = (y_1^{(p)}, \ldots, y_{N_p}^{(p)}) : p = 1, \ldots, s\}$ of $s$ univariate time series, where $Y^{(p)}$ has length $N_p$ for each $p \in \{1, \ldots, s\}$. The generic scheme of the algorithm described in Section 2.1 is also used in MSSA; it only remains to describe the embedding step for creating the trajectory matrix $\mathbf{X}_{MSSA}$. To this end, let the integer $L$ again denote the window length, where $2 \le L < \min(N_p : p = 1, \ldots, s)$. For each time series $Y^{(p)}$, we form the $L$-lagged vectors $X_j^{(p)} = (y_j^{(p)}, \ldots, y_{j+L-1}^{(p)})^T$ for $j = 1, \ldots, K_p$, where $K_p = N_p - L + 1$. The trajectory matrix of the collection is then a matrix of size $L \times K$ with $K = \sum_{p=1}^{s} K_p$, and it has the form $\mathbf{X}_{MSSA} = [X_1^{(1)}, \ldots, X_{K_1}^{(1)}, \ldots, X_1^{(s)}, \ldots, X_{K_s}^{(s)}]$. For more details on MSSA and its properties we refer to Chapter 4 in [24].

In this section, the three other considered (parametric and nonparametric) signal extraction methods are briefly explained.

The autoregressive fractionally integrated moving average (ARFIMA) model of order (p, d, q), denoted by ARFIMA(p, d, q), is useful for describing long-memory processes. The ARFIMA model is a generalization of the autoregressive integrated moving average (ARIMA) model, which has integer degrees of integration, and it is appropriate for stationary series that display long-run dependence. In the ARIMA framework, a nonstationary time series is differenced $d \in \mathbb{N}$ times, with the goal of achieving stationarity of the differenced time series. Such time series are said to be integrated of order $d$, denoted $I(d)$. Stationary series correspond to no differencing; hence, they are $I(0)$. Some time series exhibit too much autodependence to be $I(0)$, but are not $I(1)$, either. The ARFIMA model allows for a continuum of fractional differences $d \in (-0.5, 0.5)$. The generalization to fractional differences allows the ARFIMA model to handle processes that are neither $I(0)$ nor $I(1)$, to test for overdifferencing, and to model long-run effects that die out only at long time horizons; see [25]. Formally, the ARFIMA model with mean $\mu$ has the specification
$$\Phi(B)\,(1 - B)^d\,(y_t - \mu) = \varphi(B)\,\varepsilon_t,$$
where $B$ denotes the backshift operator, $\Phi(B) = 1 - \Phi_1 B - \cdots - \Phi_p B^p$, $\varphi(B) = 1 + \varphi_1 B + \cdots + \varphi_q B^q$, and the $\varepsilon_t$'s are stochastically independent and identically distributed (i.i.d.) with zero mean and variance $\sigma^2$. The real numbers $\Phi_1, \ldots, \Phi_p$ and $\varphi_1, \ldots, \varphi_q$ are model parameters. The ARFIMA model can be fitted by minimizing the Akaike Information Criterion (AIC). For more details, see [26].

The Exponential Smoothing (ETS) method builds upon a state space model for the time series under consideration. For example, in Section 3 of [27] the following system of equations is considered:
$$y_t = h(x_{t-1}) + k(x_{t-1})\,\varepsilon_t, \qquad x_t = f(x_{t-1}) + g(x_{t-1})\,\varepsilon_t.$$
Different choices of the state vectors $(x_t)_t$ and the functions $f$, $g$, $h$, and $k$ lead to different types of (parametric) models; see Table 1 in [27]. These vectors and functions are capable of expressing different time series components such as, e.g., trend and season.
The term ETS itself refers to a weighting scheme used in forecasting future time series values, where the weights decrease exponentially with growing lag. It is beyond the scope of the present paper to describe the ETS method comprehensively; we refer to [28] and Chapter 8 of [5] instead. For our applications, we have used the ETS model from the R package forecast; see https://cran.r-project.org/web/packages/forecast/forecast.pdf. In this implementation, model selection is performed by minimizing the AIC, and model parameters are estimated by the maximum likelihood approach.

The Neural Network (NN) model that we have considered takes the form
$$y_t = \beta_0 + \sum_{j=1}^{k} \beta_j\, \psi(\gamma_j^T x_t) + \varepsilon_t, \qquad (1)$$
where $x_t \in \mathbb{R}^p$ consists of $p$ lagged values of $y_t$. The functional form of $\psi$ is typically sigmoidal. For instance, the logistic form of $\psi$ is given by
$$\psi(u) = \frac{1}{1 + \exp(-u)}. \qquad (2)$$
Equations (1) and (2) define a feed-forward neural network with a single hidden layer and lagged time series inputs. The number $k$ of logistic functions used in (1) is equal to the number of hidden nodes. Neural networks with a data-dependent choice of $k$ are occasionally referred to as nonparametric neural networks in the literature; see, e.g., [29]. We adopt this terminology here. In our applications, we have used the nonparametric NN implementation nnetar from the R package forecast; see also Section 12.4 in [5] for details.

The purpose of this section is a comparative evaluation of the preprocessing methods described in Sections 2 and 3 regarding their capability of denoising a dataset. For this purpose, we have generated time series $(y_t : t = 1, \ldots, N = 100)$ according to a "signal plus noise" model of the form $y_t = f_t + \varepsilon_t$, $t = 1, \ldots, N$. For the signal part $(f_t)_{t=1,\ldots,N}$, the following four choices have been considered.
1. Sine+Exponential: $f_t = \sin(2\pi t/12) + \exp(0.01 t)$
2. Cosine+Linear: $f_t = 0.8 \cos(\pi t/3) + 0.6 t$
3. Sine×Exponential: $f_t = \sin(2\pi t/12) \times \exp(0.01 t)$
4. Sine+Linear+Exponential: $f_t = \sin(3\pi t/12) + 0.5 t + \exp(0.03 t)$
In all four cases, the (noise) time series $(\varepsilon_t : t = 1, \ldots, N = 100)$ consisted of centered Gaussian white noise with unit error variance $\sigma^2 = 1$. Each of the above series has been simulated 1000 times. The denoising performance of SSA, as compared with each of the methods described in Section 3, has been assessed by calculating the relative root mean squared error (RRMSE) and the relative mean absolute error (RMAE) criteria, which are given by
$$\mathrm{RRMSE} = \frac{\left(\sum_{i=1}^{N} (\hat{f}_i - f_i)^2\right)^{1/2}}{\left(\sum_{i=1}^{N} (\tilde{f}_i - f_i)^2\right)^{1/2}}, \qquad \mathrm{RMAE} = \frac{\sum_{i=1}^{N} |\hat{f}_i - f_i|}{\sum_{i=1}^{N} |\tilde{f}_i - f_i|},$$
where $\hat{f}_i$ is the estimated value of $f_i$ based on SSA and $\tilde{f}_i$ is the estimated value of $f_i$ obtained by applying a competing method. For each of the methods that we have discussed in Section 3, the values $(\tilde{f}_i)_{1 \le i \le N}$ have been obtained by using the R function fitted(). Essentially, this function first computes the residuals of the model by replacing unknown quantities by their estimates in the model equation; these residuals are then subtracted from the original data points. If RRMSE < 1 or RMAE < 1, respectively, then the SSA procedure outperforms the competing method by (1−RRMSE)×100% or (1−RMAE)×100%, respectively. Table 1 reports the average RRMSE and RMAE values (averaged over the 1000 Monte Carlo repetitions of the simulation) attained under each of the four aforementioned simulation models. It becomes apparent from Table 1 that the SSA method has a higher accuracy in signal extraction than the other considered preprocessing methods, for all four considered data-generating models and for all considered choices of the tuning parameter L (window length).
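The following hedged R sketch illustrates one replication of this simulation design for model 1, comparing SSA against ETS; the full study would loop this over 1000 replications, all four signal models, and all competing methods. The SSA grouping (three leading components) and the window length L = N/2 = 50 are illustrative assumptions.

library(Rssa)
library(forecast)

set.seed(123)
N <- 100; t <- 1:N
f <- sin(2 * pi * t / 12) + exp(0.01 * t)    # model 1: Sine+Exponential
y <- f + rnorm(N)                            # centered Gaussian noise with variance 1

# SSA estimate of the signal (illustrative grouping of the leading components)
f.ssa <- reconstruct(ssa(y, L = 50), groups = list(1:3))[[1]]
# Competitor estimate via fitted(), here for the ETS model
f.ets <- as.numeric(fitted(ets(y)))

rrmse <- sqrt(sum((f.ssa - f)^2)) / sqrt(sum((f.ets - f)^2))
rmae  <- sum(abs(f.ssa - f)) / sum(abs(f.ets - f))
c(RRMSE = rrmse, RMAE = rmae)                # values below 1 favor SSA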
In this section, we explain some fundamental concepts regarding multiple test problems and multiple test procedures (MTPs). In general, a multiple test problem occurs whenever m > 1 null hypotheses $H_1, \ldots, H_m$ are to be tested simultaneously based on one and the same dataset. Denoting the sample space of the statistical model under consideration by $\mathcal{Y}$, a (non-randomized) multiple test is a measurable mapping $\varphi = (\varphi_1, \ldots, \varphi_m)^T : \mathcal{Y} \to \{0, 1\}^m$, the components of which are interpreted as individual tests for the individual hypotheses. Namely, $H_i$ is (by convention) rejected if and only if $\varphi_i(y) = 1$, where $y \in \mathcal{Y}$ denotes the observed data. Many multiple tests operate on test statistics $T_1, \ldots, T_m$ or corresponding p-values $p_1, \ldots, p_m$, respectively. Utilizing p-values is useful in the multiple testing context, because every p-value takes its values in the unit interval [0, 1], even if the corresponding test statistics have different scales. Thus, transforming test statistics into p-values serves the purpose of a standardization. It is beyond the scope of the present paper to discuss the notion of p-values in depth, and we refer to Chapter 2 in [10] for details. For our purposes, it suffices to know that a (valid) p-value, regarded as a random variable, is under the null hypothesis stochastically not smaller than the uniform distribution on [0, 1], and that p-values typically tend to small values under alternatives.

Assuming that exactly $m_0$ of the m null hypotheses are true and the remaining $m_1 = m - m_0$ ones are false (where $m_0$ and $m_1$ are unknown in practice), the decision structure of a given multiple test is given in Table 2. The random variable $V_m$ counts the number of type I errors, and the random variable $T_m$ counts the number of type II errors. Unlike in one-dimensional test problems (i.e., in cases with m = 1), type I and type II errors can occur simultaneously if m > 1. The total number of rejections is denoted by $R_m$ in Table 2. Notice that only m and $R_m$ are observable in practice, because the remaining quantities in Table 2 depend on the unknown partition of the null hypotheses into true and false ones. Based on the quantities given in Table 2, a variety of multiple type I and type II error rates have been introduced in the literature. The traditional type I error criterion for a multiple test, which we will also apply in Section 6, is (strong) control of the family-wise error rate (FWER). The number $\mathrm{FWER}_\theta(\varphi) = P_\theta(V_m(\varphi) > 0)$ is called the FWER of the multiple test $\varphi$ under $\theta$, and $\varphi$ is said to control the FWER strongly at level $\alpha \in (0, 1)$ if $\sup_{\theta \in \Theta} \mathrm{FWER}_\theta(\varphi) \le \alpha$, where $\Theta$ is the parameter space of the statistical model under consideration. The power of a given multiple test is typically defined in terms of $S_m$, which is the (random) number of correctly rejected, false null hypotheses. In Section 6, we will compare competing multiple tests by means of $R_m$. If it is guaranteed that the competing multiple tests all control the FWER at the same level $\alpha$, a higher number of rejections is an indication of higher power. In Section 6, we will apply the following multiple tests.

(a) Bonferroni correction, cf. [30, 31]: Each individual test $\varphi_i$ is carried out at the "local" significance level $\alpha/m$. Due to the first-order Bonferroni inequality (also known as Boole's inequality or the union bound), this type of multiplicity correction yields a strongly FWER-controlling multiple test.

(b) Holm's procedure, cf. [32]: Holm's procedure is a step-down variant of the Bonferroni method. The smallest p-value is compared with $\alpha/m$. If it is smaller than this threshold, the corresponding null hypothesis is rejected, and the second smallest p-value is compared with $\alpha/(m-1)$, and so on.
The procedure stops as soon as one of the ordered p-values is not smaller than its threshold; the corresponding null hypothesis as well as all null hypotheses corresponding to larger p-values are retained. Holm's procedure controls the FWER strongly.

(c) Šidák correction, cf. [33]: Each individual test $\varphi_i$ is carried out at the local significance level $1 - (1-\alpha)^{1/m}$. This type of multiplicity correction yields a strongly FWER-controlling multiple test if the (random) p-values $p_1, \ldots, p_m$ are jointly stochastically independent, or if they exhibit certain forms of positive dependence; see [34] for mathematical details.

(d) Šidák step-down: The step-down variant of the Šidák procedure. It controls the FWER strongly if the (random) p-values $p_1, \ldots, p_m$ are jointly stochastically independent, or if they exhibit certain forms of positive dependence.

(e) Hochberg's procedure, cf. [35]: Hochberg's procedure is a step-up test. The largest p-value is compared with the threshold $\alpha/1 = \alpha$. If it is smaller than this threshold, then all m null hypotheses are rejected. If it is not smaller than $\alpha$, then the corresponding null hypothesis is retained, and the second largest p-value is compared with the threshold $\alpha/2$, and so on. Hochberg's procedure controls the FWER strongly if the Simes inequality (cf. [36]) holds for $p_1, \ldots, p_m$. Conditions for the latter have been discussed, for example, in [37] and [38]. Under joint independence of $p_1, \ldots, p_m$, the Simes inequality is guaranteed to hold, but certain forms of (positive) dependence among $p_1, \ldots, p_m$ also lead to its validity.

For applications of multiple testing, especially in the life sciences, see [39-43] and Parts II and III in [10].

6 Analyzing Real Data

For illustrative purposes, Figure 1 contains plots of selected noisy data series taken from the real data examples which we will investigate more deeply with respect to multiple test problems in Sections 6.3.1-6.3.5 below. The left graph in the upper row of Figure 1 displays data from Section 6.3.1, representing the red blood cell counts of 100 female athletes. The right graph in the upper row displays data from Section 6.3.2, representing the hemoglobin levels of 1302 Coronavirus patients. The left graph in the middle row displays data from Section 6.3.3, representing the time series of gross fixed capital in the United Kingdom (UK) over recent years. The right graph in the middle row displays data from Section 6.3.4, representing the time series of agricultural product prices for import data in Germany during the year 2021. Finally, the lower row of Figure 1 displays data from Section 6.3.5, representing climate time series of two German cities, namely a temperature time series of Stuttgart (left) and a divergence-of-wind time series of Halle (right), from the year 2000 until the present. Despite originating from different scientific areas and being of different structure, all six graphs displayed in Figure 1 exhibit a unifying feature, namely the presence of pronounced noise, leading to highly volatile patterns. Therefore, appropriate preprocessing of these datasets appears necessary prior to a statistical analysis. Due to the fact that the signal component f is unknown in the analysis of real data, the criteria defined in Section 4 cannot be used to evaluate the methods in the real-data scenario. Therefore, we need another criterion to compare their denoising performances; cf. also [44].
The most commonly used criterion for evaluating a denoising method on a real dataset is the (empirical) signal-to-noise ratio (SNR), which has different definitions according to the application. The SNR criterion quantifies the magnitude of the received signal power relative to the noise power. SNR is usually measured in units of decibels (dB), and since the units of signal power and noise power are the same, the formula works for any kind of units. Here, we apply the following formula for the (empirical) signal-to-noise ratio:
$$\mathrm{SNR}(\hat{f}) = \frac{\widehat{\mathrm{Var}}(\hat{f})}{\widehat{\mathrm{Var}}(y - \hat{f})}, \qquad (3)$$
where $y$ are the observed data, $\hat{f} = (\hat{f}_1, \ldots, \hat{f}_N)^T$ denotes the estimated signal values after denoising, and $\widehat{\mathrm{Var}}$ denotes the empirical variance. In principle, larger SNR values indicate the superiority of a method. However, the empirical variance in the denominator of (3) can be made artificially small by the use of a very complex model for $\hat{f}$. This effect is commonly referred to as overfitting. Therefore, we penalize the (empirical) signal-to-noise ratio by a criterion that measures the smoothness or roughness, respectively, of a signal. To this end, we utilize the roughness measure mentioned around Equation (5.1.5) in [45]. Namely, the roughness of a function f with continuous first and second derivatives is defined by
$$\mathrm{Roughness}(f) = \int \left(f''(t)\right)^2\, dt. \qquad (4)$$
A lower amount of roughness corresponds to a smoother function. As a discrete version of (4), we can define
$$\mathrm{Roughness}_N(f) = \sum_{n=3}^{N} \left(\Delta^2 f_n\right)^2, \qquad (5)$$
where $f = (f_1, \ldots, f_N)^T$ and $\Delta$ is the difference operator, given by $\Delta f_n = f_n - f_{n-1}$. In our study, to assess denoising quality and smoothness at the same time, we use the following goodness of denoising criterion:
$$\text{Goodness of denoising}(\hat{f}) = 10 \times \log_{10}\left(\mathrm{SNR}(\hat{f}) + \frac{1}{\mathrm{Roughness}_N(\hat{f})}\right). \qquad (6)$$
Equation (6) refers to the decibel scale, which is a common scale in signal processing. When comparing two preprocessing methods on real data, we favor the method with the larger value of this goodness of denoising criterion. Table 3 displays the mean values (over all variables in the respective dataset indicated in the first column) of the criterion from (6) for comparing the goodness of denoising of the different methods. When applying SSA, we used the window length L = N/2. As can be seen from Table 3, SSA achieved the largest average goodness of denoising value for all six considered datasets. According to our definition, SSA thus exhibits the best (average) denoising performance among all considered methods for all considered datasets.

As a second criterion for assessing the quality of the decomposition of the observed data into a signal and a noise component, we considered the concept of separability. In fact, the SSA decomposition of the series $Y_N$ can only be successful if the signal and the noise components of the series are approximately separable from each other. Now, suppose that, based on the diagonal averaging step of SSA, $\mathbf{X}$ and $\mathbf{Y}$ are the trajectory matrices of two reconstructed series $X_N$ and $Y_N$, respectively. Define the w-scalar product of the series $X_N$ and $Y_N$ as
$$(X_N, Y_N)_w = \sum_{k=1}^{N} w_k\, x_k\, y_k = \langle \mathbf{X}, \mathbf{Y} \rangle_F,$$
where $\langle \cdot, \cdot \rangle_F$ is the Frobenius inner product, and $w_k = \min\{k, L, N - k + 1\}$ is the frequency with which the series element $x_k$ appears in the related trajectory matrix. The so-called w-correlation or weighted correlation is then defined as
$$\rho_w(X_N, Y_N) = \frac{(X_N, Y_N)_w}{\|X_N\|_w\, \|Y_N\|_w},$$
where $\|X_N\|_w = \sqrt{(X_N, X_N)_w}$. If the absolute value of the w-correlation is close to zero, then the corresponding series are almost w-orthogonal. If it is large, then the two series are far from being w-orthogonal and are termed weakly separable. For more details, see Section 1.3 in [24].
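The following R sketch computes these quantities for an SSA decomposition. The composition of the goodness of denoising value follows our reconstruction of Equation (6) above, and the grouping into three leading signal components is again an illustrative assumption.

library(Rssa)

set.seed(7)
y <- sin(2 * pi * (1:100) / 12) + 0.05 * (1:100) + rnorm(100)
s <- ssa(y, L = 50)
r <- reconstruct(s, groups = list(signal = 1:3))
f.hat <- r$signal

snr       <- var(f.hat) / var(y - f.hat)          # empirical SNR, cf. Equation (3)
roughness <- sum(diff(f.hat, differences = 2)^2)  # discrete roughness, cf. Equation (5)
10 * log10(snr + 1 / roughness)                   # goodness of denoising, cf. Equation (6)

# w-correlations between the reconstructed signal group and the residual group
wcor(s, groups = list(1:3, 4:50))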
Table 4 displays the values of the w-correlation between the reconstructed vectors and the residual vectors for the variables considered in Figure 1. All w-correlation values in Table 4 are rather small, indicating a good separation between the respective reconstructed and residual vectors. From the results presented in Tables 3 and 4 it can be concluded that the SSA method is, among the considered preprocessing methods, the best choice for denoising the considered datasets.

In this section, we apply the following two-step data analysis pipeline to the datasets that we have briefly presented in Section 6.1.
Step 1: Application of the considered preprocessing methods in order to reduce the noise of the series.
Step 2: Application of appropriate multiple testing methods to identify variables with statistically significant effects / group differences.
In all following examples, we consider as the type I error criterion FWER control at level $\alpha = 0.05$.

Here, we consider data of 102 male and 100 female athletes on 11 variables, collected at the Australian Institute of Sport (AIS). This dataset is available in the R package sn. These data have been made publicly available in connection with the book by Cook and Weisberg (1994); see [46]. The focus of this study was to detect mean differences in the measured variables between female and male athletes. Denoting the sex-specific population means for variable i by $\mu^{(i)}_{male}$ and $\mu^{(i)}_{female}$, respectively, the family of null hypotheses to be tested simultaneously is therefore given by
$$H_i : \mu^{(i)}_{male} = \mu^{(i)}_{female}, \quad i \in \{1, \ldots, 11\}.$$
As test statistics, the pooled two-sample t-statistics
$$T_i = \frac{\bar{Y}_{i,1} - \bar{Y}_{i,2}}{S_i \sqrt{1/n_1 + 1/n_2}}, \qquad S_i^2 = \frac{(n_1 - 1)\, S_{i,1}^2 + (n_2 - 1)\, S_{i,2}^2}{n_1 + n_2 - 2},$$
for $i \in \{1, \ldots, 11\}$ have been used, where $\bar{Y}_{i,1}, \bar{Y}_{i,2}$ and $S^2_{i,1}, S^2_{i,2}$ denote the empirical means and the empirical variances of males and females for variable i, and $n_1$ and $n_2$ denote the group-specific sample sizes. Letting $n = n_1 + n_2$, Student's t-distribution with $n - 2$ degrees of freedom (with corresponding cumulative distribution function denoted by $F_{t_{n-2}}$) has been used as the null distribution of each of the test statistics, leading to the two-sided marginal p-value $p_i = 2\left(1 - F_{t_{n-2}}(|T_i|)\right)$ for variable $i \in \{1, \ldots, 11\}$. For performing the data analysis, the R packages Rssa, multtest and mutoss have been used. Given that no assumption about the dependency structure among the marginal p-values is imposed, we applied the Bonferroni correction as well as Holm's procedure, which control the FWER under any dependency structure; see Section 5.2. For both of the latter multiple tests and for all considered preprocessing methods, we obtained $R_{11} = 10$ rejections.

In this example, we consider several parameters with respect to their associations with the Coronavirus infection status, where the underlying data include missing values. The data have been collected from Imam Reza Hospital, Mashhad, Iran (https://emamreza.mums.ac.ir/). Here, in a first step of the data analysis we imputed missing values by utilizing SSA, and the two-step data analysis pipeline given at the beginning of this section has then been applied to the imputed data. For imputing missing values, they are in a first step replaced by zeros, and then updated in repeated cycles of basic SSA filtering until convergence is reached; see [47]. In total, m = 13 parameters have been assessed for two independent samples, consisting of 1302 patients and 9761 healthy controls, respectively, which have been ordered based on the age of the subjects.
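Before turning to the test results, the following R sketch mimics Step 2 of the pipeline for a two-group comparison as in Sections 6.3.1 and 6.3.2 on hypothetical data. The paper itself uses the multtest and mutoss packages; here the base-R function p.adjust is used instead, and the data matrices, sample sizes, and effect size are made up for illustration.

set.seed(42)
m <- 11; n1 <- 102; n2 <- 100
# Hypothetical (denoised) measurements: rows = variables, columns = subjects;
# group 2 receives a mean shift, so some null hypotheses are false
g1 <- matrix(rnorm(m * n1), nrow = m)
g2 <- matrix(rnorm(m * n2, mean = 0.4), nrow = m)

p.raw <- sapply(seq_len(m), function(i)
  t.test(g1[i, ], g2[i, ], var.equal = TRUE)$p.value)  # pooled two-sample t-tests

alpha  <- 0.05
p.bonf <- p.adjust(p.raw, method = "bonferroni")
p.holm <- p.adjust(p.raw, method = "holm")
c(R.bonferroni = sum(p.bonf <= alpha), R.holm = sum(p.holm <= alpha))  # numbers of rejections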
In analogy to Section 6.3.1, we have tested the m = 13 null hypotheses of zero population mean differences (case group minus control group) corresponding to the 13 measured parameters by calculating 13 marginal t-statistics and their corresponding (two-sided) p-values. Furthermore, since the measured parameters have to be assumed to be interrelated, and since no further information about the type of the resulting dependency structure among the p-values is available, we again applied the Bonferroni correction as well as Holm's procedure. In this example, both considered multiple tests, in combination with any of the considered preprocessing methods, rejected all 13 null hypotheses.

The data that are used in this study are related to the UK and are taken from the Office for National Statistics (ONS). For each of the m = 173 considered variables, a null hypothesis $H_i$ has been tested by means of an F-statistic $F_i$; the family of these null hypotheses is denoted by $\mathcal{H}_{173}$. Denoting the available sample size for variable i by $N_i$, we assume as the null distribution of $F_i$ Fisher's F-distribution with 3 and $N_i - 3$ degrees of freedom, $1 \le i \le 173$, and this null distribution has been used to compute the corresponding p-value $p_i$. Assuming positive dependence among the $p_i$'s, the Hochberg procedure as well as the single-step and the step-down versions of the Šidák multiple test are appropriate for testing $\mathcal{H}_{173}$; see Section 5.2. Table 5 tabulates the numbers of rejected null hypotheses for these three different multiple tests in connection with the different preprocessing methods. In Table 5, the numbers of rejected null hypotheses for MSSA and ETS are close to each other. However, they are approximately 15% and 20% larger than those obtained by applying ARFIMA and NN, respectively.

The data that are used in this example are related to Germany and are taken from GENESIS-Online (Statistisches Bundesamt). The considered dataset includes m = 76 major components of import and export price indices. In all corresponding 76 time series, the monthly sample period starts in January 2021 and ends in December 2021. The goal of the statistical analysis is to detect those components for which there is a difference between the population means of the import price index and the export price index. To this end, we analyzed each component i by computing a t-statistic $T_i$ and the corresponding two-sided p-value $p_i$, where $i \in \{1, \ldots, 76\}$, in analogy to Sections 6.3.1 and 6.3.2. As in Section 6.3.3, we assumed positive dependence among the $p_i$'s and applied the Hochberg procedure as well as the single-step and the step-down versions of the Šidák multiple test. In analogy to Table 5, Table 6 tabulates the obtained numbers of rejected null hypotheses for these three multiple tests in connection with the different preprocessing methods. In Table 6, the rejection numbers for MSSA are always larger than those for the other considered preprocessing methods.

Here we consider several atmospheric variables from the ERA-5 reanalysis dataset (see [48]) for the two German cities Halle and Stuttgart, from 2000 to the present. These reanalyses combine observations with global climate model simulations, with the goal of providing an accurate estimate of the state of the atmosphere at 0.25° horizontal resolution and on a global scale. The use of global climate simulations makes it possible for ERA-5 to cover gaps in the observations over the globe, but it also introduces some uncertainty associated with the default values of the climate model.
There are also uncertainties arising from the observations themselves (they are limited in space and time, and there are human and instrumental errors), and their number and precision have evolved over the years. The goal of the statistical analysis of this dataset is to identify those of the m = 18 considered climate variables for which the population mean referring to Halle differs from the population mean referring to Stuttgart. To this end, in analogy to Sections 6.3.1, 6.3.2, and 6.3.4, we analyzed each climate variable i by computing a t-statistic $T_i$ and the corresponding two-sided p-value $p_i$, where $i \in \{1, \ldots, 18\}$. Since we do not have dependency information in this example, we applied the Bonferroni correction as well as Holm's procedure, because they are applicable under any kind of dependency structure among the p-values. In the case of applying the MSSA preprocessing method, we obtained $R_{18}(\varphi^{Bonferroni}) = R_{18}(\varphi^{Holm}) = 15$ rejections. In the case of applying the ARFIMA preprocessing method, we obtained $R_{18}(\varphi^{Bonferroni}) = R_{18}(\varphi^{Holm}) = 11$ rejections, and for the ETS method as well as for the NN method, we obtained $R_{18}(\varphi^{Bonferroni}) = R_{18}(\varphi^{Holm}) = 10$ rejections. Thus, by applying MSSA as a preprocessing method, we obtained four or five additional rejections in comparison with the other considered preprocessing methods.

The main conclusion from all multiple test problems considered in this section is that appropriate data preprocessing can help to sharpen the statistical inference, in the sense that better denoising can lead to an increased statistical power of the multiple test which is applied to the denoised data series. (As mentioned in Section 5, we take a larger number of rejected null hypotheses as an indication of higher statistical power when comparing, on real data, the performance of two multiple tests which control the FWER at the same level.) Since demonstrating this is the main purpose of this section, we abstain from a detailed discussion of the domain science implications of our data analysis results.

We have compared four different data preprocessing methods with respect to their performance in denoising, both on simulated and on real data. In the case of simulated data, ground truth is known, and we have quantified the denoising accuracy by calculating the RRMSE and RMAE criteria referring to the signal component of the data. In the case of real data, ground truth is unknown, and we have introduced a goodness of denoising criterion which combines the empirical SNR with a roughness penalty. In both cases, SSA yielded overall the best denoising results. Furthermore, we have assessed the impact of proper denoising on the performance of a variety of standard multiple tests for different study designs, including models with two independent groups and models with more than two groups. Here, MSSA yielded overall the best results in terms of a large number of rejections under the constraint of FWER control.

There are several directions for follow-up research: First, it may be of interest to carry out computer simulations in order to assess the impact of proper denoising on the type I error behavior of multiple tests when ground truth is known. In this respect, we have implicitly assumed in Section 6.3 that the p-values originating from the different preprocessing methods are valid in the sense of Equation (1) in [43].
Second, given the empirical evidence that (multivariate) SSA has favorable denoising properties, it may be worthwhile to develop a mathematical optimality theory which underpins this empirical evidence by theoretical arguments. Third, from the perspective of applications, it appears appealing to interpret the estimated signal components of the considered data series in their respective domain science contexts.

Nothing to declare.

References
[1] Data Preparation for Data Mining
[2] From spots to candidates: image analysis, data processing, candidate selection
[3] The impact of signal pre-processing on the final interpretation of analytical outcomes - a tutorial
[4] Automatic time series forecasting: The forecast package for R
[5] Forecasting: Principles and Practice
[6] Neural Networks for Pattern Recognition
[7] Theory of Nonparametric Tests
[8] Predicting inflation dynamics with singular spectrum analysis
[9] Handbook of Multiple Comparisons. Chapman & Hall/CRC Handbooks of Modern Statistical Methods
[10] Simultaneous Statistical Inference with Applications in the Life Sciences
[11] Extracting qualitative dynamics from experimental data
[12] Bicoid signal extraction: another powerful approach
[13] Forecasting tourism demand with denoised neural networks
[14] Modeling European industrial production with multivariate singular spectrum analysis: a cross-industry analysis
[15] Predicting global temperature anomaly: a definitive investigation using an ensemble of twelve competing forecasting models
[16] Monthly forecasting of GDP with mixed-frequency multivariate singular spectrum analysis
[17] The effect of data transformation on singular spectrum analysis for forecasting
[18] Cross country relations in European tourist arrivals
[19] Real-time nowcasting the US output gap: Singular spectrum analysis at work
[20] Forecasting energy data with a time lag into the future and Google trends
[21] Enhancing multivariate singular spectrum analysis for phase synchronization: the role of observability
[22] Analysis of Time Series Structure: SSA and Related Techniques
[23] Singular Spectrum Analysis for Time Series
[24] Singular Spectrum Analysis with R
[25] Forecasting wholesale price of cluster bean using the Autoregressive Fractionally Integrated Moving-Average Model: the case of Sri Ganganagar of Rajasthan in India
[26] Autoregressive fractionally integrated moving average-generalized autoregressive conditional heteroskedasticity model with level shift intervention
[27] A state space framework for automatic forecasting using exponential smoothing methods
[28] Forecasting with Exponential Smoothing. The State Space Approach
[29] Nonparametric neural networks. In: Proceedings of the International Conference on Learning Representations
[30] Il calcolo delle assicurazioni su gruppi di teste
[31] Teoria statistica delle classi e calcolo delle probabilità
[32] A simple sequentially rejective multiple test procedure
[33] Rectangular confidence regions for the means of multivariate normal distributions
[34] Multiple point hypothesis test problems and effective numbers of tests for control of the family-wise error rate
[35] A sharper Bonferroni procedure for multiple tests of significance
[36] An improved Bonferroni procedure for multiple tests of significance
[37] On the Simes inequality in elliptical models
[38] On the Simes test under dependence
[39] Simultaneous statistical inference for epigenetic data
[40] Multivariate multiple test procedures based on nonparametric copula estimation
[41] Simultaneous Bayesian analysis of contingency tables in genetic association studies
[42] Combining high-dimensional classification and multiple hypotheses testing for the analysis of big data in genetics
[43] Randomized p-values for multiple testing and their application in replicability analysis
[44] Denoising heavy-tailed data using a novel robust non-parametric method based on quantile regression. Fluctuation and Noise Letters
[45] Multivariate Statistical Modelling Based on Generalized Linear Models. Second Edition
[46] An Introduction to Regression Graphics
[47] Singular Spectrum Analysis Using R
[48] The ERA5 global reanalysis