key: cord-0168321-j7va63eg
authors: Livieri, Giulia; Mancino, Maria Elvira; Marmi, Stefano; Toscano, Giacomo
title: Volatility of volatility estimation: central limit theorems for the Fourier transform estimator and empirical study of the daily time series stylized facts
date: 2021-12-29
sha: b9c2d006c5804d395e077b50ea67a50d9a94a63c doc_id: 168321 cord_uid: j7va63eg

We study the asymptotic normality of two estimators of the integrated volatility of volatility based on the Fourier methodology, which does not require the pre-estimation of the spot volatility. We show that the bias-corrected estimator reaches the optimal rate $n^{1/4}$, while the estimator without bias-correction has a slower convergence rate and a smaller asymptotic variance. Additionally, we provide simulation results that support the theoretical asymptotic distribution of the rate-efficient estimator and show the accuracy of the Fourier estimator in comparison with a rate-optimal estimator based on the pre-estimation of the spot volatility. Finally, we reconstruct the daily volatility of volatility of the S&P500 and EUROSTOXX50 indices over long samples via the rate-optimal Fourier estimator and provide novel insight into the existence of stylized facts about its dynamics.

Over the last decades, several stochastic volatility models have been proposed to describe the evolution of asset prices, motivated by empirical studies on the patterns of volatility in financial time series. Further, the availability of high-frequency data has spurred the development of statistical techniques for the efficient estimation of model parameters in the stochastic volatility framework, e.g., the leverage and the volatility of volatility processes. The estimation of these parameters is rather complicated, mainly because some of the factors involved are unobservable.
In particular, the estimation of the volatility of volatility is a challenging task because, owing to the latency of the volatility process, a pre-estimation of the spot volatility is typically required as a first step. Unlike the case of the integrated volatility, the non-parametric estimation of the integrated volatility of volatility is a relatively recent topic. [Barndorff-Nielsen and Veraart, 2009] propose a new class of stochastic volatility of volatility models, with an extra source of randomness, and show that the volatility of volatility can be estimated non-parametrically by means of the quadratic variation of the preliminarily estimated squared volatility process, which they name pre-estimated spot variance based realized variance. [Vetter, 2015] proposes an estimator of the integrated volatility of volatility which is also based on increments of the pre-estimated spot volatility process and attains the optimal convergence rate in the absence of noise. The common feature of these estimators is that they first reconstruct the unobservable volatility path via a consistent estimator and then compute the volatility of volatility using the estimated path as a proxy for the unknown one. The estimation of the volatility of volatility in the presence of jumps is studied in [Cuchiero and Teichmann, 2015]: first, the authors combine jump-robust estimators of the integrated variance with the Fourier-Fejér inversion formula to obtain an estimator of the instantaneous volatility path; second, they again apply jump-robust estimators of the integrated volatility, into which they plug the estimated volatility path, to obtain an estimator of the volatility of volatility.
In the same spirit as [Barndorff-Nielsen and Veraart, 2009, Vetter, 2015], [Li et al., 2021] also propose an estimator of the integrated volatility of volatility based on a pre-estimation of the spot volatility. However, in order to extend the study to the case in which the observed price process contains jumps and microstructure noise, they adopt a threshold pre-averaging estimator of the volatility, following [Jing et al., 2014]. In this paper, we focus on the estimation of the integrated volatility of volatility via the Fourier estimation method by [Malliavin and Mancino, 2002], which does not require the pre-estimation of the spot volatility. An early application of the Fourier methodology to identify the parameters (volatility of volatility and leverage) of stochastic volatility models was proposed by [Barucci and Mancino, 2010], where the authors prove a consistency result for the estimators of both the integrated leverage and the integrated volatility of volatility in the absence of noise. In the presence of microstructure noise, [Sanfelici et al., 2015] study the finite-sample properties of the Fourier estimator of the volatility of volatility introduced in [Barucci and Mancino, 2010] and show its asymptotic unbiasedness. However, the convergence rate of the estimator has not been established, not even in the absence of microstructure noise. The present paper fills this gap. Specifically, after proving that the Fourier estimator of the volatility of volatility by [Sanfelici et al., 2015] has a sub-optimal rate of convergence, we define its bias-corrected version and prove that it reaches the optimal convergence rate $n^{1/4}$. We also show that the non-corrected estimator, despite its slower rate of convergence, has a smaller asymptotic error variance. Further, we provide feasible versions of the two CLTs that exploit the product formula for the Fourier coefficients of the volatility of volatility and of the fourth power of the volatility.
The same property of the Fourier coefficients is used in [Livieri et al., 2019] for the estimation of the quarticity. The asymptotic results are supported by a simulation exercise, where we also compare the finite-sample performance of the rate-efficient Fourier estimator with that of the rate-efficient realized estimator by [Ait-Sahalia and Jacod, 2014], which is based on the pre-estimation of the spot volatility. The comparative study suggests that the Fourier estimator works quite well on the daily horizon, while the performance of the realized estimator is not satisfactory. This may be related to the fact that the other volatility of volatility estimators rely on the pre-estimation of the instantaneous volatility path via numerical differentiation. The Fourier approach, by contrast, relies only on the reconstruction of integrated quantities, i.e., the Fourier coefficients of the volatility. As observed early on in [Malliavin and Mancino, 2002], this peculiarity of the Fourier estimator makes the method easy to implement and computationally stable. Finally, we present an empirical exercise where the Fourier estimator is applied to obtain the daily time series of the volatility of volatility of the S&P500 and EUROSTOXX50 indices. The samples cover May 1, 2007 to August 6, 2021 and June 29, 2005 to May 28, 2021, respectively. As a result, we obtain novel insight into the empirical regularities that characterize the daily dynamics of the volatility of volatility, which, to the best of our knowledge, have so far been scarcely explored in the literature. Specifically, we find that the daily volatility of volatility of both indices spikes during periods of financial turmoil (e.g., the financial crisis of 2008 and the outbreak of the COVID pandemic in 2020).
We also find that it is usually positively (resp., negatively) correlated with the volatility (resp., the asset return), but appears to be less persistent than the volatility. Finally, we observe that its empirical distribution is satisfactorily approximated by a log-normal distribution in years characterized by higher financial stability, as is the case for the volatility. This insight appears valuable in view of the relevance of the volatility of volatility for scholars and practitioners. On the one hand, market operators regularly "trade" the volatility of many financial asset classes via quoted and O.T.C. volatility derivatives (e.g., variance swaps, VIX futures and VIX options), hence the importance of accurate estimates of the volatility of the "traded" volatility. On the other hand, efficient estimates of the volatility of volatility are also needed in a number of technical tasks. These include the calibration of stochastic volatility of volatility models ([Barndorff-Nielsen and Veraart, 2009], [Sanfelici et al., 2015]) and the estimation of the leverage coefficient ([Kalnina and Xiu, 2017], [Ait-Sahalia et al., 2017]). They also include inference on future returns ([Bollerslev et al., 2009]) and spot volatilities ([Mykland and Zhang, 2009]). Furthermore, [Bandi et al., 2020] have recently provided empirical support for the dependence between the volatility of volatility of equity assets and structural sources of risk related to firms' characteristics. The paper is organized as follows. Section 2 contains the assumptions and definitions. Section 3 states the central limit theorems, which are supported by the simulation study in Section 4. Section 5 contains the empirical results and Section 6 concludes. The proofs are given in Appendix A, while Appendix B contains some auxiliary lemmas on the Fejér and Dirichlet kernels.
This section presents the general non-parametric stochastic volatility model considered throughout the paper. It also defines two estimators of the integrated volatility of volatility based on the Fourier estimation method introduced in [Malliavin and Mancino, 2002, Malliavin and Mancino, 2009]. The class considered includes most of the continuous stochastic volatility models commonly used in high-frequency finance. It is assumed, to cite one among many others, in [Ait-Sahalia and Jacod, 2014], Ch. 8.3. We make the following assumptions.

(A.I) The log-price process p and the variance process v are continuous Itô semimartingales on [0, T] satisfying the stochastic differential equations
$$dp(t) = \sigma(t)\,dW(t) + b(t)\,dt, \qquad dv(t) = \gamma(t)\,dZ(t) + \beta(t)\,dt,$$
where $v := \sigma^2$. Here W and Z are Brownian motions on a filtered probability space $(\Omega, (\mathcal{F}_t)_{t\in[0,T]}, P)$ satisfying the usual conditions, and they may be correlated (in this regard, note that it is not restrictive to assume a constant correlation ρ).

(A.II) The processes σ, b, γ and β are continuous adapted stochastic processes defined on the same probability space $(\Omega, (\mathcal{F}_t)_{t\in[0,T]}, P)$, such that for any p ≥ 1
The processes are specified in such a way that the spot volatility σ and the volatility of volatility γ are a.s. positive.

(A.III) The process γ is a continuous Itô semimartingale, whose drift and diffusion processes are continuous adapted stochastic processes defined on the same probability space $(\Omega, (\mathcal{F}_t)_{t\in[0,T]}, P)$.

Assumptions (A.I)-(A.III) are standard in the non-parametric setting and are considered, e.g., in [Barndorff-Nielsen and Veraart, 2009, Ait-Sahalia and Jacod, 2014, Cuchiero and Teichmann, 2015, Vetter, 2015, Li et al., 2021]. By shifting the origin of time and rescaling the unit of time, one can always map the time window [0, T] onto [0, 2π]. Suppose that the asset log-price p is observed at discrete, irregularly spaced points in time on the grid $\{0 = t_{0,n} \le \dots \le t_{i,n} \le \dots \le t_{n,n} = 2\pi\}$.
For simplicity, we omit the second index n. Denote $\rho(n) := \max_{0 \le h \le n-1} |t_{h+1} - t_h|$ and suppose that $\rho(n) \to 0$ as $n \to \infty$. For any integer k with $|k| \le 2N$, the discretized version of the Fourier coefficient $c_k(dp)$ is denoted by
$$c_k(dp_n) := \frac{1}{2\pi} \sum_{h=0}^{n-1} e^{-\mathrm{i} k t_h}\, \delta_h(p), \qquad (1)$$
where $\delta_h(p) := p(t_{h+1}) - p(t_h)$ and the symbol i is the imaginary unit $\sqrt{-1}$. Further, for any $|k| \le N$, define the convolution formula
$$c_k(v_{n,N}) := \frac{2\pi}{2N+1} \sum_{|s| \le N} c_s(dp_n)\, c_{k-s}(dp_n). \qquad (2)$$
In [Malliavin and Mancino, 2009] it is proved that (2) is a consistent estimator of the k-th Fourier coefficient of the volatility process¹. In [Barucci and Mancino, 2010, Sanfelici et al., 2015] it is shown that an estimator of the integrated volatility of volatility can be derived by exploiting only the knowledge of the Fourier coefficients in (2), without a preliminary estimation of the instantaneous volatility. This fact characterizes the Fourier method for estimating the volatility of volatility. As far as we know, all other existing methods rely on the pre-estimation of the spot volatility, see [Ait-Sahalia and Jacod, 2014, Cuchiero and Teichmann, 2015, Vetter, 2015, Li et al., 2021]. In general, these methods first pre-estimate the spot volatility, with or without noise contamination. As a second step, they apply a quadratic variation approach (e.g., the realized volatility formula) to the pre-estimated spot volatility trajectory. The estimator of the integrated volatility of volatility defined in [Sanfelici et al., 2015] is given by 2π times $c_0(\gamma^2_{n,N,M})$ in (3), since $c_0(\gamma^2_{n,N,M})$ is the estimator of $c_0(\gamma^2) = \frac{1}{2\pi}\int_0^{2\pi} \gamma^2(t)\,dt$. [Sanfelici et al., 2015] show that the estimator (3) is consistent under assumptions (A.I)-(A.III) and the conditions $N/n \to 0$ and $M^4/N \to 0$. They also show that it is asymptotically unbiased in the presence of microstructure noise².
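To make (1) and (2) concrete, here is a minimal, unoptimized Python sketch in which each coefficient costs O(n). It assumes that the observation times have already been mapped onto [0, 2π] as described above and that `prices` holds log-prices. The function names are hypothetical and are not the authors' code. On a regular grid, a practical implementation could evaluate (1) with an FFT instead.

```python
import cmath
import math


def rescale_to_2pi(times, t_start, t_end):
    """Shift and rescale observation times from [t_start, t_end] onto [0, 2*pi]."""
    scale = 2.0 * math.pi / (t_end - t_start)
    return [(t - t_start) * scale for t in times]


def mesh(times):
    """rho(n): the largest gap between consecutive observation times."""
    return max(b - a for a, b in zip(times, times[1:]))


def coeff_dp(times, prices, k):
    """c_k(dp_n) as in (1): (1/2pi) * sum_h exp(-i k t_h) * (p(t_{h+1}) - p(t_h))."""
    total = sum(
        cmath.exp(-1j * k * times[h]) * (prices[h + 1] - prices[h])
        for h in range(len(times) - 1)
    )
    return total / (2.0 * math.pi)


def coeff_vol(times, prices, k, N):
    """c_k(v_{n,N}) as in (2): (2pi / (2N + 1)) * sum_{|s| <= N} c_s(dp_n) c_{k-s}(dp_n)."""
    cache = {}

    def c(s):
        # memoize c_s(dp_n), since each coefficient costs O(n)
        if s not in cache:
            cache[s] = coeff_dp(times, prices, s)
        return cache[s]

    conv = sum(c(s) * c(k - s) for s in range(-N, N + 1))
    return 2.0 * math.pi / (2 * N + 1) * conv
```

As a sanity check, take a regular grid with N = n/2 (n even). Parseval's identity then gives $c_0(v_{n,N}) = \big(n\sum_h \delta_h^2 + (\sum_h (-1)^h \delta_h)^2\big)/(2\pi(n+1))$, which is close to the realized variance divided by 2π.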
However, the rate of convergence and the asymptotic normality of (3) have not been established.

¹ Hereinafter, we follow the relevant econometric literature in using the term volatility as a synonym of variance, thus referring to $\sigma^2(t)$ as the volatility process. Similarly for the volatility of volatility.
² Note that these conditions on n, N and M are only sufficient for consistency. They are not sharp because the focus of [Sanfelici et al., 2015] was not on the convergence rate, but on the finite-sample properties of the estimator in the presence of microstructure noise.

We show in Theorem 3.1 that the convergence rate of the estimator (3) is not optimal. Nevertheless, the estimator has a very good finite-sample performance, as shown in Section 4. To obtain an estimator with the optimal rate of convergence in the absence of microstructure noise, a bias correction is needed. We therefore consider the estimator (4), where the constant K is determined in (51) and $\sigma^4_{n,N,M}$ is the Fourier quarticity estimator (5). The asymptotic normality of the estimator (5) is studied in [Livieri et al., 2019], while its properties in the presence of microstructure noise are studied in [Mancino and Sanfelici, 2012]. Note that the estimator (4) differs from (3) by the bias correction $K\sigma^4_{n,N,M}$. This bias correction ensures a faster rate of convergence but destroys the positivity of the estimator, see also [Barndorff-Nielsen et al., 2011]. The estimator (3), instead, is positive.

In this section we study the asymptotic normality of the Fourier estimators of the integrated volatility of volatility defined by (3) and (4). We prove that the estimator (4) reaches the optimal rate of convergence $n^{1/4}$ at the cost of a de-biasing term, while the estimator (3) has a smaller asymptotic variance at the cost of a slower convergence rate.
Note that if $c_N = \pi$ or, equivalently, $N = n/2$ (i.e., the cutting frequency N used to estimate the volatility coefficients from the log-prices equals the Nyquist frequency), then $\eta(c_N/\pi) = 0$. In that case the asymptotic variance in Theorem 3.1 becomes $\frac{1}{2\pi}\,\dots$

Remark 3.2 The realized volatility of volatility estimator (Theorem 8.11 in [Ait-Sahalia and Jacod, 2014]) is obtained as the quadratic variation of the estimated spot volatility, with a de-biasing term depending on the quarticity. The underlying model is a continuous semimartingale for the price, the volatility and the volatility of volatility. The convergence rate of the estimator is $n^{1/4}$ and its asymptotic variance is given by (7). Letting $\beta = 1/c_M$, the correspondence between the asymptotic variances (6) and (7) is easily seen, with the second and third terms being smaller for the Fourier estimator. Note that the estimator in [Ait-Sahalia and Jacod, 2014] corresponds to (4) multiplied by 2π. [Li et al., 2021] consider an approach similar to [Ait-Sahalia and Jacod, 2014], extended to obtain a consistent estimator in the presence of noisy data. To this aim, the authors first build an estimator of the spot volatility by means of a pre-averaging method to remove the noise contamination. They then compute the realized variance of the spot volatility estimates to obtain an estimator of the integrated volatility of volatility. Finally, they also need to correct for the bias of the resulting estimator. The rate of convergence is $n^{1/8}$ in the presence of noise and $n^{1/4}$ without noise.

To obtain a feasible CLT from Theorem 3.1, a consistent estimator of the conditional variance is needed. We again exploit the Fourier methodology to build such an estimator. The result is obtained in Proposition 3.4, whose key ingredients are Remark 3.3 below and the product formula for the Fourier coefficients, as studied in [Livieri et al., 2019].
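The product formula mentioned above states that the Fourier coefficients of a product are the discrete convolution of the coefficients of the factors, $c_k(fg) = \sum_s c_s(f)\,c_{k-s}(g)$. The sketch below checks this identity exactly for trigonometric polynomials, which we represent as dictionaries mapping frequencies to coefficients (a representation of our choosing). The estimators of this section rely on a truncated, finite-sample analogue of this identity, which is not reproduced here.

```python
from fractions import Fraction


def product_coeffs(f, g):
    """Fourier coefficients of f*g for trigonometric polynomials f and g.

    f and g map each frequency k to c_k; the result is c_k(fg) = sum_s c_s(f) c_{k-s}(g).
    """
    out = {}
    for s, a in f.items():
        for r, b in g.items():
            # c_s(f) c_r(g) contributes to frequency s + r
            out[s + r] = out.get(s + r, 0) + a * b
    return {k: v for k, v in out.items() if v != 0}


half = Fraction(1, 2)
f = {0: Fraction(1), 1: half, -1: half}  # f(t) = 1 + cos t
g = {1: half, -1: half}                  # g(t) = cos t
fg = product_coeffs(f, g)                # f(t) g(t) = 1/2 + cos t + cos(2t)/2
```

Exact rational arithmetic via `Fraction` makes the comparison with the closed-form coefficients of $\cos t + \cos^2 t$ exact rather than approximate.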
Remark 3.3 The estimation of the integrated volatility of volatility relies on the convolution product that allows computing the 0-th Fourier coefficient of the volatility of volatility process. The result is trivially extended to any continuous bounded function h via the convolution product (9), which leads to an estimator of $\frac{1}{2\pi}\int_0^{2\pi} h(t)\gamma^2(t)\,dt$. In particular, for $h(t) := e^{-\mathrm{i}kt}$, the convolution product (9) provides a formula for estimating the k-th Fourier coefficient of the volatility of volatility process $\gamma^2(t)$. See also [Clement and Gloter, 2011] for the analogous result in the case of the multivariate Fourier volatility estimator. Based on Remark 3.3, for any integer k with $|k| \le 2M$, we define Fourier estimators of the k-th Fourier coefficients of $\sigma^4$ and $\gamma^2$, where K is computed in (51). They are consistent estimators of $c_k(\sigma^4)$ and $c_k(\gamma^2)$, respectively, for any integer k. The following result holds.

It is possible to obtain an estimator of the volatility of volatility without a bias-correction term and with a smaller asymptotic variance. Its rate of convergence is slower, namely $n^{\iota/2}$ with $\iota/2 \in (0, 1/5)$. The estimator is simply given by (3) multiplied by 2π, and the following result holds. To build a feasible CLT, it suffices to apply the same methodology as for Proposition 3.4. In particular, under the conditions $N\rho(n) \sim c_N$ and $M\rho(n)^{\iota} \sim c_M$, where $\iota \in (0, 2/5)$, a consistent estimator of the asymptotic variance is obtained analogously. Therefore, the following holds.

Remark 3.8 The result in Theorem 3.6 is in line with [Cuchiero and Teichmann, 2015], Theorem 3.13, where an estimator without bias correction is considered. However, their estimator relies on a smooth function of the plug-in spot volatility, where the spot volatility is estimated with the Fourier method. It therefore differs from our estimator, which is based on the convolution formula of the Fourier coefficients of the volatility process.

In this section we present a simulation study of the finite-sample performance of the rate-efficient estimator (4).
The objective of the study is fourfold. First, to support the asymptotic result in Theorem 3.1. Second, to offer insight into the optimal selection of the frequency M. Third, to assess the robustness of the estimator's performance to irregular sampling schemes. Fourth, to compare its accuracy with that of the rate-efficient realized estimator by [Ait-Sahalia and Jacod, 2014]. We simulated discrete observations from two parametric models that satisfy Assumptions (A.I)-(A.III). The first is the Heston model (see [Heston, 1993]), i.e., model (15), where $v(t) := \sigma^2(t)$, $\mu \in \mathbb{R}$, $\theta, \alpha, \gamma > 0$, and ρ denotes the correlation between the Brownian motions W and Z. Under the Heston model, the volatility of volatility is given by $\gamma^2(t) = \gamma^2 v(t)$. The second is the stochastic volatility of volatility model that appears in [Barndorff-Nielsen and Veraart, 2013] and [Sanfelici et al., 2015], i.e., model (16), where Y is a Brownian motion independent of W and Z, and $\mu \in \mathbb{R}$, $\theta, \alpha, \chi, \eta, \xi > 0$. The parameter vectors used for the simulations of the models (15) and (16) are, respectively:
Note that the choice of a negative ρ reproduces the leverage effect. For each model we simulated $10^4$ trajectories of length T = 1/252, i.e., one trading day. For each trajectory, observations were simulated on the equally spaced grid with a mesh of 1 second, assuming a 6.5-hour trading day. We assessed the finite-sample performance of the estimator (4) for increasing values of the sample size n, to provide numerical support to Theorem 3.1. Specifically, given $\rho(n) = T/n$, we fixed the estimation horizon T at one day (as customary in econometric analyses) and let n vary. The resulting values of ρ(n) range between 1 second and 5 minutes. As for the frequency N, which is needed for the convolution formula (2), we set $N = c_N \rho(n)^{-1}$ and selected $c_N = T/2$.
This choice makes N equal to the Nyquist frequency [n/2] and yields the smallest asymptotic error variance, see (6) in Section 3. As for the frequency M, we set $M = c_M \rho(n)^{-1/2}$ and optimized the constant $c_M$ through the (unfeasible) numerical minimization of the mean squared error (MSE). We found that the MSE-optimal value of $c_M$ is 0.05 and 0.07 for the models (15) and (16), respectively (see subsection 4.3, where a feasible procedure to select M is also discussed). Table 1 illustrates the finite-sample performance of the estimator (4) under the two data-generating processes considered, reporting the MSE and the bias for different values of the sampling interval ρ(n). As expected, the bias and MSE improve as n increases for both data-generating processes, which provides numerical support to Theorem 3.1. In particular, the performance of the estimator is still satisfactory for ρ(n) equal to 5 minutes, the sampling interval typically used with empirical data in the absence of noise. It is also worth mentioning that the estimator never produced negative volatility of volatility estimates in this simulation study.

Table 1: MSE and bias of the Fourier estimator (4) for different values of ρ(n).
                 Heston model                         stochastic vol-of-vol model
ρ(n)             MSE               Bias               MSE               Bias
1 minute         4.425 · 10^−9     −4.142 · 10^−6     1.939 · 10^−8     1.304 · 10^−6
30 seconds       2.913 · 10^−9     −3.801 · 10^−6     1.327 · 10^−8     1.269 · 10^−6
5 seconds        9.985 · 10^−10    −2.932 · 10^−7     8.113 · 10^−9     4.161 · 10^−7
1 second         4.229 · 10^−10    −1.833 · 10^−7     6.199 · 10^−9     3.644 · 10^−7
The values of the daily integrated volatility of volatility are equal, on average, to 1.985 · 10^−4 and 3.957 · 10^−4 for the models (15) and (16), respectively.

As additional support to the results in Theorem 3.1, the q-q plots in Figures 1 and 2 show that, for both the model (15) and the model (16), the approximation to the standard normal distribution improves as ρ(n) becomes smaller.
In particular, the approximation of the body of the distribution is satisfactory even for the largest ρ(n) considered, i.e., 5 minutes, while the approximation in the tails becomes accurate only for ρ(n) smaller than or equal to 5 seconds. As shown in Figure 3, the optimal MSE is achieved when $c_M$ is equal to 0.05 and 0.07 for the models (15) and (16), respectively, independently of ρ(n). Further, it appears that the MSE is relatively flat.

[Figures 1 and 2: q-q plots of the estimation error; axes: quantiles of the standard normal distribution and empirical quantiles of the estimation error.]

So far, the simulation study has assumed that prices are observable on the equally spaced grid with mesh size ρ(n) equal to 1 second. However, the setup of Section 3 allows for an irregular sampling scheme. To assess the robustness of the performance of the estimator (4) to irregular sampling, we considered observation times that follow a Poisson process. In other words, durations between observations are drawn from an exponential distribution with mean λ (see, e.g., [Mancino et al., 2017], Chapter 3.3). Specifically, we considered three values of λ, corresponding to average durations δ of 1.25, 1.5 and 2 seconds. We compared the resulting MSE and bias with the case of regular sampling on the 1-second grid. For the estimation with the Poisson scheme, we set N = [n/2] and optimized M by minimizing the MSE. We found that it is MSE-optimal to select a smaller M than in the regular-sampling case. Letting M* denote the optimal choice with regular 1-second sampling, numerical results suggest selecting M = [M*/2]. The results in Table 2 suggest that the Fourier estimator (4) may still offer a satisfactory performance with irregular sampling schemes. In particular, compared to the regular-sampling case, the bias appears to be less affected than the MSE.
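The data-generating side of the simulation design can be sketched in Python as follows. This is a sketch under several assumptions. First, it uses the standard Heston form dp = (µ − v/2)dt + √v dW, dv = α(θ − v)dt + γ√v dZ, because equation (15) and the parameter vectors are not reproduced in this text; the parameter values below are hypothetical. Second, it uses a full-truncation Euler scheme, which is our choice. Third, it measures ρ(n) in years with T = 1/252 and uses c_M = 0.05, the MSE-optimal value reported for model (15). Fourth, it samples the 1-second path at Poisson times by the previous-tick rule. All function names are hypothetical.

```python
import math
import random

SECONDS_PER_DAY = 23_400  # 6.5-hour trading day on a 1-second grid
T = 1.0 / 252             # one trading day, in years


def simulate_heston_day(mu, theta, alpha, gamma, rho, v0, seed=0):
    """Full-truncation Euler scheme for the assumed Heston dynamics
        dp = (mu - v/2) dt + sqrt(v) dW,   dv = alpha (theta - v) dt + gamma sqrt(v) dZ,
    with corr(dW, dZ) = rho, on a 1-second grid over one trading day.
    Returns log-prices, variances and the integrated vol-of-vol."""
    rng = random.Random(seed)
    dt = T / SECONDS_PER_DAY
    sq_dt = math.sqrt(dt)
    p, v = [0.0], [v0]
    for _ in range(SECONDS_PER_DAY):
        v_pos = max(v[-1], 0.0)  # full truncation: only the positive part enters drift and diffusion
        z_w = rng.gauss(0.0, 1.0)
        z_z = rho * z_w + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        p.append(p[-1] + (mu - 0.5 * v_pos) * dt + math.sqrt(v_pos) * sq_dt * z_w)
        v.append(v[-1] + alpha * (theta - v_pos) * dt + gamma * math.sqrt(v_pos) * sq_dt * z_z)
    # Under Heston gamma^2(t) = gamma^2 v(t): left-point Riemann sum of its integral
    true_volvol = gamma ** 2 * sum(max(x, 0.0) for x in v[:-1]) * dt
    return p, v, true_volvol


def cutting_frequencies(step_seconds, c_M):
    """Regular grid with the given step: n, N = [n/2] (c_N = T/2) and M = [c_M * rho(n)^(-1/2)]."""
    n = SECONDS_PER_DAY // step_seconds
    rho_n = T / n
    return n, n // 2, int(c_M * rho_n ** -0.5)


def poisson_times(mean_duration, seed=0):
    """Observation times (in seconds) with i.i.d. exponential durations of the given mean."""
    rng = random.Random(seed)
    times = [0.0]
    while True:
        t = times[-1] + rng.expovariate(1.0 / mean_duration)  # expovariate takes the rate 1/mean
        if t > SECONDS_PER_DAY:
            return times
        times.append(t)


def sample_at(path, times):
    """Previous-tick sampling: the 1-second path value at or just before each time."""
    return [path[int(t)] for t in times]
```

Under these assumptions, `cutting_frequencies(1, 0.05)` returns n = 23400, N = 11700 and M = 121. With 5-minute sampling (`step_seconds=300`), it returns n = 78, N = 39 and M = 7. Under Poisson sampling, the rule reported above would then use M = [M*/2].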
Table 2: MSE and bias of the Fourier estimator (4) under Poisson sampling with average duration δ (in seconds), compared with regular 1-second sampling.
                            Heston model                         stochastic vol-of-vol model
δ                           MSE               Bias               MSE               Bias
2                           6.888 · 10^−10    −1.910 · 10^−7     9.672 · 10^−9     3.840 · 10^−7
1.5                         5.250 · 10^−10    −1.888 · 10^−7     7.564 · 10^−9     3.776 · 10^−7
1.25                        4.662 · 10^−10    −1.865 · 10^−7     6.696 · 10^−9     3.730 · 10^−7
regular 1-sec. sampling     4.229 · 10^−10    −1.833 · 10^−7     6.199 · 10^−9     3.644 · 10^−7

This subsection contains a comparative study of the finite-sample performance of the rate-efficient Fourier estimator (4) and the rate-efficient realized estimator by [Ait-Sahalia and Jacod, 2014] (see Remark 3.2). We recall the definition of the latter. Let κ(n) denote a sequence of integers such that $\kappa(n) \sim \beta\rho(n)^{-1/2}$, with β > 0 and $\rho(n) := T/n$. The estimator (17) relies on a local estimator employed to pre-estimate the spot variance at time $t_i = iT/n$, $i = 0, \dots, n$. The estimator (17) is also studied in [Vetter, 2015], where a quarticity estimator is used in the de-biasing term. The resulting rate-efficient realized estimator considered in [Vetter, 2015] is denoted (19). For the comparison, we replicated the simulation study carried out in subsection 4.2, this time using the realized estimators (17) and (19) to obtain estimates of the daily integrated volatility of volatility. The implementation of the realized estimators requires the selection of the tuning parameter β. After setting $\kappa(n) = [\beta\rho(n)^{-1/2}]$, we selected β = 0.04 (resp., β = 0.06) for the model (15) (resp., (16)), based on the unfeasible optimization of the MSE with 1-second samples. Tables 3 and 4 summarize the results. Comparing them with Table 1 in subsection 4.2 shows that the performance of the realized estimators (17) and (19) is not satisfactory, both in terms of bias and MSE, compared to the Fourier estimator (4). See also [Sanfelici et al., 2015] and [Toscano and Recchioni, 2021] for similar considerations on the finite-sample performance of realized volatility of volatility estimators.
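The first step of the realized estimators, the local pre-estimation of the spot variance, can be sketched as follows. The sketch assumes a regular grid of step ρ(n) and a common form of local estimator: the sum of the κ(n) squared increments following $t_i$, divided by $\kappa(n)\rho(n)$. The exact indexing and the de-biasing terms of (17) and (19) are not reproduced here, so this is not a full implementation of either estimator. The function names are hypothetical.

```python
def block_length(beta, rho_n):
    """kappa(n) = [beta * rho(n)^(-1/2)], kept at least 1."""
    return max(1, int(beta * rho_n ** -0.5))


def spot_variance(prices, rho_n, kappa):
    """Local realized-variance pre-estimates of the spot variance on a regular grid of step rho_n.

    Entry i is (1 / (kappa * rho_n)) * sum of the kappa squared increments starting at t_i.
    """
    sq = [(b - a) ** 2 for a, b in zip(prices, prices[1:])]
    window = sum(sq[:kappa])
    out = [window / (kappa * rho_n)]
    for i in range(1, len(sq) - kappa + 1):
        window += sq[i + kappa - 1] - sq[i - 1]  # slide the window forward by one increment
        out.append(window / (kappa * rho_n))
    return out
```

Suppose ρ(n) is measured in years as in the simulation setup (1-second grid, T = 1/252) and β = 0.04, the value used above for model (15). Then `block_length(0.04, (1/252)/23400)` gives κ(n) = 97. The rolling window keeps the cost linear in n.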
Moreover, the comparison for ρ(n) equal to 5 minutes is omitted, since the resulting bias and MSE of the realized estimators are larger than 1 in absolute value. Finally, simulations suggest that the use of the quarticity estimator in the de-biasing term in (19) does not improve the finite-sample performance.

Table 3: MSE and bias of the realized estimator (17).
                 Heston model                         stochastic vol-of-vol model
ρ(n)             MSE               Bias               MSE               Bias
1 minute         1.800 · 10^−3     1.473 · 10^−2      1.655 · 10^−3     1.299 · 10^−2
30 seconds       5.364 · 10^−4     1.388 · 10^−2      4.114 · 10^−4     1.119 · 10^−2
5 seconds        3.733 · 10^−4     1.047 · 10^−2      3.390 · 10^−4     1.001 · 10^−2
1 second         3.322 · 10^−4     9.838 · 10^−3      2.999 · 10^−4     9.555 · 10^−3

Table 4: MSE and bias of the realized estimator (19).
                 Heston model                         stochastic vol-of-vol model
ρ(n)             MSE               Bias               MSE               Bias
1 minute         1.501 · 10^−3     1.575 · 10^−2      1.377 · 10^−3     1.303 · 10^−2
30 seconds       5.461 · 10^−4     1.456 · 10^−2      4.336 · 10^−4     1.122 · 10^−2
5 seconds        3.783 · 10^−4     1.091 · 10^−2      3.302 · 10^−4     9.998 · 10^−3
1 second         3.324 · 10^−4     9.840 · 10^−3      3.002 · 10^−4     9.555 · 10^−3

Remark 4.2 Unreported simulations show that the realized estimators (17) and (19) improve their finite-sample performance for larger values of the estimation horizon T. Specifically, realized estimators appear to achieve satisfactory accuracy, compared to the Fourier estimator (4), when T is equal to one year. This is in line with the choice of T equal to one year in the numerical and empirical high-frequency exercises by [Li et al., 2021], where the performance of the noise- and jump-robust version of the realized volatility of volatility estimator is investigated.

To the best of our knowledge, the empirical properties of the volatility of volatility of financial assets have been scarcely explored in the literature. The aim of the empirical study presented in this section is thus to provide insight into the existence of stylized facts pertaining to the daily dynamics of the volatility of volatility.
The numerical evidence presented in Section 4 suggests that the rate-efficient estimator (4) allows reconstructing the integrated volatility of volatility with satisfactory accuracy on daily intervals. Accordingly, in this section we use the estimator (4) to reconstruct the daily volatility of volatility of the S&P500 and EUROSTOXX50 indices.

For the empirical analysis we used the series of 5-minute trade prices recorded during trading hours: between 9:30 a.m. and 4 p.m. for the S&P500 index, and between 9 a.m. and 5:30 p.m. for the EUROSTOXX50 index. Days with early closure were discarded. The daily integrated volatility of volatility was estimated via the rate-efficient Fourier estimator (4), without considering overnight returns. Before performing the estimation, we ran the test by [Ait-Sahalia and Xiu, 2014] on the 5-minute series. The assumption of absence of noise could not be rejected at the 5% significance level for either index. Moreover, following [Wang and Mykland, 2014], days with jumps were removed based on the test by [Lee and Mykland, 2008], applied at the 1% significance level. Overall, we estimated the volatility of volatility for 3343 days for the S&P500 and 3522 days for the EUROSTOXX50. The daily volatility was estimated via the Fourier estimator by [Malliavin and Mancino, 2002], applied to 5-minute returns⁴. We note that all volatility of volatility estimates obtained are strictly positive.

Table 5: summary statistics of the daily volatility and volatility of volatility estimates (SPX: S&P500; ESTX: EUROSTOXX50).
                     Mean             Median           Std. dev.        Min              Max              Skewness   Kurtosis
SPX vol.             8.598 · 10^−5    3.098 · 10^−5    2.152 · 10^−4    1.701 · 10^−6    3.209 · 10^−3    7.951      83.902
SPX vol. of vol.     2.351 · 10^−5    3.867 · 10^−7    2.608 · 10^−4    9.545 · 10^−10   1.009 · 10^−2    24.442     780.377
ESTX vol.            1.093 · 10^−4    5.848 · 10^−5    1.944 · 10^−4    2.845 · 10^−6    3.916 · 10^−3    8.198      104.168
ESTX vol. of vol.    3.233 · 10^−5    1.841 · 10^−6    3.267 · 10^−4    5.054 · 10^−9    1.630 · 10^−2    24.726     727.501

Moreover, based on Table 5, we make the following remarks.
First, the volatility of volatility is on average smaller than the volatility for both indices. Second, the volatility of volatility appears to be more volatile than the volatility itself, as it displays larger sample standard deviations and maxima for both estimated series. Finally, the volatility of volatility appears to be much more skewed and leptokurtic than the volatility for both indices. The next subsection analyzes the empirical regularities displayed by the reconstructed daily series of the volatility of volatility of the S&P500 and the EUROSTOXX50.

The literature on the stylized facts related to the behavior of the volatility of financial assets is very rich (see, for instance, [Andersen et al., 2001], [Patton and Engle, 2001] and [Corsi, 2009], among many others). These facts include clustering, long memory, mean reversion, log-normality and leverage effects. Nowadays, volatility can be regarded, in some sense, as a traded asset itself. It is possible to "trade" the volatility of many financial asset classes via quoted and O.T.C. volatility derivatives (e.g., variance swaps, VIX futures and VIX options). Therefore, it may be of interest to evaluate which typical features of the daily volatility also apply to the daily volatility of volatility. We have already observed in the previous subsection that the volatility of volatility shows clusters. It is larger during crises, and smaller and less volatile during periods of economic stability. However, based on the sample autocorrelation function (see Figures 6 and 7), it appears to be less persistent than the volatility for both indices. As for mean reversion, the Augmented Dickey-Fuller test rejects the hypothesis of a unit root for both volatility of volatility series at the 0.01% significance level.
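Statistics of the kind discussed above can be reproduced with a minimal stdlib Python sketch. It computes the moments behind the skewness and kurtosis comparisons and the sample autocorrelation function used to compare persistence: a more slowly decaying ACF indicates a more persistent series. Conventions are our assumption. Kurtosis is computed as non-excess, since the text does not state which convention Table 5 uses, and the autocovariances are normalized by 1/n. The Augmented Dickey-Fuller test is not reimplemented here; a library routine such as `statsmodels.tsa.stattools.adfuller` could be used instead. The function names are hypothetical.

```python
import statistics


def summary_stats(xs):
    """Mean, median, sample std, min, max, skewness and (non-excess) kurtosis of a series."""
    n = len(xs)
    m = statistics.fmean(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return {
        "mean": m, "median": statistics.median(xs), "std": statistics.stdev(xs),
        "min": min(xs), "max": max(xs),
        "skewness": m3 / m2 ** 1.5, "kurtosis": m4 / m2 ** 2,
    }


def sample_acf(xs, max_lag):
    """Sample autocorrelations at lags 1..max_lag, with autocovariances normalized by 1/n."""
    n = len(xs)
    m = statistics.fmean(xs)
    dev = [x - m for x in xs]
    c0 = sum(d * d for d in dev) / n
    return [
        sum(dev[t] * dev[t + lag] for t in range(n - lag)) / (n * c0)
        for lag in range(1, max_lag + 1)
    ]
```

To compare persistence, apply `sample_acf` to both the daily volatility series and the daily volatility of volatility series and compare how quickly the two sequences decay.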
We also examine the year-by-year correlation of the daily volatility of volatility with, respectively, the daily volatility and the daily log-return, computed as the difference between the closing and opening log-prices. The dynamics of these correlations are summarized in Tables 6 and 7, where the values of the return-variance correlation, a rough proxy of the leverage effect, are also displayed for comparison (footnote 5). For both indices, we observe that the yearly correlation between the log-return and the volatility of volatility tends to be negative and to follow the return-variance correlation closely, although it is most often smaller in absolute value. This result may suggest the existence of a "second-order" leverage effect. When the asset price decreases, the volatility increases because the asset becomes riskier. In addition, the volatility of volatility also becomes larger. The volatility of volatility can be seen as a proxy of the uncertainty about the amount of risk perceived by market operators, that is, the "volatility of risk". The yearly correlation between the volatility of volatility and the volatility is instead positive and close [...]

Footnote 5: Note that the correlations appearing in Tables 6 and 7 [...]

Table 7: EUROSTOXX50 index: sample yearly correlations of the daily volatility of volatility with the corresponding daily volatility and daily log-return.

[...] using the Jarque-Bera and Anderson-Darling tests at the 5% significance level. The years in which both tests reject the null hypothesis of Gaussianity are 2008, 2011, 2016 and 2020, that is, the years in the sample most characterized by market turmoil (in order: the global financial crisis, the Euro-area instability phase, Brexit and the outbreak of the COVID pandemic).
This happens for both the quantities tested and both the indices analyzed, thus suggesting that the log-normal approximation for the distribution of the volatility and of the volatility of volatility is more satisfactory in periods of market stability.

This paper fills a gap in the financial econometrics literature by deriving the convergence rate of the Fourier estimator of the volatility of volatility. In this regard, we showed that the bias-corrected version of the estimator reaches the optimal rate $n^{1/4}$, while the estimator without bias correction achieves a sub-optimal rate but has a smaller asymptotic variance. Further, we presented a numerical study showing that the rate-optimal Fourier estimator of the volatility of volatility performs well in finite samples, even at the relatively short daily estimation horizon, where the competing rate-efficient realized estimator performs poorly. Finally, we applied the Fourier estimator to multi-year samples of S&P500 and EUROSTOXX50 observations and gained new insight into the empirical regularities that characterize the daily dynamics of the volatility of volatility, a topic which so far had been scarcely explored in the literature.

The proofs of Theorems 3.1, 3.5, 3.6 and 3.7 are given in the next subsections. Some preliminary remarks are useful.

Remark 7.1 As every continuous process is locally bounded, so are all the processes appearing here. Moreover, standard localization procedures (see, e.g., [Ait-Sahalia and Jacod, 2014]) allow us to assume that any locally bounded process is actually bounded, and that almost-surely positive processes are bounded away from zero.

Remark 7.2 In [Malliavin and Mancino, 2009], Lemma 2.2, it is proved that the drift component of the semimartingale model gives no contribution to the convolution formula (2). Therefore, we will refer to the drift-less model in Assumption (A.I).
Moreover, as observed in [Malliavin and Mancino, 2002], we can assume that $p(0) = p(2\pi)$ and $v(0) = v(2\pi)$. In fact, if $p(0) \neq p(2\pi)$ (and similarly for the process $v$), we introduce $\tilde p(t) := p(t) - t\,(p(2\pi) - p(0))/(2\pi)$. Then $\tilde p$ satisfies the required assumption, and the volatility and co-volatility estimates are not affected by such a modification of the drift. From the modeling point of view, we may consider $\tilde v(t) := v(t) - t\,(v(2\pi) - v(0))/(2\pi)$. In fact, for any $k \neq 0$ it holds $c_k(dv) = c_k(d\tilde v)$, while the 0-th Fourier coefficient, $c_0(dv)$, does not contribute to the definition of the Fourier volatility of volatility estimator. The situation would change if one wished to estimate the spot volatility. However, this is not an issue in the present study, since the estimation of the spot volatility is not required.

Given the discrete-time observations $\{0 = t_0 \le \dots \le t_i \le \dots \le t_n = 2\pi\}$, for simplicity we use continuous-time notation by letting $\varphi_n(t) := \sup\{t_j : t_j \le t\}$. From the Itô formula, we have the following decomposition of the term (2): $c_k(v_{n,N}) := A_{k,n} + B_{k,n,N} + C_{k,n,N}$, where [...] It follows that $\gamma^2_{n,N,M}$ is equal to the following sum: [...] In the following we denote: [...] For brevity, we will also use the following notation for the Dirichlet and the Fejér kernels (see also Appendix B, Section 8): [...] The same notation applies to the derivatives of the Fejér kernel, $F'_{M,n}$ and $F''_{M,n}$ (footnote 6). In order to identify the contribution of each term, we start with equation (24), which can be written as [...]

We now consider the term (25). An application of the Itô formula shows that it is equal to $2(AB \dots)$ [...] and, letting [...]

Finally, we consider the term (26). Since the Fejér kernel is an even function, this term can be rewritten as [...], where, using the notation introduced in (27), each term is defined as follows: [...]

In summary, the study of the estimation error comprises four main components: [...] where $\sigma^4_{n,N,M}$ is defined in (5) and the constant $K$ is determined in (51).
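The role of the periodization discussed above can be seen in two lines. We assume the convention $c_k(dv) := \frac{1}{2\pi}\int_0^{2\pi} e^{-ikt}\,dv(t)$ and the linear correction $\tilde v(t) := v(t) - t\,(v(2\pi)-v(0))/(2\pi)$. Under these assumptions, integration by parts against the smooth deterministic function $e^{-ikt}$ and a direct computation give:

```latex
\begin{aligned}
c_k(dv) &= \frac{v(2\pi)-v(0)}{2\pi} + ik\,c_k(v), \qquad k \in \mathbb{Z},\\
c_k(d\tilde v) &= c_k(dv) - \frac{v(2\pi)-v(0)}{2\pi}\cdot\frac{1}{2\pi}\int_0^{2\pi} e^{-ikt}\,dt
  = c_k(dv), \qquad k \neq 0,\\
\tilde v(2\pi) &= v(2\pi) - \bigl(v(2\pi) - v(0)\bigr) = v(0) = \tilde v(0).
\end{aligned}
```

Hence, when $v(0) = v(2\pi)$, the first line reduces to $c_k(dv) = ik\,c_k(v)$, i.e. the relation $c_k(v) = \frac{1}{ik}c_k(dv)$ used later in the proof. Replacing $v$ by $\tilde v$ changes only the 0-th coefficient, which, as noted above, does not enter the volatility of volatility estimator.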
Accordingly, the proof of the theorem is divided into four steps. The first and second steps study the bias correction term (47) and prove the asymptotic negligibility of the discretization error (45). The other two steps follow [Jacod, 1997] in order to identify the asymptotic variance and prove the stable convergence in law. In the proof we consider the case of regular sampling, i.e., $\varphi_n(t) = \frac{2\pi}{n} j$ if $\frac{2\pi}{n} j \le t < \frac{2\pi}{n}(j+1)$, $j = 0, \dots, n$. Further, $C$ will always denote a positive constant, not necessarily the same at each occurrence.

Firstly, we show that the term (47) is $o_p(\rho(n)^{1/4})$, thereby proving that the error (43) is equal to [...] We begin by studying the term $BB$ [...] The leading term is the first one, namely (48), which is easily seen to be equal to [...] Now, using Lemma 8.2 and noting that $N/n \sim c_N/(2\pi)$, we have that (50) converges in probability to $(1 + 2\eta(c_N/\pi))\,\frac{1}{2\pi}\int_0^{2\pi} \sigma^4(t)\,dt$. Consider now (49). Exploiting the boundedness of the process $v$, it is enough to observe that [...] Here we have used Lemma 8.2 and the property that $D^2_{N,n}(s - s') \le C N^{-2}$ for $s' < s - \varepsilon$, $\varepsilon > 0$, and $n$ large enough. Therefore, in probability, the term (49) has order $M^2 \rho(n)^{3/2} \sim c_M^2\, \rho(n)^{-1} \rho(n)^{3/2} = c_M^2\, \rho(n)^{1/2}$, and thus it converges to zero. [...] $= \frac{1}{3}\,(1 + 2\eta(c_N/\pi))$. Then, following [Livieri et al., 2019], Theorem 3, the following convergence in probability holds: [...] where $X_{c_M}$ and $Y_{c_M,c_N}$ are defined as [...] This completes the proof that (47) is $o_p(\rho(n)^{1/4})$.

In this paragraph we consider the discretization error component (45), which reads [...], and prove that it is $o_p(\rho(n)^{1/4})$. Up to a negligible multiplicative constant, we rewrite the latter as follows (footnote 7): [...] and $E^{(2)}_{M,n} := [\dots]$ Note (see also Remark 7.2) that the term $|c_k(v)|^2$ reads [...] Further, we compute the $k$-th Fourier coefficient of $F''_{M,n}$.
First, note that, for any $j$, one has that [...] Therefore, it holds that [...] Now observe that, if $n$ divides $k$, then $1 - e^{-2\pi i k/n}$ is equal to zero. Accordingly, we assume that $k = nq + r$ with $r \neq 0$, where either $q = 0$ if $|k| \le n$ or $q \neq 0$ otherwise. Moreover, the summation in (56) is equal to $n$ if $n$ divides $k - l$, and to zero otherwise; hence, we set $l = r$, with $|r| \le M$. Thus, (56) reduces to [...]

We now study the asymptotic behavior of the terms $E^{(1)}_{M,n}$ and $E^{(2)}_{M,n}$ separately. First, we prove that $E^{(2)}_{M,n}$ converges to zero in the $L^1$-norm. Taking into account the decomposition of $|c_k(v)|^2$ in (55), we have to prove that both terms resulting from this decomposition are asymptotically negligible. Here we explicitly compute the upper bound only for the first term, which is the leading one. Using [...]

Footnote 7: With abuse of notation, in (53) we denoted by $c_k(F''_{M,n})$ the $k$-th Fourier coefficient of $F''_{M,n}$, which is defined as $F''_{M,n}(t) := F''_M(\varphi_n(t))$. A straightforward but lengthy proof, available from the authors, shows that the difference between the two kernels is negligible.

[...] Now, note that: [...] As for $E^{(1)}_{M,n}$, we also study its convergence in the $L^1$-norm. We set $k = r$ and again focus on the leading term of the decomposition in (55). For any fixed $k$, it holds: [...] Therefore, we have: [...] This completes the proof that the discretization error is asymptotically negligible, at a faster rate than $\rho(n)^{1/4}$.

This section follows [Jacod, 1997] in order to identify the asymptotic variance and prove the stable convergence in law. First, consider the term (29), namely: [...] Using the integration by parts formula and Remark 7.2, it holds $c_k(v) = \frac{1}{ik} c_k(dv)$, and (58) is equal to [...] By applying the Itô formula, the term (58) [...] (59)

Footnote 8: To simplify the notation, in the following we always omit the argument when it is equal to $2\pi$; for instance, we write $A_M$ instead of $A_M(2\pi)$. The same applies to the processes in (61).
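The reduction of (56) in the discretization step uses the discrete orthogonality of complex exponentials on the regular grid. The sum $\sum_{j=0}^{n-1} e^{2\pi i j m/n}$ equals $n$ if $n$ divides $m$, and $0$ otherwise. This is also why the factor $1 - e^{-2\pi i k/n}$ vanishes exactly when $n \mid k$: it is the denominator of the closed form of that geometric sum. The helper below is a quick numerical check; its names are our own:

```python
import numpy as np


def grid_exponential_sum(n: int, m: int) -> complex:
    """Return sum_{j=0}^{n-1} exp(2*pi*i*j*m/n) on the regular grid t_j = 2*pi*j/n."""
    idx = np.arange(n)
    return complex(np.exp(2j * np.pi * idx * m / n).sum())


def expected_sum(n: int, m: int) -> int:
    """Discrete orthogonality: n if n divides m, 0 otherwise."""
    return n if m % n == 0 else 0


if __name__ == "__main__":
    n = 12
    for m in (-24, -13, -1, 0, 5, 12, 30):
        # the magnitude of the numerical sum should match the orthogonality value
        print(m, round(abs(grid_exponential_sum(n, m)), 10), expected_sum(n, m))
```

Python's `%` returns a non-negative remainder for a positive modulus, so negative frequencies such as $m = -24$ are classified correctly.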
Then, following [Jacod, 1997], we determine the variance of the asymptotic distribution by studying $\langle \rho(n)^{-1/4}\, 2V_{M,n,N},\ \rho(n)^{-1/4}\, 2V_{M,n,N} \rangle_{2\pi}$. In the first step we study the bracket: [...] Noting that $M/c_M \simeq \rho(n)^{-1/2}$, we write (62) as [...] which, by the Itô formula, is equal to [...] Using Lemma 8.1, equation (121), the term (63) converges in probability to $\frac{1}{2\pi}[\dots]$ Further, in order to prove that the term (64) is $o_p(1)$, it is enough to compute [...] and apply Lemma 8.1, which gives that $(1/M)[\dots]$

The second step in identifying the asymptotic variance is to study the bracket $\rho(n)^{-1/2}\, 2(AB \dots)$ [...] The bracket contains 16 terms giving the same contribution by symmetry. We consider the term $AB$ [...] Consider (68). Apply Lemma 8.2, equation (130), and a procedure analogous to that used in Step II for (52). Using the fact that $M/n \to 0$, it is then equivalent to study [...] Applying integration by parts and the boundedness of the volatility process $v$, it holds that [...] Therefore (70) gives [...] Consider (72). It is enough to observe that, by Lemma 8.1, it holds: [...] $(1 + 2\eta(c_N/\pi))$. Then the term (72) converges to [...] It remains to prove that (73) is asymptotically negligible. To this aim, consider [...] We now show that (69) goes to zero. This follows from the bound $D^2_{N,n}(s - r) < C N^{-2}$ for $r < s - \varepsilon$ (for any $\varepsilon > 0$ and $n$ large enough) and from the same remark used for (70). Thus the term (66) converges to $\frac{16}{15}[\dots]$

The last contribution to the variance of the asymptotic distribution is obtained by studying the bracket $\rho(n)^{-1/4}\, 2(BB \dots)$ [...] By the Itô formula, we have that [...] Therefore, the bracket (76) splits as [...] Consider first (77), which gives the asymptotic variance term; as we show later, (78) instead goes to zero. By applying the Itô formula twice, it holds [...] Using Lemma 8.2, equation (130), the term (79) gives
$\rho(n)^{-1/2}\, \frac{1}{n^2}\, (1 + 2\eta(c_N/\pi))^2 \int_0^{2\pi} \int_0^u \frac{1}{(M+1)^2}\, |F''_{M,n}(u - s)|^2\, \sigma^4(s)\,ds\; \sigma^4(u)\,du.$
Finally, using Lemma 8.1, the term (79) converges to [...] A procedure similar to that used for (69) shows that (80), (81) and (82) go to zero in probability. We now verify that (78) goes to zero in probability. It is enough to show that [...] is $o_p(\rho(n)^{1/2})$. By the Itô formula and Lemma 8.2, it holds that [...] $+\, o_p(\rho(n)^{1/2})$. Consider (84). By the Cauchy-Schwarz inequality, [...] Again, since $D^2_{N,n}(s - u') \le C/N^2$ for $u' < s - \varepsilon$ (for $\varepsilon > 0$ and $n$ large enough), (85) is smaller than [...] Then, by Lemma 8.1 and the fact that $(1/M^5)|F''_{M,n}(x)|^2$ is a good kernel, this term has order $N^{-2} M^6\, o(1) = M^2\, o(1)$. Finally, we obtain the order of (83), which is $o_p(\rho(n)^{1/2})$. We conclude that the contribution of all the terms (75) is [...]

It remains to show that the other brackets in (60) give asymptotically negligible contributions. We study in detail the convergence in probability of (86) and [...]; the proof is analogous for the other terms. The bracket (86) is equal to [...] Omitting negligible constants, we use the result obtained in Step II for the term (52) and the fact that $M/n \to 0$; then (88) is equal to [...] Moreover, by (71), we are led to study [...] By the Itô formula, (89) is equal to [...] Consider (90). It is equal to the sum [...] We study (93); the term (94) is analogous. By the boundedness of the volatility and of the volatility of volatility processes, it is enough to prove that [...] The term (93) is smaller than [...] Since $D^2_{N,n}(t - u) < C/N^2$ for $u < t - \varepsilon$ (for $\varepsilon > 0$ and $n$ large enough), the previous term is smaller than [...] By Lemma 8.1, this last term is $C M^{-1} n^{-1} M^{1/2} M^{3/2} = O(n^{-1/2})$. Consider now (91); the term (92) is analogous. It is enough to show that [...] Therefore, using the Cauchy-Schwarz inequality, we consider [...] Using the fact that $E[(Y_{n,N}(t,t))^2]^{1/2} = O(n^{-1/2})$, the Itô isometry and the boundedness of the volatility of volatility, we have [...] We study (95); the other term is analogous.
By the boundedness of the volatility process and the Cauchy-Schwarz inequality, it is enough to study [...] Using the Burkholder-Davis-Gundy inequality and Lemma 8.2, it holds $E[Y^4_{n,N}(s,s)]^{1/2} \le C\rho(n)$. Then we compute: [...] $\int_0^r D_{N,n}(u - u_1)\,\sigma(u_1)\,dW_{u_1}\, D_{N,n}(u - r)\,\sigma(r)\,dW_r\, v(u)\,du \,]^{1/2}$. By Lemma 8.2, it is enough to study: [...] Using Lemma 8.2 and equation (71), the leading term of (98) gives [...]

The final step requires proving that, as $n, N, M \to \infty$, $\langle \rho(n)^{-1/4}\, 2V_{M,n,N},\ W \rangle_{2\pi} \to 0$ in probability. We give a detailed proof of the convergence to 0 in probability of the bracket involving $\rho(n)^{-1/4}\, BB$ [...]; the convergence of the other terms can be shown with an analogous procedure. Consider (99) [...] It is enough to consider the terms in (102) and (104). In fact, the terms (103) and (105) [...] Using Lemma 8.2 and the fact that $D^2_{N,n}(u - t) \le C/N^2$ for $t < u - \varepsilon$ and $n$ large enough, we see that this term has order $N^{-2}\rho(n)$. Coming back to (108), it has order $\rho(n)^{1/2} M^{3/2} M^{3/2} N^{-1} \rho(n)^{1/2}\, o(1) = o(\rho(n)^{1/2}) \to 0$. Now consider the case $u - \varepsilon \le u' \le u$, for any $\varepsilon > 0$. Starting from equation (108), [...] is $\rho(n)^{1/2} M^{3/2} M^{3/2} \rho(n)^{1/2} \rho(n)^{1/2} = O(1)$. Thus we have proved that (112) is smaller than $C\varepsilon$, for any $\varepsilon > 0$. The proof of Theorem 3.1 is now complete.
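The order statements in the last two steps boil down to arithmetic on powers of $n$. We assume regular sampling with mesh $\rho(n)$ of order $n^{-1}$, together with $N/n \sim c_N/(2\pi)$ and $M \sim c_M\,\rho(n)^{-1/2}$ as used in the proof. Under these assumptions, every monomial in $\rho(n)$, $N$, $M$ and $n$ is of order $n^e$ for a rational exponent $e$. The hypothetical helper below tracks $e$ exactly and ignores constants and $o(1)$ factors. It reproduces, e.g., $\rho(n)M^3N^{-1} \asymp \rho(n)^{1/2}$ and $\rho(n)^{3/2}M^3 = O(1)$:

```python
from fractions import Fraction

# Assumed scalings under regular sampling: rho(n) ~ n^-1, N ~ n, M ~ n^(1/2).
EXPONENT_OF_N = {"rho": Fraction(-1), "N": Fraction(1), "M": Fraction(1, 2), "n": Fraction(1)}


def n_exponent(**powers) -> Fraction:
    """Exponent e such that prod(symbol ** power) is of order n ** e (constants ignored)."""
    return sum((EXPONENT_OF_N[name] * Fraction(p) for name, p in powers.items()), Fraction(0))


if __name__ == "__main__":
    checks = {
        "M^2 rho^(3/2) vs rho^(1/2)": (n_exponent(M=2, rho=Fraction(3, 2)), n_exponent(rho=Fraction(1, 2))),
        "N^-2 M^6 vs M^2": (n_exponent(N=-2, M=6), n_exponent(M=2)),
        "M^-1 n^-1 M^(1/2) M^(3/2) vs n^(-1/2)": (n_exponent(M=1, n=-1), Fraction(-1, 2)),
        "rho M^3 N^-1 vs rho^(1/2)": (n_exponent(rho=1, M=3, N=-1), n_exponent(rho=Fraction(1, 2))),
        "rho^(3/2) M^3 vs 1": (n_exponent(rho=Fraction(3, 2), M=3), Fraction(0)),
    }
    for label, (lhs, rhs) in checks.items():
        print(f"{label}: {lhs} == {rhs} -> {lhs == rhs}")
```

Exact rational arithmetic avoids floating-point comparisons of exponents such as $-1/2$.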
In the following, for any random function $\beta$, we denote by $c_0(\beta)$ the 0-th Fourier coefficient of $\beta$, that is, $c_0(\beta) := \frac{1}{2\pi}\int_0^{2\pi} \beta(t)\,dt$. [...] according to definitions (11) and (14). Then, we plug (114) into (113). Using the product formula for the Fourier coefficients (see [Livieri et al., 2019]), we observe the following convergences in probability: [...] Then, the convergence in (113) is ensured. We omit the proof of the convergence of $V$ [...]

The theorem follows straightforwardly from the stable convergence in Theorem 3.1 and Proposition 3.4.

The proof relies on the basic decomposition of Section 7.1. First of all, we prove that the bias correction is not needed because [...] We study the term $BB$ [...], where the order of the martingale part is obtained in Section 7.2. Now, using Lemma 8.2 ii) and noting that $N/n \approx c_N$ and $M/n^{\iota} \approx c_M$, we have that (116) has order, in probability, equal to $n^{2\iota}\,\frac{1}{n}$. It is then enough to observe that $2\iota - 1 + \iota/2 < 0$, which holds since $\iota < 1/5$ by assumption. Thus (115) is proved. The slower rate of $M$ ensures that the discretization error is still negligible. As for the asymptotic variance, the only remaining term is the bracket [...] Noting that $M/c_M \simeq \rho(n)^{-\iota}$, we obtain that its limit in probability is $\frac{1}{2\pi}[\dots]$

7.9 Proof of Theorem 3.7

The theorem follows straightforwardly from the stable convergence in Theorem 3.6 and the convergence in probability of $\Gamma_{n,N,M,L}$ to the asymptotic variance. The latter is immediately deduced from the following [...] (14), converges in probability to $c_k(\gamma^2)$, by virtue of the proof of Theorem 3.6 and Remark 3.3. Secondly, the product formula is applied.

References

Estimation of the continuous and discontinuous leverage effects.
The leverage effect puzzle: disentangling sources of bias at high frequency.
High-frequency financial econometrics.
A Hausman test for the presence of market microstructure noise in high frequency data.
The distribution of realized stock return volatility.
Stochastic volatility of volatility in continuous time.
Stochastic volatility of volatility and variance risk premia.
Multivariate realised kernels: consistent positive semi-definite estimators of the covariation of equity prices with noise and non-synchronous trading.
Computation of volatility in stochastic volatility models with high frequency data.
Expected stock returns and variance risk premia.
A simple approximate long-memory model of realized volatility.
Limit theorems in the Fourier transform method for the estimation of multivariate volatility.
Fourier transform methods for pathwise covariance estimation in the presence of jumps.
A closed-form solution for options with stochastic volatility with applications to bond and currency options.
On continuous conditional Gaussian martingales and stable convergence in law.
On the estimation of integrated volatility with jumps and microstructure noise.
Nonparametric estimation of the leverage effect: a trade-off between robustness and efficiency.
Jumps in financial markets: a new nonparametric test and jump dynamics. The Review of Financial Studies.
Volatility of volatility: estimation and tests based on noisy high frequency data with jumps.
Asymptotic results for the Fourier estimator of the integrated quarticity.
Fourier series method for measurement of multivariate volatilities.
A Fourier transform method for nonparametric estimation of multivariate volatility.
Estimation of quarticity with high-frequency data.
Fourier-Malliavin volatility estimation: theory and practice.
Rate-efficient asymptotic normality for the Fourier estimator of the leverage process.
Inference for continuous semimartingales observed at high frequency.
What is a good volatility model?
High frequency volatility of volatility estimation free from spot volatility estimates.
Fourier analysis: an introduction.
Bias-optimal vol-of-vol estimation: the role of window overlapping.
Estimation of integrated volatility of volatility with applications to goodness-of-fit testing.
The estimation of the leverage effect with high-frequency data.
This section collects some results about the rescaled Dirichlet kernel, defined as $D_N(x) := \frac{1}{2N+1}\sum_{|s|\le N} e^{isx}$, and the Fejér kernel, defined as $F_M(x) := \sum_{|k|\le M}\left(1 - \frac{|k|}{M+1}\right) e^{ikx}$. In the following we consider a regular partition of the time interval, while keeping the continuous-time notation used in the main proof of the CLTs.

Lemma 8.1 Suppose that $M^2/n \to a$, as $n, M \to \infty$, for some constant $a > 0$. Then, it holds that: [...] In addition, let $K_M$ and $L_M$ be defined as [...]; then $\{K_M\}_{M \ge 1}$ and $\{L_M\}_{M \ge 1}$ are families of good kernels (footnote 9).

Proof. The results in (120) and (121) are proved in [Cuchiero and Teichmann, 2015], Lemma 5.1. For the first equality in (122) and (123), it suffices to apply the Euler-Maclaurin formula to the squared first and second derivatives of the Fejér kernel. For completeness, recall the Euler-Maclaurin formula for a function $f : [-\pi, \pi] \to \mathbb{R}$ of class $C^{2p+1}$: [...] Here $B_{2k}$ is the $(2k)$-th Bernoulli number, and the remainder $R_{p,n,f}$ satisfies [...] with $C_p$ a constant depending only on $p$. In particular, let us consider positive integers $k, h$; then, we have that [...] Observe that the number of terms in the summation is $(2M+1)^h$ and that $|j_1 + \dots + j_h| \le hM$; then [...] The assertion in (122) follows directly from the following calculation: [...] Similarly, one obtains that [...] It remains to prove that $\{K_M\}$ and $\{L_M\}$ are families of good kernels. First, consider $K_M$. We observe that $K_M(x) \ge 0$. Then, by the previous computation, it is easy to show that $\frac{1}{2\pi}\int_{-\pi}^{\pi} K_M(x)\,dx = 1$. Finally, we use the explicit expressions in terms of sine and cosine. We also use the bounds $|\sin((M+1)x/2)| \le C(M+1)|x|$ and $|\sin(x/2)| \ge c|x|$ for $|x| \le \pi$, with $C, c > 0$ suitable constants. Together they give [...], which goes to 0 as $M \to \infty$. Analogously, we prove that $\{L_M(x)\}_{M=1}^{\infty}$ is a family of good kernels. First, note that $L_M(x) \ge 0$ and $\frac{1}{2\pi}\int_{-\pi}^{\pi} L_M(x)\,dx = 1$. Finally, we have that [...] (for suitable constants $A, B$), which converges to zero as $M \to \infty$.
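The good-kernel argument in the proof of Lemma 8.1 rests on the sine closed form of the Fejér kernel and on three properties:

- non-negativity;
- unit mass $\frac{1}{2\pi}\int_{-\pi}^{\pi} F_M = 1$;
- vanishing mass on $\delta \le |x| \le \pi$ as $M \to \infty$.

The sketch below checks these numerically on $F_M$ itself, assuming the standard definition $F_M(x) = \sum_{|k|\le M}(1 - |k|/(M+1))e^{ikx} = \frac{1}{M+1}\bigl(\sin((M+1)x/2)/\sin(x/2)\bigr)^2$. It illustrates the definitions. It is not a check of the specific kernels $K_M$ and $L_M$ of Lemma 8.1. Function names are hypothetical.

```python
import numpy as np


def fejer_kernel(x, M):
    """F_M(x) = sum_{|k|<=M} (1 - |k|/(M+1)) e^{ikx}, written as a real cosine sum."""
    x = np.asarray(x, dtype=float)
    k = np.arange(1, M + 1)
    return 1.0 + 2.0 * np.cos(np.multiply.outer(x, k)) @ (1.0 - k / (M + 1))


def fejer_closed_form(x, M):
    """(1/(M+1)) * (sin((M+1)x/2) / sin(x/2))**2, valid for x not a multiple of 2*pi."""
    x = np.asarray(x, dtype=float)
    return (np.sin((M + 1) * x / 2) / np.sin(x / 2)) ** 2 / (M + 1)


def tail_mass(M, delta, K=20_000):
    """(1/2pi) * integral of F_M over delta <= |x| <= pi, by the rectangle rule on K points."""
    grid = -np.pi + 2 * np.pi * np.arange(K) / K
    return float(np.mean(fejer_kernel(grid, M) * (np.abs(grid) >= delta)))


if __name__ == "__main__":
    grid = -np.pi + 2 * np.pi * np.arange(4096) / 4096
    print("unit mass:", np.mean(fejer_kernel(grid, 50)))
    print("tail mass, delta = 0.5:", [round(tail_mass(M, 0.5), 4) for M in (10, 50, 200)])
```

On a uniform periodic grid of $K > M$ points, the rectangle rule integrates a trigonometric polynomial of degree $M$ exactly, which is why the unit-mass check can use a tight tolerance.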
Lemma 8.2 [...]
ii) It holds that $\lim_{n,N}\, n\int_0^x D_N^2(\varphi_n(x) - \varphi_n(y))\,dy = \pi(1 + 2\eta(2a))$. Moreover, for any $\alpha$-Hölder continuous function $f$, with $\alpha \in (0, 1]$, $\lim_{N,n}\, n\int_0^x D_N^2(\varphi_n(x) - \varphi_n(y))\, f(y)\,dy = \pi(1 + 2\eta(2a))\, f(x)$. Here $\eta(a) := \frac{1}{2a^2}\, r(a)(1 - r(a))$, with $r(a) = a - [a]$ and $[a]$ the integer part of $a$.
iii) For any $\varepsilon > 0$, $\lim_{N,n}\, n\int_0^{x-\varepsilon} |D_N(\varphi_n(x) - \varphi_n(y))|^2\,dy = 0$.

Proof. See [Clement and Gloter, 2011], Lemma 1 and Lemma 4.
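One way to see where the aliasing factor $1 + 2\eta(\cdot)$ in Lemma 8.2 comes from is a full-grid identity, which we state as our own illustration. Assume $D_N$ is the rescaled Dirichlet kernel $\frac{1}{2N+1}\sum_{|s|\le N}e^{isx}$, and write $2N+1 = qn + r$ with $0 \le r < n$. Then exactly $q+1$ frequencies of $\{-N,\dots,N\}$ fall into each of $r$ residue classes mod $n$, and $q$ fall into each of the others. Discrete orthogonality then gives

$\sum_{j=0}^{n-1} D_N^2(2\pi j/n) = 1 + r(n-r)/(2N+1)^2 = 1 + 2\eta((2N+1)/n)$.

If $N/n \to a$, the argument tends to $2a$, matching the factor $\eta(2a)$ of ii) and the factor $\eta(c_N/\pi)$ obtained with $N/n \sim c_N/(2\pi)$ in the bias step. The one-sided limits in ii) and iii) are those of [Clement and Gloter, 2011] and are not reproduced by this check. Function names are hypothetical.

```python
import numpy as np


def rescaled_dirichlet(x, N):
    """D_N(x) = (1/(2N+1)) * sum_{|s|<=N} e^{isx}, written as a real cosine sum."""
    x = np.asarray(x, dtype=float)
    s = np.arange(1, N + 1)
    return (1.0 + 2.0 * np.cos(np.multiply.outer(x, s)).sum(axis=-1)) / (2 * N + 1)


def eta(a):
    """eta(a) = r(a) (1 - r(a)) / (2 a^2), with r(a) = a - [a] the fractional part of a."""
    r = a - np.floor(a)
    return r * (1.0 - r) / (2.0 * a * a)


def grid_sum_of_squares(n, N):
    """sum_{j=0}^{n-1} D_N(2*pi*j/n)**2 on the regular grid."""
    x = 2 * np.pi * np.arange(n) / n
    return float(np.sum(rescaled_dirichlet(x, N) ** 2))


if __name__ == "__main__":
    for n, N in [(78, 39), (78, 20), (100, 120)]:
        a = (2 * N + 1) / n
        print(n, N, grid_sum_of_squares(n, N), 1 + 2 * eta(a))
```

When $2N+1$ is a multiple of $n$, every residue class receives the same number of frequencies, $\eta$ vanishes, and the sum equals 1.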