authors: Wu, Kejin; Karmakar, Sayar title: A New Model-free Prediction Method: GA-NoVaS date: 2021-12-16 Volatility forecasting plays an important role in financial econometrics. Previous works in this area are mainly based on applying various GARCH-type models. However, it is hard to choose a specific GARCH model that works well in general, and such traditional methods are unstable when dealing with highly volatile periods or small sample sizes. The recently proposed normalizing and variance stabilizing (NoVaS) method is a more robust and accurate prediction technique. This Model-free method is built on an inverse transformation based on the ARCH model. Inspired by the historical development from the ARCH to the GARCH model, we propose a novel NoVaS-type method which exploits the GARCH model structure. Through extensive data analysis, we find that our model has better time-aggregated prediction performance than the current state-of-the-art NoVaS method when forecasting short and volatile data. The victory of our new method corroborates this idea and also opens up avenues where one can explore other NoVaS structures to improve on the existing ones or solve specific prediction problems. In the area of financial econometrics, forecasting volatility accurately and robustly is always important (Engle and Patton 2001; Du and Budescu 2007). Achieving high-quality volatility forecasts is crucial for practitioners and traders making decisions in risk management, asset allocation, the pricing of derivative instruments and strategic decisions on fiscal policy (Fang et al. 2018; Ashiya 2003; Bansal et al. 2016; Kitsul and Wright 2013; Morikawa 2019). One example is the global financial crisis of 2008, which was more severe than the average recession predicted by the US Congressional Budget Office. Had a more accurate volatility prediction been available, this well-known crisis might not have had such a large negative impact on the global economy. A more recent case is the COVID-19 pandemic. Some studies have already warned that the volatility brought to global financial markets by the pandemic is comparable with the financial crisis of 2008 (Abodunrin et al. 2020; Fernandes 2020; Yueh 2020). Nobody knows when the next global uncertainty will come, so guidance that helps people prepare for a latent financial recession is desirable. Although accurate economic predictions may lead people to make sound and timely decisions that can reduce the effects of financial recessions, volatility forecasting faces several challenges, such as small sample sizes, heteroscedasticity and structural change (Chudỳ et al. 2020). Standard methods for volatility forecasting are typically built upon GARCH-type models; these models' ability to forecast the absolute magnitude and quantiles or entire density of squared financial log-returns (i.e., equivalent to volatility forecasting to some extent) was shown by Engle and Patton (2001) using the Dow Jones Industrial Index. Later, many studies comparing the performance of different GARCH-type models in predicting the volatility of financial series were conducted (Chortareas et al. 2011; González-Rivera et al. 2004; Herrera et al. 2018; Lim and Sek 2013; Peters 2001; Wilhelmsson 2006; Zheng 2012).
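As a concrete point of reference for these GARCH-type benchmarks (and for the GARCH-direct benchmark used later in Sections 4 and 5), the following minimal sketch fits a GARCH(1,1) model to log-returns and produces multi-step variance forecasts. It assumes the Python arch package is available; the toy price path, the zero-mean specification and the 5-step horizon are illustrative choices of ours, not the exact configuration used in this paper.

```python
# Sketch: fitting a GARCH(1,1) benchmark ("GARCH-direct") and forecasting variance.
# Assumes the Python `arch` package; the data and horizon below are illustrative only.
import numpy as np
from arch import arch_model

rng = np.random.default_rng(0)
prices = 100 * np.exp(np.cumsum(0.01 * rng.standard_normal(501)))   # toy price path
log_returns = 100 * np.log(prices[1:] / prices[:-1])                 # Eq. (4.1)-style returns

am = arch_model(log_returns, mean="Zero", vol="GARCH", p=1, q=1)
res = am.fit(disp="off")

h = 5
fc = res.forecast(horizon=h)
sigma2_path = fc.variance.values[-1]      # predicted conditional variances, steps 1..h
aggregated = sigma2_path.mean()           # time-aggregated h-step volatility prediction
print(res.params)
print(aggregated)
```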
Some researchers also tried to develop the GARCH model further, such as adopting smoothing parameters or adding more related information to estimate model (Breitung and Hafner 2016; Chen et al. 2012; Fiszeder and Perczak 2016; Taylor 2004) . For modelling the proper process of volatility during fluctuated period, researchers also developed different approaches to achieve this goal, such as Kim et al. (2011) applied time series models with stable and tempered-stable innovations to measure market risk during highly volatile period; Ben Nasr et al. (2014) applied a fractionally integrated time-varying GARCH (FITVGARCH) model to fit volatility; Karmakar and Roy (2021) developed a Bayesian method to estimate time-varying analogues of ARCHtype models for describing frequent volatility changes. Although there are many different types of GARCH models, it is a difficult problem to determine which single GARCH model outperforms others uniformly since the performance of these models heavily depends on the error distribution, the length of prediction horizon and the property of datasets. For getting out of this dilemma, we want to introduce a new idea to forecast volatility of financial series. Recall that during the prediction process, an inescapable intermediate process is building a model to describe historical data. Consequently, prediction results are restricted to this specific model used. However, the used model may not be correct within data, and the wrong model can even give better predictions sometimes (Politis 2015) . Therefore, it is hard as a practitioner to determine which model should be used in this intermediate stage. With the Model-free Prediction Principle being first proposed by Politis (2003) , people can do predictions without the restriction of building a specific model and put all emphasis on data itself. Subsequently, we acquire a powerful method-NoVaS method-to do forecasting. The NoVaS method is one kind of Model-free method and applies the normalizing and variance-stabilizing transformation (NoVaS transformation) to do predictions (Politis 2003 (Politis , 2007 (Politis , 2015 . Some previous studies have shown that the NoVaS method possesses better predicting performance than GARCH-type models on forecasting squared log-returns, such as Gulay and Emec (2018) showed that the NoVaS method could beat GARCH-type models (GARCH, EGARCH and GJR-GARCH) with generalized error distributions by comparing the pseudoout of sample 2 (POOS) forecasting performance. Chen and Politis (2020) showed the "Time-varying" NoVaS method is robust against possible non-stationarities in the data. Furthermore, Chen and Politis (2019) extended this NoVaS approach to do multi-step ahead predictions of squared log-returns. They also verified this approach could avoid error accumulation problem for single 5-steps ahead predictions and outperform the standard GARCH model for most of time. Wu and Karmakar (2021) further substantiated the great performance of NoVaS methods on longterm (30-steps ahead) predictions. They considered taking the aggregated h-steps ahead prediction, i.e., the sliding-window mean of h-steps ahead predictions, to represent the long-term prediction. In the practical aspect of forecasting volatility, a single-point prediction may stand trivial meaning, since a small predicted volatility value at specific time point t + h can not guarantee small values at other future time points. Conversely, this great-looking single prediction may mislead traders. 
Thus, a long-term prediction in the econometric setting is meant to provide inferences about the future situation at an overall level. In this article, we keep using the time-aggregated prediction value to measure different methods' short- and long-term forecasting performance. This aggregated metric has also been applied to depict the future behavior of electricity prices and financial data (Chudỳ et al. 2020; Karmakar et al. 2020; Fryzlewicz et al. 2008). To the best of our knowledge, previous studies in this area mainly focus on comparing the NoVaS method with different GARCH-type models and overlook the potential of developing the NoVaS method itself further. Inspired by the development of the ARCH model (Engle 1982) into the GARCH model (Bollerslev 1986), this article attempts to build a novel NoVaS method which is derived from the GARCH(1,1) structure. Through extensive data analysis, we show that our methods bring significant improvement when the available data is short in length or of a more volatile nature. These are usually challenging forecasting exercises, and thus our new method can open up some new directions. The rest of this article is organized as follows. Details about the existing NoVaS method and the motivation to propose a new method are explained in Section 2. In Section 3, we propose a new NoVaS transformation approach, from which the GA-NoVaS method is created. Following the procedure used to propose the refined GE-NoVaS-without-β method in Wu and Karmakar (2021), we also put forward a parsimonious variant of the GA-NoVaS method. Moreover, we connect this new parsimonious method with the GE-NoVaS-without-β method. To compare our new methods with the current state-of-the-art NoVaS method, POOS predictions on different simulated datasets are performed in Section 4. Additionally, in Section 5, we deploy extensive data analysis using various types of real-world datasets. In Section 6, we exhibit some statistical test results to substantiate our new methods. Finally, future work and discussion are presented in Section 7. In this section, details about the NoVaS method are given. We first introduce the Model-free Prediction Principle, which is the core idea behind the NoVaS method. Then, we present how the NoVaS transformation can be built from an ARCH model. Finally, we give the motivation to build a new NoVaS transformation. Before presenting the NoVaS method in detail, we throw some light on the insight of the Model-free Prediction Principle. The main idea is to apply an invertible transformation function H_n which maps a non-i.i.d. vector {Y_i; i = 1, ..., n} to a vector {ε_i; i = 1, ..., n} that has i.i.d. components. Since the prediction of i.i.d. data is somewhat standard, the prediction of Y_{n+1} can easily be obtained by inversely transforming ε̂_{n+1}, a prediction of ε_{n+1}, using H_n^{-1}. For example, we can express the prediction Ŷ_{n+1} as a function of Y_n, X_{n+1} and ε̂_{n+1}: Ŷ_{n+1} = f(Y_n, X_{n+1}, ε̂_{n+1}), (2.1) where Y_n denotes all historical data {Y_i; i = 1, ..., n}; X_{n+1} is the collection of all predictors, which also contains the value of the future predictor X_{n+1}. Although a qualified transformation H_n is hard to find in the general case, naturally existing forms of H_n are available in some situations, such as the linear time series setting. In this article, we will show how to utilize existing forms, the ARCH and GARCH models, to build NoVaS transformations to predict volatility.
One thing should be noticed is that a more complicated procedure to obtain transformation Hn is needed (see (Politis 2015) for more details) if we do not have these existing forms for some situations. After acquiring Eq. (2.1), we can even predict the general form of Y n+1 such as g(Y n+1 ). Politis (2015) defined two data-based optimal predictors of g(Y n+1 ) under L 1 (Mean Absolute Deviation) and L 2 (Mean Squared Error) criterions respectively as below: In Eq. (2.2), {ǫ n+1,m } M m=1 are generated from its own distribution by bootstrapping or from a normal distribution through Monte Carlo method (recall the {ǫ i ; i = 1, · · · , n} are i.i.d.); M takes a large number (5000 in this article). The NoVaS transformation is a straightforward application of the Model-free Prediction Principle. More specifically, the NoVaS transformation is a qualified function Hn which is based on the ARCH model introduced by Engle (1982) as follows: In Eq. (2.3), these parameters satisfy a ≥ 0, a i ≥ 0, for all i = 1, · · · , p; W t ∼ i.i.d. N (0, 1). In other words, the structure of the ARCH model gives us a readymade Hn. We can express W t in Eq. (2.3) using other terms: (2.4) Subsequently, Eq. (2.4) can be seen as a potential form of Hn. However, some additional adjustments were made by Politis (2003) . Firstly, Y t was added into the denominator on the right hand side of Eq. (2.4) for obeying the rule of causal estimate which means only using present and past data, and utilizing as much information as possible. Secondly, the constant a was replaced by αs 2 t−1 to create a scale invariant parameter α. Consequently, after observing {Y 1 , · · · , Yn}, the adjusted Eq. (2.4) can be written as Eq. (2.5): ; for t = p + 1, · · · , n. (2.5) In Eq. (2.5), {Y t ; t = 1, · · · , n} is the target data, such as financial log-returns in this paper; {W t ; t = p + 1, · · · , n} is the transformed vector; α is a fixed scale invariant constant; s 2 t−1 is an estimator of the variance of {Y i ; i = 1, · · · , t − 1} and can be calculated by ( {W t ; t = p + 1, · · · , n} expressed in Eq. (2.5) are assumed to be i.i.d. N (0, 1), but it is not true yet. For making Eq. (2.5) be a qualified function Hn (i.e., making {W t } n t=p+1 really obey standard normal distribution), we still need to add some restrictions on α and β, a 1 , · · · , ap. Recalling the NoVaS transformation means normalizing and variance stabilizing transformation, we first stabilize the variance by requiring: (2.6) By requiring unknown parameters in Eq. (2.5) to satisfy Eq. (2.6), we can make {W t } n t=p+1 series possess approximate unit variance. Additionally, α and β, a 1 , · · · , ap must be selected carefully. In the work of Politis (2015) , two different structures of β, a 1 , · · · , ap were provided: (2.7) We keep using S-NoVaS and E-NoVaS to denote these two NoVaS methods in Eq. (2.7). For the S-NoVaS, all β, a 1 , · · · , ap taking same value means we assign same weights on past data. Similarly, for the E-NoVaS, β, a 1 , · · · , ap are exponentially positive decayed coefficients, which means we assign decayed weights on the past data. Note that α is equal to 0 in both methods above. If we allow α takes a positive small value, we can get two different methods: (2.8) We keep using GS-NoVaS and GE-NoVaS to denote these two generalized NoVaS methods in Eq. (2.8). α in both generalized methods takes value from (0, 1) 3 . Obviously, NoVaS and generalized NoVaS methods all meet the requirement of Eq. (2.6). 
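To make the GE-NoVaS construction concrete, the sketch below computes the transformed series {W_t} for exponentially decayed coefficients satisfying the variance-stabilizing constraint α + β + Σ a_i = 1, and also sketches the normalizing step (selecting the decay rate so that the kurtosis of {W_t} is as close to 3 as possible), which is discussed in more detail below. This is a minimal illustration under our reading of Eqs. (2.5) to (2.8); the grid of c values, the order p and the function names are ours.

```python
# Sketch of the GE-NoVaS transformation with exponentially decayed coefficients.
import numpy as np
from scipy.stats import kurtosis

def ge_novas_transform(y, alpha, c, p):
    """Return the transformed series W_{p+1},...,W_n for exponential-decay weights."""
    w_raw = np.exp(-c * np.arange(p + 1))        # weights for Y_t^2, Y_{t-1}^2, ..., Y_{t-p}^2
    w = (1.0 - alpha) * w_raw / w_raw.sum()      # beta = w[0], a_i = w[i]; alpha + sum(w) = 1
    W = []
    for t in range(p, len(y)):                   # 0-based index: y[t] plays the role of Y_t
        s2 = np.mean((y[:t] - y[:t].mean()) ** 2)         # variance estimate of Y_1,...,Y_{t-1}
        denom = alpha * s2 + np.sum(w * y[t::-1][:p + 1] ** 2)
        W.append(y[t] / np.sqrt(denom))
    return np.asarray(W), w

def choose_c(y, alpha, p, c_grid):
    """Normalizing step: pick c so that the kurtosis of {W_t} is as close to 3 as possible."""
    dists = []
    for c in c_grid:
        W, _ = ge_novas_transform(y, alpha, c, p)
        dists.append(abs(kurtosis(W, fisher=False) - 3.0))
    return c_grid[int(np.argmin(dists))]

# Example usage: alpha fixed at 0.1, ARCH order p = 10, c searched over a coarse grid.
# best_c = choose_c(log_returns, alpha=0.1, p=10, c_grid=np.linspace(0.01, 3.0, 50))
```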
Finally, we still need to make {W t } n t=p+1 independent. In practice, {W t } n t=p+1 transformed from financial log-returns by NoVaS transformation are usually uncorrelated 4 . Therefore, if we make the empirical distribution of {W t } n t=p+1 close to the standard normal distribution (i.e., normalizing {W t } n t=p+1 ), we can get a i.i.d. series {W t } n t=p+1 . Note that the distribution of financial log-returns is usually symmetric, thus, the kurtosis can serve as a simple distance measuring the departure of a non-skewed dataset from the standard normal distribution whose kurtosis is 3 (Politis 2015) . We useFw to denote the empirical distribution of {W t } n t=p+1 and use KU RT (W t ) to denote the kurtosis ofFw. Then, for makingFw close to the standard normal distribution, we attempt to minimize |KU RT (W t ) − 3| 5 to obtain the optimal combination of α, β, a 1 , · · · , ap. Consequently, the NoVaS transformation is determined. From the thesis of Chen (2018) , Generalized NoVaS methods are better for interval prediction and estimation of squared log-returns than other NoVaS methods. The GE-NoVaS method which assigns exponentially decayed weight to the past data is also more reasonable than the GS-NoVaS method which handles past data equally. Therefore, in this article, we verify the advantage of our new methods 3 If α = 1, all a i will equal to 0. It means we just standardize {Yt} and do not utilize the structure of ARCH model. 4 If {Wt} n t=p+1 is correlated, some additional adjustments need to be done, more details can be found in (Politis 2015) . 5 More details about this minimizing process can be found in Politis (2015) . by comparing them with the GE-NoVaS method. Before going further to propose the new NoVaS transformation, we talk more details about the GE-NoVaS method and our motivations to create new methods. For the GE-NoVaS method, the fixed α is larger than 0 and selected from a grid of possible values based on predictions performance. In this article, we define this grid as (0.1, 0.2, · · · , 0.8) which contains 8 discrete values 6 . From Section 2.2, based on the guide of Model-free Prediction Principle, we already get the function Hn of GE-NoVaS method by requiring parameters to satisfy Eq. (2.6) and minimizing |KU RT (W t ) − 3| . For completing the Model-free prediction process, we still need to figure out the form of H −1 n . Through Eq. (2.5), H −1 n can be written as follows: Based on Eq. (2.9), it is easy to get the analytical form of Y n+1 , which can be expressed as below: In Eq. (2.10), s 2 n is an estimator of the variance of {Y i ; i = 1, · · · , n} and can be calculated by n −1 n i=1 (Y i − µ) 2 , µ is the mean of data; W n+1 will be replaced by a sample from the empirical distributionFw or the trimmed standard normal distribution 7 . Based on aforementioned L 1 and L 2 optimal predictor in Eq. (2.2), we can define L 1 and L 2 optimal predictors of Y 2 n+1 after observing historical 6 It is possible to refine this grid to get a better transformation. However, computation burden will also increase. 7 The reason of utilizing a trimmed standard normal distribution is transformed {Wt} n t=p+1 are between −1/ √ β and 1/ √ β from Eq. (2.5). information set Fn = {Y t , 1 ≤ t ≤ n} as below: where, {W n+1,m } M m=1 are bootstrapped M times from its empirical distribution or generated from a trimmed standard normal distribution by Monte Carlo method. 
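The following sketch turns the inverse transformation just described into a one-step-ahead predictor of the squared log-return: W_{n+1} is simulated M times (from the empirical distribution of {W_t} or a trimmed standard normal), each draw is mapped back to a candidate Y_{n+1}, and the L2 (mean) or L1 (median) optimal predictor is taken. It assumes the inverse form implied by Eq. (2.5), namely Y_{n+1} = W_{n+1} sqrt(α s_n^2 + Σ a_i Y^2_{n+1-i}) / sqrt(1 − β W^2_{n+1}); the variable names are ours.

```python
# Sketch of the one-step-ahead NoVaS prediction of the squared log-return.
import numpy as np

def predict_squared_one_step(y, W_hist, alpha, beta, a, M=5000, use_empirical=True, seed=None):
    """y: log-returns Y_1,...,Y_n; W_hist: transformed series; a = (a_1,...,a_p)."""
    rng = np.random.default_rng(seed)
    p = len(a)
    s2_n = np.mean((y - y.mean()) ** 2)                    # variance estimate from Y_1,...,Y_n
    past = np.sum(np.asarray(a) * y[-1:-p - 1:-1] ** 2)    # sum_i a_i * Y_{n+1-i}^2
    if use_empirical:                                      # bootstrap from the empirical F_W
        W_new = rng.choice(W_hist, size=M, replace=True)
    else:                                                  # trimmed N(0,1): |W| < 1/sqrt(beta)
        bound = 1.0 / np.sqrt(beta)
        draws = rng.standard_normal(4 * M)
        W_new = draws[np.abs(draws) < bound][:M]
    y_next = W_new * np.sqrt(alpha * s2_n + past) / np.sqrt(1.0 - beta * W_new ** 2)
    return {"L2": np.mean(y_next ** 2), "L1": np.median(y_next ** 2)}
```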
In other words, Y n+1 can be presented as a function of W n+1 and Fn as below: For reminding us this relationship between Y n+1 and W n+1 , Y 1 , · · · , Yn is derived from the GE-NoVaS method, we use f GE (·) to denote this function. It is not hard to find Y n+2 can be expressed as: For deriving the optimal predictor of Y 2 n+2 we can generate {W n+1,m , W n+2,m } M m=1 M times to compute the L 1 and L 2 optimal predictors of Y 2 n+1 firstly. Then, with predictedŶ 2 n+1 and {W n+2,m } M m=1 , L 1 and L 2 optimal predictors of Y 2 n+2 can be obtained similarly with Eq. (2.13). Finally, iterating the process to get predictedŶ 2 n+2 , we can accomplish the multi-step ahead prediction of Y 2 n+h for any h ≥ 3. Basically speaking, {W n+1,m , · · · , W n+h,m } M m=1 are needed. Then, we iteratively predict Y 2 n+1 , Y 2 n+2 , Y 2 n+3 and so on. In summary, we can express Y n+h as: Y n+h = f GE (W n+1 , · · · , W n+h ; Fn) ; for any h ≥ 1. (2.14) Note that, the analytical form of Y n+h from the GE-NoVaS transformation, only depends on i.i.d. {W n+1 , · · · , W n+h } and Fn. Structured form of coefficients: In last few subsections, we illustrated the procedure of using the GE-NoVaS method to calculate L 1 and L 2 predictors of Y n+h . However, the form of coefficients β, a 1 , · · · , ap in Eq. (2.5) is somewhat arbitrary. Note that the GE-NoVaS method simply sets β, a 1 , · · · , ap to be exponentially decayed. This lets us put forward the following idea: Can we build a more rigorous form of β, a 1 , · · · , ap based on the relevant model itself without assigning any prior form on coefficients? In this paper, a new approach to explore the form of β, a 1 , · · · , ap based on the GARCH(1,1) model is proposed. Subsequently, the GARCH-NoVaS (GA-NoVaS) transformation is built. Changing the NoVaS transformation: The authors in Chen and Politis (2019) claimed that NoVaS method can generally avoid the error accumulation problem which is derived from the traditional multi-stage prediction, i.e., using predicted values to predict further data iteratively. However, Wu and Karmakar (2021) showed the current state-of-the-art GE-NoVaS method still renders extremely large timeaggregated multi-step ahead predictions under L 2 risk measure sometimes. The reason for this phenomenon is the denominator of Eq. (2.10) will be quite small when the generated W * is very close to 1/β. In this situation, the prediction error will be amplified. Moreover, when the long-step ahead prediction is desired, this amplification will be accumulated and the final prediction will be ruined. Thus, a β-removing technique was imposed on the GE-NoVaS method to get a GE-NoVaS-without-β method 8 . Similarly, we can also obtain a parsimonious variant of the GA-NoVaS method by reusing this technique. The discussion of these parsimonious methods is presented in Sections 3.2 and 3.3. In this section, we first propose the GA-NoVaS method which is directly developed from the GARCH(1,1) model without assigning any specific form of β, a 1 , · · · , ap. Then, the GA-NoVaS-without-β method is introduced through applying the βremoving technique again. We also provide algorithms of these two new methods in the end. Recall the GE-NoVaS method mentioned in Section 2.3, it was built by taking advantage of the ARCH(p) model, p takes an initially large value in the algorithm of the GE-NoVaS method. Although the ARCH model is the base of the GE-NoVaS method, we should notice that free parameters of the GE-NoVaS method are just c and α. 
For representing p coefficients by just two free parameters, some specific forms are assigned to β, a 1 , · · · , ap. However, we doubt the reasonability of this approach. Thus, we try to use a more convincing approach to find β, a 1 , · · · , ap directly without assigning any prior form to these parameters. We call this NoVaS transformation method by GA-NoVaS. The idea behind this new method is inspired by the successful historic development of the ARCH to GARCH model. The popular GARCH(1,1) model can be utilized to represent the ARCH(∞) model, the proof is trivial, see (Politis and McElroy 2019) for some references. If we want to build a GE-NoVaS transformation with p converging to ∞, it is appropriate to replace the denominator at the right hand side of Eq. (2.4) by the structure of GARCH(1,1) model. We present the GARCH(1,1) model as Eq. (3.1): In other words, a potentially qualified transformation related to the GARCH(1,1) or ARCH(∞) model can be exhibited as: However, recall the core insight of the NoVaS method is connecting the original data with the transformed data by a qualified transformation function. A primary problem is desired to be solved is that the right-hand side of Eq. (3.2) contains other terms rather than only {Y t } terms. Thus, more manipulations are required to build the GA-NoVaS method. Taking Eq. (3.1) as the starting point, we first find out expressions of σ 2 t−1 , σ 2 t−2 , · · · as follow: Plug all components in Eq. (3.3) into Eq. (3.1), one equation sequence can be gotten: Iterating the process in Eq. (3.4), with the requirement of a 1 + b 1 < 1 for the stationarity, the limiting form of Y t can be written as Eq. (3.5): We can rewrite Eq. (3.5) to get a potential function Hn which is corresponding to the GA-NoVaS method: Recall the adjustment taken in the existing GE-NoVaS method, the total difference between Eqs. (2.4) and (2.5) can be seen as the term a being replaced by αs 2 t−1 + βY 2 t . Apply this same adjustment on Eq. (3.6), then this equation will be changed to the form as follows: is also required to take a small positive value, this term can be seen as aα (α ≥ 0) which is equivalent with α in the existing GE-NoVaS method. Thus, we can simplify αs 2 t−1 /(1 − b 1 ) toαs 2 t−1 . For keeping the same notation style with the GE-NoVaS method, we use αs 2 t−1 to represent αs 2 can be represented as: For getting a qualified GA-NoVaS transformation, we still need to make the transformation function Eq. (3.8) satisfy the requirement of the Model-free Prediction Principle. Recall that in the existing GE-NoVaS method, α + β + p i=1 a i in Eq. (2.5) is restricted to be 1 for meeting the requirement of variance-stabilizing and the optimal combination of α, β, a 1 , · · · , ap is selected to make the empirical distribution of {W t } as close to the standard normal distribution as possible (i.e., minimizing |KU RT (W t ) − 3|). Similarly, for getting a qualified Hn from Eq. (3.8), we require: Under this requirement, since a 1 and b 1 are both less than 1, , where q takes a large value. Then a truncated form of Eq. (3.8) can be written as Eq. (3.10): ; for t = q + 1, · · · , n. (3.10) Now, we take Eq. (3.10) as a potential function Hn. Then, the requirement of variance-stabilizing is changed to: , and then search optimal coefficients. For presenting Eq. (3.10) with scaling coefficients in a concise form, we use {c 0 , c 1 , · · · , cq} to rep- } after scaling, which implies that we can rewrite Eq. 
(3.10) as: (3.12) Remark 3.1 (The difference between GA-NoVaS and GE-NoVaS methods) Compared with the existing GE-NoVaS method, we should notice that the GA-NoVaS method possesses a totally different transformation structure. Recall all coefficients except α implied by the GE-NoVaS method are expressed as There are only two free parameters c and α. However, there are four free parameters β, a 1 , b 1 and α in Eq. (3.10). For example, the coefficient of . On the other hand, the corresponding coefficient in the GA-NoVaS structure is We can think the freedom of coefficients within the GA-NoVaS is larger than the freedom in the GE-NoVaS. At the same time, the structure of GA-NoVaS method is built from GARCH(1,1) model directly without imposing any prior assumption on coefficients. We believe this is the reason why our GA-NoVaS method shows better prediction performance in Sections 4 and 5. Furthermore, for achieving the aim of normalizing, we still fix α to be one specific value from {0.1, 0.2, · · · , 0.8}, and then search the optimal combination of β, a 1 , b 1 from three grids of possible values of β, a 1 , b 1 to minimize |KU RT (W t ) − 3|. After getting a qualified Hn, H −1 n will be outlined immediately: Based on Eq. (3.13), Y n+1 can be expressed as the equation follows: Also, it is not hard to express Y n+h as a function of W n+1 , · · · , W n+h and Fn with GA-NoVaS method like we did in Section 2.3: Once the expression of Y n+h is figured out, we can apply the same procedure with the GE-NoVaS method to get the optimal predictor of Y n+h under L 1 or L 2 risk criterion. To deal with α, we still adopt the same strategy used in the GE-NoVaS method, i.e., select the optimal α from a grid of possible values based on prediction performance. One thing should be noticed is that the value of α is invariant during the process of optimization once we fix it as a specific value. More details about the algorithm of this new method can be found in Section 3.4. According to the β-removing idea, we can continue proposing the GA-NoVaSwithout-β method which is a parsimonious variant of the GA-NoVaS method. From Wu and Karmakar (2021) , functions Hn and H −1 n corresponding to the GE-NoVaS-without-β method can be presented as follow: (3.16) Eq. (3.16) still need to satisfy the requirement of normalizing and variance-stabilizing transformation. Therefore, we restrict α + p i=1 a i = 1 and still select the optimal combination of a 1 , · · · , ap by minimizing |KU RT (W t ) − 3|. Then, Y n+1 can be expressed by Eq. (3.17): Remark 3.2 Even though we do not include the effect of Y t when we build Hn, the expression of Y n+1 still contains the current value Yn. It means the GE-NoVaSwithout-β method does not disobey the rule of causal prediction. Similarly, our proposed GA-NoVaS method can also be offered in a variant without β term. Eqs. (3.12) and (3.13) without β term can be represented by following equations: One thing should be mentioned here is that {c 1 , · · · ,cq} represents {a 1 , a 1 b 1 , a 1 b 2 1 , · · · , a 1 b q−1 1 } scaled by timing a scalar 1−α q j=1 a1b j−1 1 . Besides, α + q i=1c i = 1 is required to satisfy the variance-stabilizing requirement and the optimal combination of a 1 , b 1 is selected by minimizing |KU RT (W t ) − 3| to satisfy the normalizing requirement. For GE-NoVaS-and GA-NoVaS-without-β methods, we can still express Y n+h as a function of {W n+1 , · · · , W n+h } and repeat the aforementioned procedure to get L 1 and L 2 predictors. 
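To illustrate how the GARCH(1,1) structure induces the GA-NoVaS coefficients, the sketch below builds the truncated weight sequence a_1, a_1 b_1, ..., a_1 b_1^{q-1} (optionally preceded by the weight β on Y_t^2) and rescales it so that the variance-stabilizing constraint holds. The particular parameter values shown are illustrative only.

```python
# Sketch of the GA-NoVaS coefficient construction from the GARCH(1,1)/ARCH(inf) expansion.
import numpy as np

def ga_novas_weights(alpha, beta, a1, b1, q, with_beta=True):
    """Return (c_0, c_1, ..., c_q): c_0 weights Y_t^2, c_i weights Y_{t-i}^2; alpha + sum = 1."""
    tail = a1 * b1 ** np.arange(q)                 # a1, a1*b1, ..., a1*b1**(q-1)
    head = np.array([beta]) if with_beta else np.array([0.0])
    raw = np.concatenate([head, tail])             # GA-NoVaS-without-beta drops the Y_t^2 term
    return (1.0 - alpha) * raw / raw.sum()         # rescale so the constraint holds

# Illustrative values: beta + a1 + b1 < 1 and the weight on Y_t^2 is the largest.
c = ga_novas_weights(alpha=0.2, beta=0.15, a1=0.10, b1=0.70, q=30)
print(c[:4], round(0.2 + c.sum(), 6))              # decaying weights; the total equals 1
```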
For example, we can derive the expression of Y n+h using the GA-NoVaS-without-β method: Y n+h = f GA-without-β (W n+1 , · · · , W n+h ; Fn) ; for any h ≥ 1. (3.19) Remark 3.3 (Slight computational efficiency of removing β) Note that the suggestion of removing β can also lead a less time-complexity of the existing GE-NoVaS and newly proposed GA-NoVaS methods. The reason for this is simple: Recall 1/ √ β is required to be larger or equal to 3 for making {W t } have enough large range, i.e., β is required to be less or equal to 0.111. However, the optimal combination of NoVaS coefficients may not render a suitable β. For this situation, we need to increase the time-series order (p or q) and repeat the normalizing and variancestabilizing process till β in the optimal combination of coefficients is appropriate. This replication process definitely increases the computation workload. In this subsection, we reveal that GE-NoVaS-without-β and GA-NoVaS-without-β methods actually have a same structure. The difference between these two methods lies in the region of free parameters. For observing this phenomenon, let us consider scaled coefficients of GA-NoVaS-without-β method except α: (3.20) Recall parameters of GE-NoVaS-without-β method except α implied by Eq. (2.8) are: (3.21) Observing above two equations, although we can discover that Eq. (3.20) and Eq. (3.21) are equivalent if we set b 1 being equal to e −c , these two methods are still slightly different since regions of b 1 and c play a role in the process of optimization. The complete region of c could be (0, ∞). However, Politis (2015) pointed out that c can not take a large value 9 and the region of c should be an interval of the type (0, m) for some m. In other words, a formidable search problem for finding the optimal c is avoided by choosing such trimmed interval. On the other hand, b 1 is explicitly searched from (0, 1) which is corresponding with c taking values from (0, ∞). Likewise, applying the GA-NoVaS-without-β method, the aforementioned burdensome search problem is also eliminated. Moreover, we can build a transformation based on the whole available region of unknown parameter. In spite of the fact that GE-NoVaS-without-β and GA-NoVaS-without-β methods have indistinguishable prediction performance for most of data analysis cases, we argue that the GA-NoVaS-without-β method is more stable and reasonable than the GE-NoVaS-without-β method since it is a more complete technique viewing the available region of parameter. Moreover, GA-NoVaS-without-β method achieves significantly better prediction performance for some cases, see more details from Appendix A. In Sections 3.1 and 3.2, we exhibit the GA-NoVaS method and its parsimonious variant. In this section, we provide algorithms of these two methods. For the GA-NoVaS method, unknown parameters β, a 1 , b 1 are selected from three grids of possible values to normalize {W t ; t = q + 1, · · · , n} in Eq. (3.10). If our goal is the h-step ahead prediction of g(Y n+h ) using past {Y t ; t = 1, · · · , n}, the algorithm of the GA-NoVaS method can be summarized in Algorithm 1. If we want to apply the GA-NoVaS-without-β method, we just need to change Algorithm 1 a little bit. The difference between Algorithms 1 and 2 is the optimization of β term being removed. The optimal combination of a 1 , b 1 is still Algorithm 1: the h-step ahead prediction for the GA-NoVaS method Step 1 Define a grid of possible α values, {α k ; k = 1, · · · , K}, three grids of possible β, a 1 , b 1 values. 
Fix α = α k , then calculate the optimal combination of β, a 1 , b 1 of the GA-NoVaS method. Step 2 Derive the analytic form of Eq. (3.15) using {β, a 1 , b 1 , α k } from the first step. Step 3 Generate Step 4 Calculate the optimal predictor g(Ŷ n+h ) by taking the sample mean (under L 2 risk criterion) or sample median (under L 1 risk criterion) of the set {g(Y n+h,1 ), · · · , g(Y n+h,M )}. Step 5 Repeat above steps with different α values from {α k ; k = 1, · · · , K} to get K prediction results. selected based on the normalizing and variance-stabilizing purpose. In our experiment setting, we choose regions of β, a 1 , b 1 being (0, 1) and set a 0.02 grid interval to find all parameters. Besides, for the GA-NoVaS method, we also make sure that the sum of β, a 1 , b 1 is less than 1 and the coefficient of Y 2 t is the largest one. Algorithm 2: the h-step ahead prediction for the GA-NoVaS-without-β Step 1 Define a grid of possible α values, {α k ; k = 1, · · · , K}, two grids of possible a 1 , b 1 values. Fix α = α k , then calculate the optimal combination of a 1 , b 1 of the GA-NoVaS-without-β method. Steps 2-5 Same as Algorithm 1, but {W n+1,m , · · · , W n+h,m } M m=1 are plugged into the analytic form of Eq. (3.19) and the standard normal distribution does not need to be truncated. In simulation studies, for controlling the dependence of prediction performance on the length of the dataset, 16 datasets (2 from each settings) are generated from different GARCH(1,1)-type models separately and the size of each dataset is 250 (short data mimics 1-year of econometric data) or 500 (large data mimics 2-years of econometric data). Model 1: Time-varying GARCH(1,1) with Gaussian errors 3) 2 +0.5; β 1,t = 0.2sin(0.5πg t )+ 0.2, n = 250 or 500 Model 2: Another time-varying GARCH(1,1) with Gaussian errors Model 6: Exponential GARCH(1,1) with Gaussian errors X t = σ t ǫ t , log σ 2 t = 0.00001 + 0.8895 log Model description: Models 1 and 2 present a time-varying GARCH model where coefficients a 0 , a 1 , b 1 change over time slowly. They differ significantly in the intercept term of σ 2 t as we intentionally keep it low in the second setting. Models 3 and 4 are from a standard GARCH where in Model 4 we wanted to explore a scenario that α 1 +β 1 is very close to 1 and thus mimic what would happen for the iGARCH situation. Model 5 allows for the error distribution to come from a student-t distribution instead of the Gaussian distribution. Note that, for a fair competition, we chose Models 2 to 5 same as simulation settings of (Chen and Politis 2019). Models 6, 7 and 8 present different types of GARCH models. These settings allow us to check robustness of our methods against model misspecification. In a real world, it is hard to convincingly say if the data obeys one particular type of GARCH model, so we want to pursue this exercise to see if our methods are satisfactory no matter what the underlying distribution and the GARCH-type model are. This approach to test the performance of a method under model misspecification is quite standard, see Olubusoye et al. (2016) used data generated from a specifically true model to estimate other GARCH-type models and test the forecasting performance, and Bellini and Bottolo (2008) investigated the impact of misspecification of innovations in fitting GARCH models. Window size: Using these datasets, we perform 1-step, 5-steps and 30-steps ahead time-aggregated POOS predictions. 
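Before describing the rolling windows in detail, we note for reference that a standard GARCH(1,1) series of the kind used in these simulation settings can be generated as in the sketch below. The coefficient values shown are illustrative placeholders and not the exact values of Models 1 to 8.

```python
# Sketch of generating a standard GARCH(1,1) series with Gaussian errors.
import numpy as np

def simulate_garch11(n, a0, a1, b1, burn=500, seed=0):
    """X_t = sigma_t * eps_t,  sigma_t^2 = a0 + a1 * X_{t-1}^2 + b1 * sigma_{t-1}^2."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n + burn)
    x = np.zeros(n + burn)
    sigma2 = a0 / (1.0 - a1 - b1)            # start at the unconditional variance (a1 + b1 < 1)
    for t in range(n + burn):
        x[t] = np.sqrt(sigma2) * eps[t]
        sigma2 = a0 + a1 * x[t] ** 2 + b1 * sigma2
    return x[burn:]                          # drop the burn-in, keep n observations

x_short = simulate_garch11(250, a0=1e-5, a1=0.05, b1=0.90)   # "1-year" sized sample
x_long  = simulate_garch11(500, a0=1e-5, a1=0.05, b1=0.90)   # "2-year" sized sample
```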
For measuring different methods' prediction performance on larger datasets (i.e., data size is 500), we use 250 data as a window to do predictions and roll this window through the whole dataset. For evaluating different methods' performance on smaller datasets (i.e., data size is 250), we use 100 data as a window. Note that log-returns can be calculated from equation shown below: Y t = 100 × log(X t+1 /X t ) ; for t = 1, · · · , 499 or t = 1, · · · , 249. (4.1) Where, {X t } 250 t=1 and {X t } 500 t=1 are 1-year and 2-years price series, respectively. Next, we can define time-aggregated predictions of squared log-returns as: i+m , i = 250, · · · , 494 or i = 100, · · · , 244 Y 2 j,30 = 1 30 30 m=1Ŷ 2 j+m , j = 250, · · · , 469 or j = 100, · · · , 219 (4.2) In Eq. (4.2),Ŷ 2 k+1 ,Ŷ 2 i+m ,Ŷ 2 j+m are single point predictions of realized squared logreturns by NoVaS-type methods or benchmark method;Ȳ 2 k,1 ,Ȳ 2 i,5 andȲ 2 j,30 represent 1-step, 5-steps and 30-steps ahead aggregated predictions, respectively. More specifically, for exploring the performance of three different prediction lengths with large data size, we roll the 250 data points window through the whole dataset, i.e., use {Y 1 , · · · , Y 250 } to predict Y 2 251 , {Y 2 251 , · · · , Y 2 255 } and {Y 2 251 , · · · , Y 2 280 }; then use {Y 2 , · · · , Y 251 } to predict Y 2 252 , {Y 2 252 , · · · , Y 2 256 } and {Y 2 252 , · · · , Y 2 281 }, for 1-step, 5steps and 30-steps aggregated predictions respectively, and so on. For exploring the performance of three different prediction lengths with small data size, we roll the 100 data points window through the whole dataset, i.e., use {Y 1 , · · · , Y 100 } to predict Y 2 101 , {Y 2 101 , · · · , Y 2 105 } and {Y 2 101 , · · · , Y 2 130 }; then use {Y 2 , · · · , Y 101 } to predict Y 2 102 , {Y 2 102 , · · · , Y 2 106 } and {Y 2 102 , · · · , Y 2 131 }, for 1-step, 5-steps and 30-steps aggregated predictions respectively, and so on. For example, with window size being 30, we perform time-aggregated predictions on a large dataset 220 times. Taking this strategy, we can exhaust the information contained in the dataset and investigate the forecasting performance continuously. To measure different methods' forecasting performance, we compare predictions with realized values based on Eq. (4.3). In Eq. (4.3), setting l = k, i, j means we consider 1-step, 5-steps and 30-steps ahead time-aggregated predictions respectively;Ȳ 2 l,h is the h-step (h ∈ {1, 5, 30}) ahead time-aggregated volatility prediction, defined in Eq. (4.2); h m=1 (Y 2 l+m /h) is the corresponding true aggregated value calculated from realized squared log-returns. For comparing various Model-free methods with the traditional method, we set the benchmark method as fitting one GARCH(1,1) model directly (GARCH-direct). Different variants of methods: Note that we can perform GE-NoVaS-type and GA-NoVaS-type methods to predict Y n+h by generating {W n+1,m , · · · , W n+h,m } M m=1 from a standard normal distribution or the empirical distribution of {W t } series, then we can calculate the optimal predictor based on L 1 or L 2 risk criterion. It means each NoVaS-type method possesses four variants. When we perform POOS forecasting, we do not know which α is optimal. Thus, we perform every NoVaS variants using α from eight potential values {0.1, 0.2, · · · , 0.8} and then pick the optimal result. 
To simplify the presentation, we further select the final prediction from the optimal results of the four variants of each NoVaS method and treat this result as the best prediction that each NoVaS method can reach. Applying this procedure means we take a computationally heavy approach to compare different methods' potentially best performance. However, it also means we want to challenge the newly proposed methods at the maximum level, so as to see if they can beat even the best-performing scenario of the current GE-NoVaS method. In this subsection, we compare the performance of our new methods (GA-NoVaS and GA-NoVaS-without-β) with the GARCH-direct and existing GE-NoVaS methods on forecasting 250 and 500 simulated data points. Results are tabulated in Table 4.1. From Table 4.1, we clearly find that NoVaS-type methods outperform the GARCH-direct method. Especially when using the 500 Model-1 data to do 30-steps ahead aggregated prediction, the performance of the GARCH-direct method is terrible: NoVaS-type methods are almost 30 times better. This means the standard prediction method may be spoiled by the error accumulation problem when long-term predictions are required, whereas Model-free methods can avoid this problem. In addition to the overall advantage of NoVaS-type methods over the GARCH-direct method, we find the GA-NoVaS method is generally better than the GE-NoVaS method for both short and large data. This conclusion is two-fold: (1) the GA-NoVaS method is the best method more often than the GE-NoVaS method; (2) since we want to compare the forecasting ability of the GE-NoVaS and GA-NoVaS methods, we use the * symbol to mark cases where the GA-NoVaS method works at least 10% better than the GE-NoVaS method or, conversely, the GE-NoVaS method is at least 10% better. We find there is no case supporting the GE-NoVaS method being better than the GA-NoVaS method with at least a 10% improvement. On the other hand, the GA-NoVaS method achieves significant improvement when long-term predictions are required. Moreover, the GA-NoVaS-without-β method dominates the other two NoVaS-type methods. Since the main crux of Model-free methods is how robust such non-parametric methods are to the underlying data-generation process, here we explore other GARCH-type data generations. The GA-NoVaS method is based upon the GARCH model, so it is interesting to explore whether these methods can sustain a different type of true underlying generation and in general outperform existing methods. Results for Models 6 to 8 are tabulated in Table 4.1. In general, NoVaS-type methods still outperform the GARCH-direct method for these cases. Although the forecasting abilities of GE-NoVaS and GA-NoVaS on large data are indistinguishable, the GA-NoVaS method is clearly better for the short data size. For example, the GA-NoVaS method brings around 20% improvement compared with the GE-NoVaS method for the 30-steps ahead aggregated prediction of the 250 Model-6 simulated data. Making better predictions with shorter past data is always a significant challenge, and thus it is valuable to find that the GA-NoVaS method has superior performance in this scenario. Not surprisingly, the GA-NoVaS-without-β method also keeps up its strong performance. Through the simulation data analysis, we find GA-NoVaS-type methods can sustain good performance under short data and model misspecification. Overall, our new methods outperform the GE-NoVaS method and can render notable improvement in some cases when long-term predictions are desired.
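For concreteness, the sketch below shows how the time-aggregated error measure of Eqs. (4.2) and (4.3) and the relative values reported in the tables can be computed. We assume the error measure is a mean squared deviation between aggregated predictions and realized aggregated squared log-returns; the function and variable names are ours.

```python
# Sketch of the rolling-window evaluation of time-aggregated predictions.
import numpy as np

def aggregated_mse(y2_realized, pred_paths, h):
    """pred_paths[i]: the h single-step predictions of Y^2 made from the window ending at index i."""
    errs = []
    for i, path in sorted(pred_paths.items()):
        pred_agg = np.mean(path[:h])                          # \bar{Y}^2_{i,h} of Eq. (4.2)
        true_agg = np.mean(y2_realized[i + 1:i + 1 + h])      # realized aggregated value
        errs.append((pred_agg - true_agg) ** 2)
    return np.mean(errs)

# Relative value reported in the tables: method error divided by the GARCH-direct error,
# so that values below 1 favor the method in question.
# relative = aggregated_mse(y2, novas_paths, 30) / aggregated_mse(y2, garch_paths, 30)
```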
Note: Column names "GA" and "GE" represent GE-NoVaS and GA-NoVaS methods, respectively; "GARCH" means GARCH-direct method; "P-GA" means GA-NoVaS-without-β method. The benchmark is the GARCH-direct method, so numerical values in the table corresponding to GARCH-direct method are 1. Other numerical values are relative values compared to the GARCH-direct method. "M i-j"steps denotes using data generated from the Model i to do j steps ahead time-aggregated predictions. The bold value means that the corresponding method is the optimal choice for this data case. Cell with * means the GA-NoVaS method is at least 10% better than the GE-NoVaS method or inversely the GE-NoVaS method is at least 10% better. From Section 4, we have found that NoVaS-type methods have great performance on dealing with different simulated datasets. However, no methodological proposal is complete unless one verifies it on several real-world datasets. This section is devoted to explore, in the context of real datasets forecasting, whether NoVaStype methods can provide good long-term time-aggregated forecasting ability and how our new methods are compared to the existing Model-free method. For performing an extensive analysis and subsequently acquiring a convincing conclusion, we use three types of data-stock, index and currency data-to do predictions. Moreover, as done in simulation studies, we apply this exercise on two different lengths of data. For building large datasets (2-years period data), we take new data which come from Jan.2018 to Dec.2019 and old data which come from around 20 years ago, separately. The dynamics of these econometric datasets have changed a lot in the past 20 years and thus we wanted to explore whether our methods are good enough for both old and new data. Subsequently, we challenge our methods using short (1-year period) real-life data. Finally, we also do forecasting using volatile data, i.e., data from Nov. 2019 to Oct. 2020. Note that economies across the world went through a recession due to the COVID-19 pandemic and then slowly recovered during this time-period, typically these sort of situations introduce systematic perturbation in the dynamics of econometric datasets. We wanted to see if our methods can sustain such perturbations or abrupt changes. For mimicking the 2-years period data, we adopt several stock datasets with 500 data size to do forecasting. In summary, we still compare different methods' performance on 1-step, 5-steps and 30-steps ahead POOS time-aggregated predictions. Performing the similar procedure as which we did in Section 4, all results are shown in Table 5 .1. We can clearly find NoVaS-type methods still outperform the GARCH-direct method. Additionally, although the GE-NoVaS method is indistinguishable with the GA-NoVaS method, our new method is more robust than the GE-NoVaS method, see the 30-steps ahead prediction of old two-years BAC and MSFT cases. We can also notice that the GA-NoVaS-without-β method is more robust than other two NoVaS methods. The β-removing idea proposed by Wu and Karmakar (2021) is substantiated again. Since the main goal of this article is offering a new type of NoVaS method which has better performance than the GE-NoVaS method for dealing with short and volatile data, we provide more extensive data analysis to support our new methods in next sections. Note: Column names "GA" and "GE" represent GE-NoVaS and GA-NoVaS methods, respectively; "GARCH" means GARCH-direct method; "P-GA" means GA-NoVaS-without-β method. 
The benchmark is the GARCH-direct method, so numerical values in the table corresponding to GARCH-direct method are 1. Other numerical values are relative values compared to the GARCH-direct method. The bold value means that the corresponding method is the optimal choice for this data case. Cell with * means the GA-NoVaS method is at least 10% better than the GE-NoVaS method or inversely the GE-NoVaS method is at least 10% better. For challenging our new methods in contrast to other methods for small real-life datasets, we separate every new 2-years period data in Section 5.1 to two 1-year period datasets, i.e., separate four new stock datasets to eight samples. We believe evaluating the prediction performance using shorter data is a more important problem and thus we wanted to make our analysis very comprehensive. Therefore, for this exercise, we add 7 index datasets: Nasdaq, NYSE, Small Cap, Dow Jones, S&P 500 , BSE and BIST; and two stock datasets: Tesla and Bitcoin into our analysis. From Table 5 .2 which presents prediction results of different methods on 2018 and 2019 stock data, we still observe that NoVaS-type methods outperform GARCHdirect method for almost all cases. Among different NoVaS methods, it is clear that our new methods are superior than the existing GE-NoVaS method. For 30-steps ahead predictions of 2018-BAC data, 2019-MCD and Tesla data, etc, the existing NoVaS method is even worse than the GARCH-direct method. On the other hand, the GA-NoVaS method is more stable than the GE-NoVaS method, e.g., 30% improvement is created for the 30-steps ahead prediction of 2018-BAC data. After applying the β-removing idea, the GA-NoVaS-without-β significantly beats other methods for almost all cases. From Table 5 .3 which presents prediction results of different methods on 2018 and 2019 index data, we can get the exactly same conclusion as before. NoVaS-type methods are far superior than the GARCH-direct and our new NoVaS methods outperform the existing GE-NoVaS method. Interestingly, the GE-NoVaS method is again beaten by the GARCH-direct method in some cases, such as 2019-Nasdaq, Smallcap and BIST. On the other hand, new methods still show more stable performance. Compared to the existing GE-NoVaS method, the GA-NoVaS-without-β method creates around 60% improvement from the GE-NoVaS method on the 30steps ahead prediction of 2019-BIST data. In addition, the GA-NoVaS method shows more than 10% improvement for all 2018-BSE cases. Combining results presented in Tables 5.1 to 5.3, our new methods present better performance than existing GE-NoVaS and GARCH-direct methods on dealing with small and large real-life data. The improvement generated by new methods using shorter sample size (1-year data) is more significant than using larger sample size (2-years data). Note: Column names "GA" and "GE" represent GE-NoVaS and GA-NoVaS methods, respectively; "GARCH" means GARCH-direct method; "P-GA" means GA-NoVaS-without-β method. The benchmark is the GARCH-direct method, so numerical values in the table corresponding to GARCH-direct method are 1. Other numerical values are relative values compared to the GARCH-direct method. The bold value means that the corresponding method is the optimal choice for this data case. Cell with * means the GA-NoVaS method is at least 10% better than the GE-NoVaS method or inversely the GE-NoVaS method is at least 10% better. 
Note: Column names "GA" and "GE" represent GE-NoVaS and GA-NoVaS methods, respectively; Column name "GARCH" means GARCH-direct method; "P-GA" means GA-NoVaS-without-β method. The benchmark is the GARCH-direct method, so numerical values in the table corresponding to GARCH-direct method are 1. Other numerical values are relative values compared to the GARCH-direct method. The bold value means that the corresponding method is the optimal choice for this data case. Cell with * means the GA-NoVaS method is at least 10% better than the GE-NoVaS method or inversely the GE-NoVaS method is at least 10% better. In this subsection, we perform POOS forecasting using volatile 1-year data (i.e., data from Nov. 2019 to Oct. 2020). We tactically choose this period data to challenge our new methods for checking whether it can self-adapt to the structural incoherence between pre-and post-pandemic, and we also want to compare our new methods with the existing GE-NoVaS method. For observing affects of pandemic, we can take the price of SP500 index as an example. From Fig. 5 .1, it is clearly that the price grew slowly during the normal period form Jan. 2017 to Dec. 2017. However, during the most recent one year, the price fluctuated severely due to the pandemic. Similarly, we focus on evaluating the performance of NoVaS-type methods on handling volatile data by doing comparisons with the GARCH-direct method. For executing a comprehensive analysis, we again investigate different methods' performance on stock, index and currency data. The POOS forecasting results of volatile 1-year stock datasets are presented in Table 5 .4. NoVaS-type methods dominate the GARCH-direct method. The performance of the GARCH-direct method is terrible especially for the Bitcoin case. Apart from this overall advantage of NoVaS-type methods, there is no doubt that the GA-NoVaS method manifests greater prediction results than the GE-NoVaS method since it occupies 13 out 27 optimal choices and stands at least 10% improvement for 5 cases. The parsimonious GA-NoVaS-without-β also shows better results than the GE-NoVaS method. This phenomenon lends strong evidence to support our postulation that the GA-NoVaS method is more appropriate to handle volatile data. The POOS forecasting results of most recent 1-year currency datasets are presented in Table 5 .5. One thing should be noticed is that Fryzlewicz et al. (2008) implied the ARCH framework seems to be a superior methodology for dealing with the currency exchange data. Therefore, we should not anticipate that GA-NoVaS-type methods can attain much improvement for this data case. However, the GA-NoVaS method still brings off around 26% and 37% improvement for 30-steps ahead predictions of CADJPY and CNYJPY, respectively. Besides, the GA-NoVaS-without-β method also remains great performance. This surprising result can be seen as an evidence to show GA-NoVaS-type methods are robust to model misspecification. The POOS forecasting results of most recent 1-year index datasets are presented in Table 5 .6. Consistent with conclusions corresponding to previous two classes of data, NoVaS-type methods still have obviously better performance than the GARCH-direct method. Besides this advantage of NoVaS methods, new methods still govern the existing GE-NoVaS method. In addition to these expected results, we find the GE-NoVaS method is even 14% worse than the GARCH-direct method for 1-step USDX future case. On the other hand, GA-NoVaS-type methods still keep great performance. 
This phenomenon also appears in Sections 4.1.1, 4.1.2, 4.2, 5.1 and 5.2. Beyond this, there are 12 cases in which the GA-NoVaS method renders more than 10% improvement compared to the GE-NoVaS method. Note: the bold value means that the corresponding method is the optimal choice for this data case. Cell with * means the GA-NoVaS method is at least 10% better than the GE-NoVaS method or inversely the GE-NoVaS method is at least 10% better. After performing extensive real-world data analysis, we can conclude that NoVaS-type methods generally perform better than the GARCH-direct method. Sometimes, the long-term prediction of the GARCH-direct method is impaired due to accumulated errors; applying NoVaS-type methods can avoid such an issue. In addition to this encouraging result, the two new NoVaS methods proposed in this article both perform better than the existing GE-NoVaS method, especially for analyzing short and volatile data. The satisfactory performance of NoVaS-type methods in predicting Bitcoin data may also open up the application of NoVaS-type methods to forecasting cryptocurrency data. As illustrated in Section 1, accurate and robust volatility forecasting is an important focus for econometricians. Typically, the volatility of returns is characterized by GARCH-type models. Then, with the Model-free Prediction Principle being proposed, a more accurate NoVaS method was built to predict volatility. This paper further improves the existing NoVaS method by proposing a new transformation structure in Section 3. After performing extensive POOS predictions on different classes of data, we find our new methods achieve better prediction performance than the traditional GARCH(1,1) model and the existing GE-NoVaS method. The most successful method is the GA-NoVaS-without-β method. However, one may still think the victory of our new methods is just caused by the specific samples used, even though the new methods show lower prediction errors (i.e., calculated by Eq. (4.3)) for almost all cases. Therefore, we want to learn whether this victory is statistically significant. We note that Wu and Karmakar (2021) applied CW-tests to show the removing-β idea is appropriate for refining the GE-NoVaS method. Likewise, we are curious whether this refinement is again reasonable for deriving the GA-NoVaS-without-β method from the GA-NoVaS method. In this paper, we focus on the CW-test built by Clark and West (2007), which applies an adjusted Mean Squared Prediction Error (MSPE) statistic to test whether a parsimonious null model and a larger model have equal predictive accuracy; see Dangl and Halling (2012), Kong et al. (2011) and Dai and Chang (2021) for examples of applying this CW-test. Note that the GA-NoVaS-without-β method is a parsimonious method compared with the GA-NoVaS method. The reason for removing the β term has been illustrated in Section 2.4. Here, we want to deploy the CW-test to make sure the β-removing idea is not only empirically adoptable but also statistically reasonable. We take several results from Section 5 to run CW-tests. However, it is tricky to apply the CW-test to 5-steps and 30-steps aggregated predictions; the CW-test result for aggregated predictions is ambiguous, and it is hard to explain the meaning of a significantly small p-value. Does this mean a method outperforms the opposite one at all single-step horizons? Or does it mean the method just achieves better performance at some specific future steps?
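For reference, a minimal sketch of the Clark and West (2007) adjusted-MSPE statistic for nested models is given below, as it would be applied to 1-step-ahead forecasts from the parsimonious GA-NoVaS-without-β method (the null) and the GA-NoVaS method (the larger model). The simple non-HAC standard error is our simplification for the 1-step case.

```python
# Sketch of the Clark-West (2007) adjusted-MSPE test for nested forecasting models.
import numpy as np
from scipy import stats

def clark_west_test(y, pred_small, pred_large):
    """One-sided test; a small p-value favors the larger (nesting) model."""
    y, pred_small, pred_large = map(np.asarray, (y, pred_small, pred_large))
    # Adjusted loss differential: MSPE(small) - [MSPE(large) - adjustment term].
    f = (y - pred_small) ** 2 - ((y - pred_large) ** 2 - (pred_small - pred_large) ** 2)
    n = len(f)
    t_stat = np.sqrt(n) * f.mean() / f.std(ddof=1)       # simple (non-HAC) standard error
    p_value = 1.0 - stats.norm.cdf(t_stat)               # H0: equal predictive accuracy
    return t_stat, p_value
```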
Therefore, we only consider the 1-step ahead prediction horizon, and CW-test results are tabulated in Table 6.1. From Table 6.1, under a one-sided 5% significance level, only 1 case out of 28 rejects the null hypothesis. Besides, we should notice that the CW-test still accepts the null hypothesis for 2018-MSFT and the volatile period of MCD even though the GA-NoVaS method has a better performance value in these cases. Moreover, the GA-NoVaS-without-β method is more computationally efficient than the GA-NoVaS method. In summary, the reasonableness of removing the β term is shown again by comparing the GA-NoVaS and GA-NoVaS-without-β methods. In this paper, we show that the current state-of-the-art GE-NoVaS method and our proposed new methods can avoid the error accumulation problem even when long-step ahead predictions are required. These methods outperform the GARCH(1,1) model in predicting both simulated and real-world data under different forecasting horizons. Moreover, the newly proposed GA-NoVaS method is a more stable structure for handling volatile and short data than the GE-NoVaS method. It can also bring significant improvement when long-term predictions are desired. Additionally, although we reveal that the parsimonious variants of GA-NoVaS and GE-NoVaS possess the same structure, the GA-NoVaS-without-β method is still more favorable since the corresponding region of the model parameter is more complete by design. In summary, the approach of building the NoVaS transformation through the GARCH(1,1) model is sensible and results in superior GA-NoVaS-type methods. In the future, we plan to explore the NoVaS method in different directions. Our new methods corroborate this idea and also open up avenues where one can explore other specific transformation structures. In the financial market, stock data move together, so it would be exciting to see if one can do Model-free predictions in a multiple time series scenario. In some areas, integer-valued time series have important applications; thus, adjusting such Model-free predictions to deal with count data is also desirable. There is also a lot of scope for proving the statistical validity of such predictions. First, we hope a rigorous and systematic way to compare the predictive accuracy of NoVaS-type and standard GARCH methods can be built. From a statistical inference point of view, one can also construct prediction intervals for these predictions using the bootstrap. Such prediction intervals are well sought in the econometrics literature, and some results on their asymptotic validity can be proved. We can also explore dividing the dataset into training and test sets in some optimal way and see if that improves the performance of these methods. Additionally, since determining the transformation function involves the optimization of unknown coefficients, designing a more efficient and precise algorithm may be a further direction to improve NoVaS-type methods. The first author is thankful to Professor Politis for the introduction to the topic and useful discussions. The second author's research is partially supported by NSF-DMS 2124222. We have collected all data presented here from www.investing.com manually. Then, we transform the closing price data to financial log-returns based on Eq. (4.1). Note: P-GE and P-GA columns represent the GE-NoVaS-without-β and GA-NoVaS-without-β methods' relative forecasting performance compared to the GARCH-direct method, respectively.
Note: The P-GE and P-GA columns report the relative forecasting performance of the GE-NoVaS-without-β and GA-NoVaS-without-β methods, respectively, compared to the GARCH-direct method. The Relative value column measures how much better the GA-NoVaS-without-β method is than the GE-NoVaS-without-β method, or vice versa; it is calculated as (max(P-GE, P-GA) − min(P-GE, P-GA)) / max(P-GE, P-GA). Bold values mark cases where one of these two methods is at least 10% better than the other in relative prediction performance.
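For completeness, a minimal sketch of this Relative value computation is given below; the function name and array inputs are toy values of ours, not the table's entries.

    import numpy as np

    def relative_value(p_ge, p_ga, threshold=0.10):
        # Relative gap between the two methods, as defined in the note above:
        # (max(P-GE, P-GA) - min(P-GE, P-GA)) / max(P-GE, P-GA).
        p_ge = np.asarray(p_ge, dtype=float)
        p_ga = np.asarray(p_ga, dtype=float)
        hi = np.maximum(p_ge, p_ga)
        lo = np.minimum(p_ge, p_ga)
        rel = (hi - lo) / hi
        return rel, rel >= threshold   # flag cases with at least a 10% gap

    # Toy values (not the paper's results)
    rel, flagged = relative_value([0.95, 0.80], [0.93, 0.60])
    print(rel, flagged)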
References

Coronavirus pandemic and its implication on global economy
Answering the skeptics: Yes, standard volatility models do provide accurate forecasts
The directional accuracy of 15-months-ahead forecasts made by the IMF
Predicting the volatility of the s&p-500 stock index via garch models: the role of asymmetries
Risks for the long run: Estimation with time aggregation
Misspecification and domain issues in fitting garch (1, 1) models: a monte carlo investigation
Modelling the volatility of the dow jones islamic market world index using a fractionally integrated time-varying garch (fitvgarch) model
Generalized autoregressive conditional heteroskedasticity
A simple model for now-casting volatility series
Predicting stock volatility using after-hours information: evidence from the nasdaq actively traded stocks
Prediction in time series models and model-free inference with a specialization in financial return data
Optimal multi-step-ahead prediction of arch/garch models and novas transformation
Time-varying novas versus garch: Point prediction, volatility estimation and prediction intervals
Forecasting exchange rate volatility using high-frequency data: Is the euro different?
Long-term prediction intervals of economic time series
Approximately normal tests for equal predictive accuracy in nested models
Predicting stock return with economic constraint: Can interquartile range truncate the outliers?
Predictive regressions with time-varying coefficients
Does past volatility affect investors' price forecasts and confidence judgements?
What good is a volatility model? Quantitative Finance 1
Autoregressive conditional heteroscedasticity with estimates of the variance of united kingdom inflation
The importance of global economic policy uncertainty in predicting gold futures market volatility: A garch-midas approach
Economic effects of coronavirus outbreak (covid-19) on the world economy
Low and high prices can improve volatility forecasts during periods of turmoil
Normalized least-squares estimation in time-varying arch models
Forecasting volatility: A reality check based on option pricing, utility function, value-at-risk, and predictive likelihood
Comparison of forecasting performances: Does normalization and variance stabilization method beat garch (1, 1)-type models? empirical evidence from the stock markets
Forecasting crude oil price volatility
Bayesian modelling of time-varying conditional heteroscedasticity
Long-term prediction intervals with many covariates
Time series analysis for financial market meltdowns
The economics of options-implied inflation probability density functions
Predicting market components out of sample: asset allocation implications
Comparing the performances of garch-type models in capturing the stock market volatility in malaysia
Uncertainty in long-term macroeconomic forecasts: Ex post evaluation of forecasts by economics researchers
Misspecification of variants of autoregressive garch models and effect on in-sample forecasting
Estimating and forecasting volatility of stock indices using asymmetric garch models and (skewed) student-t densities
A normalizing and variance-stabilizing transformation for financial time series
Model-free versus model-based volatility prediction. In: Model-Free Prediction and Regression
Volatility forecasting with smooth transition exponential smoothing
Garch forecasting performance under different distribution assumptions
Model-free time-aggregated predictions for econometric datasets
Social distancing and productivity: how to manage a volatile period of growth for the uk economy
Empirical analysis of stock return distributions impact upon market volatility: Experiences from australia

We asserted in Section 3.3 that the GA-NoVaS-without-β method works better than the GE-NoVaS-without-β method. Although these two parsimonious variants of GE-NoVaS and GA-NoVaS share the same structure, we showed that the regions of their parameters differ: the GA-NoVaS-without-β method has a wider parameter space, which implies that it is a more complete technique. To substantiate this idea, we compare the forecasting performance of these two parsimonious methods and present the results in Table A.1. Most cases are accompanied by very small relative values, indicating that the two methods deliver almost the same performance, in harmony with the fact that they share the same structure. However, there are 21 cases where the GA-NoVaS-without-β method works at least 10% better than the GE-NoVaS-without-β method, whereas there are only 8 cases where the GE-NoVaS-without-β method shows significantly better results. We note that the GA-NoVaS-without-β method is optimized by determining its parameters from several grids of values. Therefore, we can expect the performance of this method to improve further if more refined grids are used; in other words, we anticipate that the GA-NoVaS-without-β method will dominate the GE-NoVaS-without-β method when finer grids are employed.
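To illustrate the grid-refinement remark above, the sketch below compares a coarse and a fine parameter grid for a generic one-dimensional pseudo-out-of-sample loss; the loss function, parameter and grids are purely hypothetical and do not represent the actual NoVaS objective or its free coefficients.

    import numpy as np

    def grid_search(loss, grid):
        # Evaluate the loss at every grid point and keep the minimizer.
        losses = np.array([loss(theta) for theta in grid])
        best = int(np.argmin(losses))
        return grid[best], losses[best]

    # Hypothetical smooth loss with its minimum at 0.37
    loss = lambda a: (a - 0.37) ** 2 + 1.0

    coarse = np.linspace(0.0, 1.0, 11)    # step 0.1
    fine = np.linspace(0.0, 1.0, 101)     # step 0.01
    print(grid_search(loss, coarse))      # coarse grid: best point 0.4
    print(grid_search(loss, fine))        # finer grid: best point 0.37, lower loss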