key: cord-0164822-pjpgv6g3 authors: Garcin, Matthieu; Klein, Jules; Laaribi, Sana title: Estimation of time-varying kernel densities and chronology of the impact of COVID-19 on financial markets date: 2020-07-17 journal: nan DOI: nan sha: a1c4bc89e7f31b985a477b68b246bf77c2f896b2 doc_id: 164822 cord_uid: pjpgv6g3

The time-varying kernel density estimation relies on two free parameters: the bandwidth and the discount factor. We propose to select these parameters so as to minimize a criterion consistent with the traditional requirements of the validation of a probability density forecast. These requirements are both the uniformity and the independence of the so-called probability integral transforms, which are the forecast time-varying cumulative distribution functions applied to the observations. We thus build a new numerical criterion incorporating both the uniformity and independence properties by means of an adapted Kolmogorov-Smirnov statistic. We apply this method to financial markets during the COVID-19 crisis. We determine the time-varying density of daily price returns of several stock indices and, using various divergence statistics, we are able to describe the chronology of the crisis as well as regional disparities. For instance, we observe a more limited impact of COVID-19 on financial markets in China, a strong impact in the US, and a slow recovery in Europe.

The knowledge of the distribution of price returns is of paramount importance in finance. Indeed, forecasts and risk measures, such as the Value-at-Risk (VaR), the expected shortfall, or even the volatility, can be seen as scalars calculated from a probability density function (pdf). Practitioners appreciate these scalars for their simplicity, but the pdf contains relevant and more comprehensive information. An accurate description of the pdf is thus worthwhile from a financial perspective. For this reason, the pdf in finance should not be limited to the popular Gaussian distribution. More realistic parametric distributions have thus been put forward [30], like the NIG distribution [13] or the alpha-stable one [41], among many others. Besides, the non-parametric alternative makes it possible to depict the real pdf more accurately, but it may be subject to overfitting if it does not include any regularization. For this purpose, Beran has proposed the minimum Hellinger distance approach, in which a non-parametric pdf has to be estimated first, before being approximated by a parametric distribution [4]. This approach finds some applications in finance [28]. Other semiparametric approaches include the distortion of a parametric density in order to take into account higher-order empirical moments, using for example an Edgeworth expansion [25, 16], which also has some applications in finance [33]. Finally, non-parametric approaches, like the kernel density, include a smoothing parameter, called the bandwidth, which is supposed to balance accuracy and statistical robustness [45, 42]. Selecting an appropriate bandwidth is a hard task often left aside by practitioners in finance, but some statistical methods propose criteria that the bandwidth should minimize, like the asymptotic mean integrated squared error (AMISE) [21]. In finance, non-parametric methods are not limited to the estimation of a pdf. We can cite for example their use to estimate the impact of market events, as with the non-parametric news impact curve in econometric volatility models [34, 12, 7, 15].
The rationale of such an approach is that linear impact models misestimate the reaction of markets to extreme events. Similarly, parametric models often do not describe accurately enough the tails of the pdf of price returns. Extreme events may also lead to other methodological choices in addition to the non-parametric approach. Indeed, like all the previous financial crises, the recent crisis provoked by the COVID-19 pandemic has highlighted that the occurrence of an extreme event may have a sustained effect on the market, namely on the pdf of price returns. Bad economic news does not only result in one extreme daily price variation; it can also initiate a longer turmoil period. This alternation of market regimes motivates the introduction of time-varying densities. Once again, several approaches are possible, depending on whether we consider a parametric pdf [1] or a non-parametric one [19]. We are particularly interested in the non-parametric approach, which offers the possibility to reach a higher accuracy. We stress the fact that applications of time-varying kernels in finance are not limited to estimating a pdf of price returns. They indeed include interest rate models [48] or the study of the dynamic pdf of correlation coefficients, with a particular shape during tumble periods [26]. Time-varying kernel densities rely on the choice of two important free parameters: the bandwidth smooths the pdf at a given date, exactly as in the static approach, whereas the discount factor smooths the variations through time. Harvey and Oryshchenko propose a maximum likelihood approach to select these free parameters [19]. However, the literature about density validation requires stronger properties, namely the fact that the forecast cumulative distribution functions (cdf), applied to the observed price returns, must form a set of iid uniform variables, known as the probability integral transforms (PITs) [10]. In the present paper, we thus propose a new selection rule for the bandwidth and the discount factor, so that it is consistent with the validation rule of the pdf. The main challenge is then to build the criterion, that is to define a function of the bandwidth and of the discount factor that we intend to minimize. Indeed, the traditional approach for validating an estimated pdf consists in a series of statistical tests and graphical analyses, not in a single numerical criterion. The criterion we propose relies on a Kolmogorov-Smirnov statistic, which we can replace with any statistic of distribution divergence. We also adapt this statistic so as to minimize the discrepancy of the series of PITs, whose independence we require. Our work is not the first one to stress the limitations of the maximum likelihood approach in the selection of the two free parameters. We can indeed cite an article following a least-square approach [39] and another one maximizing a uniformity criterion for the PITs by means of an artificial neural network [46]. Our method differs from the latter as it also takes into account the independence of the PITs and it requires only standard statistical tools compared to artificial neural networks. We apply our new method to several stock indices before and during the financial crisis induced by the onset of COVID-19 in the US, in Europe, and in Asia. The question of the impact of the pandemic on stock markets is a hot topic. Several papers deal with this subject and stress the exceptional amplitude of the crisis [2, 3, 36].
We propose here a new outlook on this financial crisis, using the time-varying kernel densities to describe its chronology. We also study the significance of the daily kernel density with respect to the pdf in a steady market. This makes it possible to determine the interval of dates for which the distribution of price returns significantly indicates a financial crisis. In particular, we observe that the speed at which markets recover varies a lot among the regions considered. The paper is organized as follows. In Section 2, we introduce the method for estimating a dynamic kernel density along with the selection rule for the bandwidth and the discount factor. In Section 3, we compare this selection method to another one based on a likelihood criterion, with the help of simulations. In Section 4, we apply the introduced method to stock markets during the COVID-19 crisis. Section 5 concludes.

In this section, we introduce a method to estimate a time-varying density. For this purpose, we recall how we can estimate a static non-parametric density as well as its dynamic adaptation. This method relies on the choice of two free parameters. The main innovation of this paper consists in basing this choice on a quantitative version of criteria usually devoted to the evaluation of forecast densities. The last subsection is about some divergence statistics between two densities. We will use these divergences in the empirical part of this paper to quantify the amplitude of the variations of the densities through time and to determine the significance of these variations.

A widespread non-parametric method to estimate a pdf uses kernels. The kernel density, estimated on the observations X_1, ..., X_t, is defined, for x ∈ R, by:

$$\hat f(x) = \frac{1}{th}\sum_{i=1}^{t} K\left(\frac{x - X_i}{h}\right), \qquad (1)$$

where h > 0 is the bandwidth and K a function following the same rules as a pdf, namely it is positive, integrable, and its integral is one [45, 42]. With these properties, the estimate also has the features of a density. In particular, when integrating it, the substitution y = (x − X_i)/h in each of the t integrals in the sum clearly shows that we need to normalize the sum by th in order to have ∫_R \hat f(x) dx = 1. The symmetry and the continuity of the kernel are also often desirable. The rationale of the kernel density is to make a continuous generalization of a histogram. Indeed, in the histogram, we count the number of occurrences in given intervals. The thinner the intervals, the more accurate the density estimation. But very thin intervals lead to overfitting, with a very erratic estimated density. To avoid this, we prefer to smooth the histogram. A simple manner to do this consists in replacing the number of occurrences of observations in each thin interval by a criterion of proximity of each observation to the middle of this interval. This is how the kernel density works. The proximity function K must thus reach its maximum at zero and it must decrease progressively when its argument moves away from zero. Thus, the impact of X_t on the estimated pdf at x is maximal for x = X_t and it decreases progressively as |x − X_t| becomes larger, until reaching zero, at least asymptotically. It means that the observation of X_t will have no impact on the density \hat f(x) if X_t is far greater or far lower than x. There exists a large literature on the choice of the kernel K [38, 6]. Epanechnikov and Gaussian kernels are widespread, due to their simplicity. 1 But it seems, according to the related literature, that the choice of the kernel is often less critical than the choice of the bandwidth h.
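As a point of reference, here is a minimal sketch of the static estimator of equation (1), using the Epanechnikov kernel mentioned in footnote 1. The function and variable names (epanechnikov, kernel_density) are illustrative and not taken from the paper, and the simulated returns only serve as an example input.

```python
# A minimal sketch of the static kernel density of equation (1) with an
# Epanechnikov kernel.  All names and numerical values are illustrative.
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel: K(u) = 0.75 (1 - u^2) for |u| <= 1, else 0."""
    u = np.asarray(u)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def kernel_density(x, sample, h):
    """f_hat(x) = (1 / (t h)) * sum_i K((x - X_i) / h), vectorized over x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    sample = np.asarray(sample, dtype=float)
    u = (x[:, None] - sample[None, :]) / h
    return epanechnikov(u).sum(axis=1) / (len(sample) * h)

# Example: density of simulated daily returns evaluated on a grid
rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.01, size=1000)
grid = np.linspace(-0.05, 0.05, 201)
f_hat = kernel_density(grid, returns, h=0.005)
```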
Indeed, this parameter plays the role of a regularization parameter. In practice, we tune h in order to balance accuracy and robustness. The larger h, the wider each kernel and the larger the interval on which each observation has an impact. We review in Section 2.4 some methods to select this bandwidth.

We can change the formulation of the kernel density in order to take into account its progressive evolution through time. We get this dynamic version of the kernel density by means of weights w_{t,i}:

$$\hat f_t(x) = \frac{1}{h}\sum_{i=1}^{t} w_{t,i}\, K\left(\frac{x - X_i}{h}\right), \qquad (2)$$

such that $\sum_{i=1}^{t} w_{t,i} = 1$ [19]. For a fixed t, if the weights w_{t,i} increase with i, more recent observations are overweighted and the update of the kernel density is consistent with the economic intuition. The exponential weighting is widespread in the statistical literature as it can reduce the computation of a density update to a simple recursive formula, instead of the linear cost induced by the whole estimation of the density from scratch. We then express the weights by

$$w_{t,i} = \omega^{t-i}\,\frac{1-\omega}{1-\omega^{t}},$$

where 0 < ω ≤ 1. With this setting for the weights, we denote by $\hat f^{h,\omega}_t$ the density introduced in equation (2). Then, when the duration of the estimation sample is large enough with respect to the speed of decay of the weights, a good approximation of w_{t,i} is (1 − ω)ω^{t−i}. The recursive formula that the dynamic kernel density follows is then:

$$\hat f^{h,\omega}_{t+1}(x) = \omega\,\hat f^{h,\omega}_{t}(x) + \frac{1-\omega}{h}\,K\left(\frac{x - X_{t+1}}{h}\right). \qquad (3)$$

The two free parameters of this dynamic non-parametric density are the bandwidth h and the discount factor ω. Starting at a given time t_0 from a density with an exponential weighting, such as in equation (2), we obtain the density at subsequent times iteratively by applying equation (3). Along with the time-varying density, we can build the corresponding cdf. Integrating equation (2), the first estimated cdf, at time t_0, is:

$$\hat F^{h,\omega}_{t_0}(x) = \sum_{i=1}^{t_0} w_{t_0,i}\,\mathcal K\left(\frac{x - X_i}{h}\right),$$

where $\mathcal K$ is the primitive of K such that $\lim_{x\to\infty} \mathcal K(x) = 1$. Subsequently, we get the cdf at a time t + 1 by means of the following iteration:

$$\hat F^{h,\omega}_{t+1}(x) = \omega\,\hat F^{h,\omega}_{t}(x) + (1-\omega)\,\mathcal K\left(\frac{x - X_{t+1}}{h}\right),$$

which is the primitive of equation (3). 1 In the empirical application, we use the Epanechnikov kernel.

Other approaches are possible for estimating a time-varying density. For example, we could have estimated static densities on successive intervals and have then smoothed the transition between the resulting densities. For parametric densities, this amounts to smoothing time-varying parameters, which is a well-known subject in statistics [14]. However, this approach is not very natural for non-parametric densities, as we need a large amount of data to estimate one static density.

The purpose of a density forecast may vary a lot. From a financial perspective, one may need it to build risk measures, or to forecast an average price return or a most likely price return. In practice, an investment decision is to be made relying on this density. One must then evaluate the quality of the forecast with respect to a loss function corresponding to this decision. Unfortunately, there cannot exist any absolute ranking of density forecasts valid for all the possible loss functions [10]. We therefore have to make a choice, which is necessarily subject to discussion. In the econometric literature, we can find an evaluation of the density forecast by means of the likelihood of observations [19]. We think that this choice, which focuses on the body of the distribution, neglects the behaviour of the density in its tails. A more general perspective would lead us to choose a density forecast consistent with the real density, even in its tails. Such a forecast would be more relevant in finance for calculating a VaR or an expected shortfall.
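Returning to the dynamic estimator, the following sketch evaluates the weighted density of equation (2) and the recursions (3) and of the cdf on a fixed grid. The initial density uses the normalized exponential weights, while the update relies on the approximation w_{t,i} ≈ (1 − ω)ω^{t−i} discussed above; the Epanechnikov kernel is assumed, as in the empirical application, and all names are illustrative.

```python
# A minimal sketch of the exponentially weighted kernel density (2) and of the
# recursive updates (3)-(cdf) on a fixed grid.  Names are illustrative.
import numpy as np

def epanechnikov_pdf(u):
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def epanechnikov_cdf(u):
    # Primitive of the Epanechnikov kernel, equal to 0 below -1 and 1 above 1.
    u = np.clip(u, -1.0, 1.0)
    return 0.25 * (2.0 + 3.0 * u - u ** 3)

def initial_density(grid, sample, h, omega):
    """Equation (2): exponentially weighted kernel density and cdf at time t0."""
    sample = np.asarray(sample, dtype=float)
    t0 = len(sample)
    w = omega ** np.arange(t0 - 1, -1, -1)          # w_{t0,i} up to normalization
    w /= w.sum()
    u = (grid[:, None] - sample[None, :]) / h
    f = (w[None, :] * epanechnikov_pdf(u)).sum(axis=1) / h
    F = (w[None, :] * epanechnikov_cdf(u)).sum(axis=1)
    return f, F

def update(f, F, x_new, grid, h, omega):
    """Recursive update of the density and cdf after observing x_new."""
    u = (grid - x_new) / h
    f_new = omega * f + (1.0 - omega) * epanechnikov_pdf(u) / h
    F_new = omega * F + (1.0 - omega) * epanechnikov_cdf(u)
    return f_new, F_new
```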
However, the real density is never observed. If we had a static density forecast, the evaluation of this forecast could simply consist in comparing it with the empirical density of all the observed price returns. But the forecast density is supposed to change at each time, so that we have to base our analysis on another invariant density. This is the purpose of the analysis of the PITs, introduced by Diebold, Gunther, and Tay [10] and widespread in the literature on the evaluation of density forecasts [17, 19, 23]. We now present this method, which we will adapt in the next subsection from the evaluation of density forecasts to the selection of the bandwidth and of the discount factor.

We observe T successive price returns: X_1, ..., X_T. We use the first t_0 of them to build a density estimate $\hat f^{h,\omega}_{t_0}$, using equation (2). This density includes a discount in order to depict more closely recent observations. We thus conceive $\hat f^{h,\omega}_{t_0}$ as a forecast of the true density f_{t_0+1} of X_{t_0+1}, just as we conceive $\hat f^{h,\omega}_t$, for any t ≥ t_0, as a forecast of the true density f_{t+1} of X_{t+1}. Of course, f_t varies with t, and we only observe one random variable with this density, namely X_t. However, we are able to build a density which does not depend on t and which will therefore be very useful for evaluating the quality of the density forecast. This invariant distribution is the one of the PIT variables, which are defined by:

$$Z^{h,\omega}_{t+1} = \hat F^{h,\omega}_{t}(X_{t+1}), \qquad t \geq t_0.$$

If the forecast cdf coincides with the true cdf F_{t+1}, then, whatever t, $Z^{h,\omega}_{t+1}$ follows a uniform distribution on the interval [0, 1]: $Z^{h,\omega}_{t+1} \sim \mathcal U(0, 1)$ [10]. In fact, this idea is quite old [37] and is even something with which any person simulating random variables following a given cdf is familiar. In addition to being uniform, the variables $Z^{h,\omega}_{t_0+1}, ..., Z^{h,\omega}_{T}$ must also be independent. Thanks to the PITs, we have T − t_0 observations of the same uniform distribution. This makes it possible to evaluate the density forecast: we have to check that $Z^{h,\omega}_{t_0+1}, ..., Z^{h,\omega}_{T}$ are indeed iid and uniform in [0, 1]. We explain in the next subsection how our framework, which is about the selection of optimal parameters, differs on this point from the evaluation of density forecasts.

The literature about bandwidth selection is very rich [45, 21, 42]. Beyond the rule of thumb which often guides the choice of the bandwidth h among practitioners, we can cite more relevant selection methods, such as the minimization of the AMISE [40], evaluated for instance with cross-validation or a plug-in technique. As exposed in the previous subsection, we will try to select h in order to make the distribution of the PITs close to a uniform distribution; the uniform case is trivial in the AMISE approach, which makes this method ineffective here. We can also cite the possibility to estimate a time-varying bandwidth, like in the literature about online estimation of kernel densities [47, 24]. Our approach will be different from the online framework: our time-varying aspect is not about h but about the density. Our problem, in addition, is not only about selecting h. We have to select it jointly with the discount factor ω. As already mentioned, we can base this selection on the maximization of a likelihood [19]. But we want to have an accurate description of the true density, not to make the best point forecast. This leads us to use PITs and to adapt the method of evaluation of density forecasts. We have two objectives regarding the PITs: the uniformity and the independence.
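The PITs can be computed sequentially: each new observation is passed through the one-step-ahead forecast cdf before the density is updated. The sketch below assumes the initial_density and update helpers from the previous sketch; the function name pit_series and the example parameters are illustrative, not taken from the paper.

```python
# A minimal sketch of the probability integral transforms (PITs),
# Z_{t+1} = F_hat_t(X_{t+1}), computed on a fixed grid.
import numpy as np

def pit_series(returns, t0, h, omega, grid):
    """Return the PITs Z_{t0+1}, ..., Z_T."""
    f, F = initial_density(grid, returns[:t0], h, omega)   # from the earlier sketch
    pits = []
    for x in returns[t0:]:
        # one-step-ahead forecast cdf evaluated at the new observation
        pits.append(float(np.interp(x, grid, F)))
        f, F = update(f, F, x, grid, h, omega)             # from the earlier sketch
    return np.array(pits)

# Example on simulated Gaussian returns with illustrative parameter values
rng = np.random.default_rng(0)
x = rng.normal(0.0, 0.01, size=1300)
z = pit_series(x, t0=1160, h=0.012, omega=0.955, grid=np.linspace(-0.1, 0.1, 401))
```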
We first focus on the uniformity. According to Diebold, Gunther, and Tay, methods based on statistical tests, such as the Kolmogorov-Smirnov test, are not relevant because they are not constructive, insofar as they do not indicate why the PITs are not uniform [10]. They thus prefer a qualitative analysis using graphical tools such as a simple correlogram. Besides this mainstream approach, some papers propose a statistical test assessing the uniformity of the PITs [5]. Our framework is in fact different, as we do not want to determine whether our density forecast is good or not. Instead, given a density model, we only want to select its best parameters, here h and ω. Maybe our forecast will be poor, even though the non-parametric approach makes this case unlikely, but we will have done our best with respect to the model used. We thus do not want to test the consistency of our PITs with a uniform distribution; instead, we select the parameters h and ω minimizing some test statistic. We choose to minimize the Kolmogorov-Smirnov statistic, k, because it is widespread and easy to understand. This statistic is simply the maximum gap between the empirical cdf and the theoretical one, which, in our case, is uniform:

$$k\!\left(Z^{h,\omega}_{t_0+1}, ..., Z^{h,\omega}_{T}\right) = \sup_{z \in [0,1]} \left| \frac{1}{T-t_0} \sum_{t=t_0+1}^{T} \mathbf{1}_{\{Z^{h,\omega}_{t} \leq z\}} - z \right|. \qquad (4)$$

But the Kolmogorov-Smirnov statistic says nothing about the independence of the variables and the fact that the sampling is random [11]. This property is however crucial and its absence could lead to nonsensical estimations [20, 8]. In the standard approach regarding the evaluation of density forecasts, the independence is assessed by graphical tools, such as a correlogram [10]. Again, we would like a more systematic approach. We thus use an additional criterion coming from the literature on the simulation of quasi-random variables. We indeed want our series of PITs to be a low-discrepancy sequence [31, 43, 32, 18]. This is all the more important as we want to estimate a time-varying density of price returns in a regime-switching market. The rationale behind the discrepancy is that the uniformity must be a feature not only of isolated PITs but also of vectors of PITs: the sequence must be equidistributed. This method will avoid almost static densities in which price returns are globally well distributed, but with mainly high PITs in a bullish regime and then mainly low PITs during a crisis period. The discrepancy criterion we propose to minimize is then the multivariate uniformity Kolmogorov-Smirnov statistic for a given size of vectors of PITs, taking into account the targeted independence of the PITs. This independence, along with the uniformity, implies that, in the targeted case, the theoretical joint cdf of the vector $(Z^{h,\omega}_{s}, Z^{h,\omega}_{t})$ is $(u, v) \mapsto uv$, as soon as s ≠ t. We focus on vectors of dimension 2 because the discrepancy statistics are difficult to compute for larger dimensions [22, 27]. In particular, for a pair of observations with a given time lag τ > 0, we define:

$$k_{\tau}\!\left(Z^{h,\omega}_{t_0+1}, ..., Z^{h,\omega}_{T}\right) = \sup_{(u,v) \in [0,1]^2} \left| \frac{1}{T-t_0-\tau} \sum_{s=t_0+1}^{T-\tau} \mathbf{1}_{\{Z^{h,\omega}_{s} \leq u\}} \mathbf{1}_{\{Z^{h,\omega}_{s+\tau} \leq v\}} - uv \right|.$$

In the definition of k_τ, the case τ = 0 is not allowed. Indeed, in this case, the two PITs $Z^{h,\omega}_{s}$ and $Z^{h,\omega}_{s+\tau}$ cannot be independent from each other because they are equal. However, we can define another statistic $\bar k_{\tau}$, for τ ≥ 0, which includes both the independence of the PITs for τ > 0, thanks to k_τ, and their univariate uniformity, thanks to k:

$$\bar k_{\tau} = \begin{cases} k_{\tau}\!\left(Z^{h,\omega}_{t_0+1}, ..., Z^{h,\omega}_{T}\right) & \text{if } \tau > 0 \\ k\!\left(Z^{h,\omega}_{t_0+1}, ..., Z^{h,\omega}_{T}\right) & \text{else.} \end{cases}$$

Besides, the multivariate Kolmogorov-Smirnov statistic depends on the size of the sample. We thus consider a size-adapted aggregation of the $(\bar k_{\tau})_{\tau \geq 0}$. Indeed, for a sample of n → +∞ observations, $\sqrt{n} \times k_{\tau}$ has a limit distribution which does not depend on n [9].
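A possible implementation of the statistics k and k_τ is sketched below. The univariate statistic is the exact Kolmogorov-Smirnov distance to the uniform cdf; the bivariate supremum is approximated on the grid of observed sample coordinates, which is a choice of this sketch rather than a prescription of the paper. Names are illustrative.

```python
# A minimal sketch of the uniformity statistic k and of the lag-tau pair
# statistic k_tau applied to a series of PITs.
import numpy as np

def ks_uniform(z):
    """k: Kolmogorov-Smirnov distance between the PITs and the U(0,1) cdf."""
    z = np.sort(np.asarray(z, dtype=float))
    n = len(z)
    i = np.arange(1, n + 1)
    return float(max(np.max(i / n - z), np.max(z - (i - 1) / n)))

def ks_pairs(z, tau):
    """k_tau: distance between the empirical joint cdf of the pairs
    (Z_s, Z_{s+tau}) and the independent uniform cdf u*v, approximated on the
    grid of observed coordinates."""
    z = np.asarray(z, dtype=float)
    u, v = z[:-tau], z[tau:]
    n = len(u)
    uu, vv = np.sort(u), np.sort(v)
    max_gap = 0.0
    for a in uu:
        # empirical joint cdf F_n(a, b) = #{s : u_s <= a and v_s <= b} / n,
        # evaluated for all thresholds b in the sorted v-grid
        v_below = np.sort(v[u <= a])
        emp = np.searchsorted(v_below, vv, side="right") / n
        max_gap = max(max_gap, float(np.max(np.abs(emp - a * vv))))
    return max_gap
```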
We also choose a maximal lag ν above which we will not consider dependence effects. Indeed, if the time lag is too big, the number of observations is very limited and the asymptotic Kolmogorov distribution may not apply. 2 We thus propose the following size-adapted statistic: 3

$$d_{\nu}(h, \omega) = \max_{0 \leq \tau \leq \nu} \sqrt{T - t_0 - \tau}\; \bar k_{\tau}.$$

Finally, the optimal bandwidth and discount factor are defined as the parameters minimizing this uniformity and discrepancy statistic:

$$(h^{\star}, \omega^{\star}) = \underset{h > 0,\; 0 < \omega \leq 1}{\arg\min}\; d_{\nu}(h, \omega). \qquad (5)$$

By doing so, we do not exactly target the independence of any vector of observations. Instead, our simplified statistic aims at obtaining the independence of pairs of observations contained in a time interval of duration ν, like with the correlogram approach suggested by Diebold, Gunther, and Tay [10]. We also propose a constrained version of this optimisation problem. Indeed, the above unconstrained problem may lead to a dynamic of densities far from the economic intuition, for example with very rough densities. In practice, the time-varying densities we have built with this method seem empirically robust, at first sight. But we are interested in defining some reasonable bounds for h or ω. In order to have a robust time-varying density, we want an isolated observation not to change the density too much between two consecutive dates. To state things quantitatively, we want to limit the Kolmogorov-Smirnov statistic between two densities at consecutive dates. We propose ν^{−1} as an upper bound. With a daily ν^{−1} bound, the maximal change of the Kolmogorov-Smirnov statistic, which is 1, cannot be reached before the horizon ν. The link between the ν^{−1} bound and the parameters is straightforward, as the update of the cdf leads to an increase of the cdf at one point of at most 1 − ω. Therefore, we introduce a new bound for ω and the constrained problem is as follows:

$$(h^{c}, \omega^{c}) = \underset{h > 0,\; 1 - \nu^{-1} \leq \omega \leq 1}{\arg\min}\; d_{\nu}(h, \omega). \qquad (6)$$

We could also want to set bounds on h in order to secure the robustness of the density at one date instead of the robustness across time. Nevertheless, we consider that the robustness across time is enough. Indeed, as the density will not change very rapidly, each density will be a fairly good forecast for observations close in time. 2 We may consider, for example, ν = 22, so that we verify the independence for every one-month interval of every pair of daily price returns. This arbitrary choice is applied in the empirical part of this paper. 3 An alternative criterion is proposed in Appendix A.

In the method exposed above to select h and ω, we minimize the divergence between an empirical distribution and a uniform one. In particular, we use the Kolmogorov-Smirnov statistic to depict this divergence because of both its simplicity and its asymptotic behaviour. But other divergence metrics could replace the Kolmogorov-Smirnov statistic in this method. We can also use various divergence statistics to quantify to what extent the estimated pdf differs from what it was at a reference date and thus track the evolution of the pdf through time. This is what these various statistics are devoted to in the empirical part of this paper. In addition, thanks to simulations, we will determine confidence intervals for each of these divergences at each date, so that we will be able to assess whether the evolution of the pdf through time is significant or not. We now review three of these divergence statistics in addition to the Kolmogorov-Smirnov statistic. Some are based on a comparison of densities, and others on a comparison of cdfs or even of quantiles.
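Returning to the selection rule of equations (5) and (6), the criterion d_ν can be aggregated and minimized by a brute-force grid search, as sketched below. The sketch assumes the pit_series, ks_uniform and ks_pairs helpers from the earlier sketches, follows the reconstruction of d_ν given above, and leaves the candidate grids h_grid and omega_grid to the user; all names are illustrative.

```python
# A minimal sketch of the size-adapted criterion d_nu and of a grid search over
# (h, omega), including the constrained version omega >= 1 - 1/nu.
import itertools
import numpy as np

def d_nu(pits, nu=22):
    """max over 0 <= tau <= nu of sqrt(n_tau) * k_bar_tau."""
    stats = [np.sqrt(len(pits)) * ks_uniform(pits)]            # tau = 0: uniformity
    for tau in range(1, nu + 1):
        stats.append(np.sqrt(len(pits) - tau) * ks_pairs(pits, tau))
    return max(stats)

def select_parameters(returns, t0, grid, h_grid, omega_grid, nu=22, constrained=True):
    """Brute-force minimization of d_nu over candidate pairs (h, omega)."""
    best = (None, None, np.inf)
    for h, omega in itertools.product(h_grid, omega_grid):
        if constrained and omega < 1.0 - 1.0 / nu:
            continue                                           # keep 1 - omega <= 1/nu
        pits = pit_series(returns, t0, h, omega, grid)
        crit = d_nu(pits, nu)
        if crit < best[2]:
            best = (h, omega, crit)
    return best
```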
First we recall the definition of the Kolmogorov-Smirnov statistic between the cdfs $\hat F_t$ and $\hat F_{t_0}$:

$$KS_t = \sup_{x \in \mathbb{R}} \left| \hat F_t(x) - \hat F_{t_0}(x) \right|.$$

Whereas the Kolmogorov-Smirnov statistic considers the maximal difference between two cdfs, the Hellinger distance is the cumulated difference between the densities:

$$H_t = \left( \frac{1}{2} \int_{\mathbb{R}} \left( \sqrt{\hat f_t(x)} - \sqrt{\hat f_{t_0}(x)} \right)^2 dx \right)^{1/2}.$$

The p-Wasserstein distance, given p ≥ 1, is related to optimal transportation theory [44]. It is the minimal cost to transform one pdf into another one. The Kolmogorov-Smirnov statistic is the L^∞ distance between the cdfs, whereas the Wasserstein metric is the L^p distance between their quantiles. The p-Wasserstein distance is indeed defined by [35]:

$$W_{p,t} = \left( \int_{0}^{1} \left| \hat F_t^{-1}(u) - \hat F_{t_0}^{-1}(u) \right|^p du \right)^{1/p}.$$

In this paper, we focus on the case p = 1, for which the Wasserstein distance is also equal to the L^1 distance between the cdfs [35]. It thus clearly generalizes the Kolmogorov-Smirnov statistic. We will see in the empirical part of this paper that this generalization may be more appropriate than the Kolmogorov-Smirnov statistic to assess the significance of the variations of the distribution. Indeed, the occurrence of several extreme observations may not impact significantly the Kolmogorov-Smirnov statistic, which mainly focuses on the body of the distribution, whereas the Wasserstein distance takes into account the whole distribution. On the other hand, since a uniform distribution is not subject to a dichotomy between body and tails, the Kolmogorov-Smirnov statistic seems appropriate for assessing the uniformity of the PITs. As opposed to the other divergences, the Kullback-Leibler divergence is not strictly speaking a distance function, as it is not symmetric in the two densities. It is related to Shannon's entropy. It is defined by:

$$KL_t = \int_{\mathbb{R}} \hat f_t(x) \log\!\left( \frac{\hat f_t(x)}{\hat f_{t_0}(x)} \right) dx.$$

All these divergences can be generalized easily if we work with a discrete grid instead of R. They are also always positive, with a value of zero if and only if $\hat f_t = \hat f_{t_0}$.

The purpose of this simulation study is to confirm with simulated data the relevance of the selection method for h and ω introduced in equation (5). We thus introduce an alternative selection criterion and compare the performance of the two methods in forecasting a simple simulated time-varying density. The alternative selection method we use, in this time-varying framework, is the one proposed by Harvey and Oryshchenko. They use a maximum likelihood approach to select the free parameters h and ω [19]:

$$(h_{HO}, \omega_{HO}) = \underset{h > 0,\; 0 < \omega \leq 1}{\arg\max}\; \sum_{t=t_0}^{T-1} \log \hat f^{h,\omega}_{t}(X_{t+1}). \qquad (7)$$

We now compare the two methods defined by equations (5) and (7). It appeared to us that our approach performs well when the simulated distribution has fat tails. Therefore, we generate 2,000 independent variables of time-varying density f_t, which is a Cauchy density with scale parameter equal to 1 and time-varying location parameter equal to t/100:

$$f_t(x) = \frac{1}{\pi \left( 1 + (x - t/100)^2 \right)}.$$

Incidentally, the t-th variable equivalently follows a Student's distribution with one degree of freedom, translated by a quantity t/100. The particularity of this kind of distribution is the frequent occurrence of high values, possibly very far from all the past observations. Likelihood-based selection methods of the free parameters of the kernel density thus require substantial smoothing, that is a high h and a high ω, in order to avoid very low likelihoods for these big observations. On the contrary, a method based on a cdf criterion, such as the one we put forward, does not have a similar constraint.
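Before turning to the simulation results, here is a minimal sketch of the four divergence statistics evaluated on a discrete grid, as suggested at the end of Section 2.5. The 1/2 normalization of the Hellinger distance and the small eps regularization of the Kullback-Leibler divergence are choices of this sketch; names are illustrative.

```python
# A minimal sketch of the divergence statistics between two density/cdf
# estimates evaluated on the same grid (e.g. the output of the earlier sketches).
import numpy as np

def _trapz(y, x):
    """Trapezoidal integration on a (possibly non-uniform) grid."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def kolmogorov_smirnov(F1, F2):
    """Maximal absolute gap between two cdfs."""
    return float(np.max(np.abs(F1 - F2)))

def hellinger(f1, f2, grid):
    """Hellinger distance with the usual 1/2 normalization."""
    return float(np.sqrt(0.5 * _trapz((np.sqrt(f1) - np.sqrt(f2)) ** 2, grid)))

def wasserstein_1(F1, F2, grid):
    """1-Wasserstein distance, i.e. the L1 distance between the cdfs."""
    return _trapz(np.abs(F1 - F2), grid)

def kullback_leibler(f1, f2, grid, eps=1e-12):
    """KL(f1 || f2); eps avoids divisions by zero where f2 vanishes."""
    return _trapz(f1 * np.log((f1 + eps) / (f2 + eps)), grid)
```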
As a consequence, using the first t simulated data to estimate the density at time t + 1, for t evolving between 1,000 and 1,999, we find smaller free parameters in the PIT-based approach than in the likelihood method: h* = 0.488, ω* = 0.902, h_HO = 0.596, and ω_HO = 0.989. Since we know the true density, we can compare the densities estimated at time t with f_{t+1}. We see for example in Figure 1 that the likelihood method leads to a smoother density, whose bulk is also more shifted, because of the time-varying aspect, than with the PIT-based approach. The divergence statistics introduced in Section 2.5 reveal that our method, which explicitly uses a Kolmogorov-Smirnov criterion, leads to a better estimate of the time-varying density than the likelihood method, but only for the Kolmogorov-Smirnov statistic, as one can see in Figure 2. This suggests that our PIT-based approach could benefit from an adaptation to other metrics than the sole Kolmogorov-Smirnov criterion, depending on the preferred divergence statistic. This could be the subject of further research. One could wonder if the relative superiority, at least regarding the Kolmogorov-Smirnov statistic, of the PIT-based approach over the likelihood method also holds when the true density is static. We thus simulate 2,000 iid Cauchy variables, with static density equal to f_0. We estimate a static kernel density and cdf, like in equation (1), using only the first 1,000 observations. This estimate is based on a bandwidth h that we select following either a PIT-based validation criterion or a likelihood criterion, both adapted to the static case. Like in the dynamic framework, the PIT-based method leads to a less smooth density than the likelihood method: h*_static = 0.287 and h_HO,static = 1.449. All the divergence statistics with respect to the true density underline the superiority, in this case, of the PIT-based approach: 0.027 (0.093 for the likelihood method) for the Kolmogorov-Smirnov statistic, 0.113 (0.147) for the Hellinger distance, 0.513 (0.890) for the Wasserstein distance, and 0.032 (0.076) for the Kullback-Leibler divergence.

We now apply the PIT-based method to the estimation of time-varying densities of several stock indices. We consider American indices (NASDAQ Composite, S&P 500, S&P 100), European indices (EURO STOXX 50, Euronext 100, DAX, CAC 40), and Asian indices (Nikkei 225, KOSPI, SSE 50), with a particular focus on the S&P 500, EURO STOXX 50, and South-Korean KOSPI indices. We have used data from Yahoo finance in the time interval from 04/17/2015 to 05/28/2020. The study period includes the economic crisis related to COVID-19. In particular, we study the impact of COVID-19 on three stock markets corresponding to economic areas with different crisis management regarding the pandemic. The questions we want to answer are about the significance of this impact and the characterization of a recovery after the peak of the crisis. We have estimated, each day, a pdf of daily price returns from the date t_0 corresponding to November 1st, 2019. These densities include observations from 2015, exponentially weighted with an optimal discount factor depending on the index. We provide these optimal discount factors in Table 1, along with the optimal bandwidth, determined by equation (5), as well as the constrained version defined by equation (6). For the rest of the empirical study, we consider a common pair of parameters for all the indices, so as to make fair comparisons.
Focusing on the constrained case reported in Table 1, we choose the highest estimated bandwidth, to ensure the robustness of the densities, and the lowest discount parameter, so as to have the highest responsiveness of the dynamics of densities: h_m = 0.012 and ω_m = 0.955. We remark that these values are close to the median values obtained with the alternative method described in Appendix A. For the S&P 500, EURO STOXX 50, and KOSPI indices, we display in Figure 3 the estimated dynamic pdf of price returns at four dates which illustrate the chronology of the impact of the pandemic on financial markets: before the crisis, on the 16th December 2019, in a period where the markets were steady; at the first turmoil in the markets, on the 7th February 2020; at the peak of the pandemic, which occurs at a different date for each market; and at the end of our sample, on the 28th May 2020. We determine the date of the peak of the pandemic as the date t maximizing the Hellinger distance of the density $\hat f^{h_m,\omega_m}_t$ with respect to the estimated pdf at t_0, that is $\hat f^{h_m,\omega_m}_{t_0}$. This peak does not follow an epidemiological definition, since we only observe financial data. It corresponds to a maximal divergence with the steady state of the market. For the KOSPI and the EURO STOXX 50, the pdf before the crisis looks like a Gaussian distribution, with thin tails. Then the pdf slightly widens on the losses side. At the peak of the crisis, the pdf flattens, with very fat tails. After the peak, it tends to an asymmetric distribution with a negative skewness and slowly decreasing tails. The chronology is similar for the S&P 500, except that the pdf on the 7th February is similar to the one before the crisis. It may indicate a low responsiveness of the US market in the face of the outbreak. Or it may denote a temporary lag in the impact of COVID-19 on the US market, reflecting the lag in the spread of the outbreak in the region. Displaying pdfs at several dates as in Figure 3 makes it possible to depict the chronology of the crisis. But it is limited, since displaying this density for every day of our sample would make the figure unreadable. Therefore, instead of displaying each pdf, we display one statistic per day. This statistic must reflect the divergence of the pdf with respect to a steady state of the markets. We thus determine the Kolmogorov-Smirnov statistic, the Hellinger distance, the 1-Wasserstein distance, as well as the Kullback-Leibler divergence of the pdf each day with respect to the pdf at t_0. Results are displayed in Figure 4. Whatever the divergence statistic, we observe first a slight increase from 0 toward a low positive value until the beginning of the crisis, when the divergence sharply increases up to the peak, after which it begins to slowly decrease. This last phase corresponds to the slow recovery of the markets after the crisis. In addition to the evolution through time of the four divergence statistics, Figure 4 shows confidence intervals for each statistic. These confidence intervals come from the simulation of 10,000 Brownian motions, on which we apply our method of density estimation and compute the divergence statistics. The null hypothesis H_0 is thus that all the price returns are iid Gaussian variables. For each statistic and each date, we represent the quantile estimated on the simulations and corresponding to three confidence levels: 95%, 99% and 99.9%.
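A minimal sketch of these Monte-Carlo confidence bands is given below: under H_0 the returns are iid Gaussian, and for each date we record quantiles of the Hellinger divergence with respect to t_0 across simulations. It reuses the initial_density, update and hellinger helpers from the earlier sketches; the number of simulations and the parameter values in the commented example are kept small and illustrative, whereas the paper uses 10,000 simulations.

```python
# A minimal sketch of confidence bands for the divergence under H0
# (iid Gaussian price returns), illustrated with the Hellinger distance.
import numpy as np

def hellinger_path(returns, t0, grid, h, omega):
    """Daily Hellinger distance of the estimated pdf with respect to the pdf at t0."""
    f0, F0 = initial_density(grid, returns[:t0], h, omega)    # earlier sketch
    f, F = f0.copy(), F0.copy()
    path = []
    for x in returns[t0:]:
        f, F = update(f, F, x, grid, h, omega)                # earlier sketch
        path.append(hellinger(f, f0, grid))                   # earlier sketch
    return np.array(path)

def confidence_bands(n_sims, n_obs, t0, grid, h, omega, sigma,
                     levels=(0.95, 0.99, 0.999), seed=42):
    """Date-by-date quantiles of the divergence across simulated Gaussian paths."""
    rng = np.random.default_rng(seed)
    paths = np.array([
        hellinger_path(rng.normal(0.0, sigma, size=n_obs), t0, grid, h, omega)
        for _ in range(n_sims)
    ])
    return {lvl: np.quantile(paths, lvl, axis=0) for lvl in levels}

# Small-scale illustrative call (hypothetical parameter values):
# bands = confidence_bands(n_sims=200, n_obs=1300, t0=1160,
#                          grid=np.linspace(-0.15, 0.15, 301),
#                          h=0.012, omega=0.955, sigma=0.01)
```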
At a given date t, for a given stock index, if the divergence of the current pdf with respect to the pdf at t_0 is above a particular curve of the confidence interval, we reject H_0 with the corresponding confidence level p. In other words, we consider the pdf at t to be significantly different from the pdf at t_0 with a confidence level p. Depending on the divergence considered, we are able to determine the peak of the impact as the date maximizing the statistic. We display in Table 2 the date of the peak as well as the value of the Hellinger distance at the peak, before the crisis, and in late May. According to this table, the strongest impact is in the US, but the recovery seems faster there than in Europe. The smallest impact and the fastest recovery are by far in China. The peak occurs between the 25th March (China and South Korea) and the 6th April (US and Japan), whatever the index, except for the DAX, whose peak is in May. We use the other divergences as a robustness check of these results. The conclusions are in fact similar: small impact and almost total recovery in late May for the Chinese SSE 50 index, strongest impact on the US market, slowest recovery on the European market. We also observe some variations in the estimation of the peak date. The most surprising one is provided by the Kolmogorov-Smirnov statistic, according to which the peak is reached first in Europe on the 18th March, before continental Asia and the US on the 23rd March, and finally Japan on the 2nd April.

Table 2: Hellinger distance H with respect to t_0. The peak corresponds to the date when the maximal Hellinger distance is reached. Dates are in 2020.

We stress the fact that the alternative chronology of the peaks is not the only particularity of the Kolmogorov-Smirnov statistic with respect to the three other divergence statistics we have implemented. For instance, the significance of the financial crisis in some regions is questionable according to this divergence statistic, as one can see in Figure 4. We can nevertheless explain this striking, and certainly dubious, conclusion. Indeed, when simulating two sets of iid random variables, we get two kernel densities, but the Kolmogorov-Smirnov statistic focuses on only one quantile, generally corresponding to where the cdf is the steepest, that is in the body of the distribution. If we disrupt one of these two densities with a limited number of outliers, the Kolmogorov-Smirnov statistic may not change a lot, as this modification mainly impacts the tails and not the body of the distribution. On the contrary, the three other divergence statistics are less robust to outliers, as they are defined by integrals over the whole distribution. Their responsiveness to a crisis is thus higher. For this reason, we prefer them to the Kolmogorov-Smirnov statistic for assessing the significance of the variations of a dynamic pdf.

In this paper, we have introduced a new method to select the two free parameters of a dynamic kernel density estimation, namely the discount factor and the bandwidth. This method relies on the maximisation of the accuracy of the daily pdf. This accuracy is to be understood in the sense of the literature about density forecast evaluation: the PITs of the new observations, expressed using the time-varying distribution, form a set of variables which must be iid uniform. We use the Kolmogorov-Smirnov statistic and a discrepancy statistic to build a quantitative criterion of accuracy of the pdf.
It is this criterion that we optimize when selecting the bandwidth and the discount factor of our time-varying pdf. Future research could focus on extensions of this method to other divergence statistics. We have applied this method to financial data. In particular, we represent the evolution of the pdf of daily price returns for several stock indices during the COVID-19 pandemic. We are thus able to present an accurate chronology of the financial crisis. Though the impact of the pandemic on the Chinese market seems limited, we observe that the strongest impact occurred in the US. The slowest recovery is in Europe, for which the pdf of daily returns is still significantly different from a steady market in late May 2020. On the contrary, the recovery of the Chinese and South-Korean markets is very rapid. According to several divergence statistics, in late May 2020 they are not even significantly different from what they were before the crisis.

The selection of the free parameters h and ω relies on the minimization of a criterion d_ν exposed in Section 2.4. As an alternative to this criterion, we propose another criterion D_ν to be minimized. The difference between d_ν and D_ν consists in a different interpretation of the independence of the PITs. In D_ν, one simply considers a necessary condition of the independence, which makes this approach less rigorous than d_ν. Given the statistic k defined in equation (4), the criterion D_ν follows the definition:

$$D_{\nu}(h, \omega) = \max_{t_0 < s \leq u \leq T,\; u - s + 1 \geq \nu} \sqrt{u - s + 1}\; k\!\left(Z^{h,\omega}_{s}, ..., Z^{h,\omega}_{u}\right).$$

The rationale behind this alternative criterion is that the uniformity must be a feature not only of the PITs in the interval [t_0 + 1, T] but also of the PITs in any of its subintervals: the sequence must be equidistributed. Like for d_ν, this method will avoid almost static densities with price returns globally well distributed, but with high PITs in a bullish regime compensated by low PITs during a crisis period. The criterion D_ν to be minimized is then the worst uniformity statistic over all the subintervals of [t_0 + 1, T]. Once again, this criterion adapts the Kolmogorov-Smirnov statistic to the size of the subsample [29]. We also choose a minimal size ν above which we consider that the asymptotic Kolmogorov distribution may be applied. We may consider, for example, ν = 22, so that we verify the uniformity for every one-month interval of daily price returns. Thanks to this size-adapted statistic, the criterion D_ν in fact focuses on the subinterval of size greater than ν with the least uniform PITs. The optimal parameters h and ω obtained by this method for the stock indices studied in the empirical part of this paper are gathered in Table 3.

Table 3: Optimal bandwidth h and discount factor ω minimizing the criterion D_ν for several stock indices for densities between November 2019 and May 2020. The constrained version is h^c and ω^c.

[1] Efficiency of the financial markets during the COVID-19 crisis: time-varying parameters of fractional stable dynamics. Working paper.
[2] Forecasting the effect of COVID-19 on the S&P500. Working paper.
[3] The unprecedented stock market impact of COVID-19.
[4] Minimum Hellinger distance estimates for parametric models.
[5] Testing density forecasts, with applications to risk management.
[6] Nonparametric density estimation for multivariate bounded data.
[7] An algorithm for nonparametric GARCH modelling.
[8] Verification of internal risk measure estimates.
[9] An asymptotic decomposition for multivariate distribution-free tests of independence.
[10] Evaluating density forecasts, with applications to financial risk management.
[11] Evaluating density forecasts of inflation: the survey of professional forecasters.
[12] Efficient estimation of conditional variance functions in stochastic regression.
[13] Bridging the gap between the distribution of realized (ECU) volatility and ARCH modelling (of the Euro): the GARCH-NIG model.
[14] Estimation of time-dependent Hurst exponents with variational smoothing and application to forecasting foreign exchange rates. Physica A: statistical mechanics and its applications.
[15] Non-parametric news impact curve: a variational approach.
[16] Probability density of the empirical wavelet coefficients of a noisy chaos.
[17] Journal of the royal statistical society: series B (statistical methodology).
[18] l_p-discrepancy and statistical independence of sequences.
[19] Kernel density estimation for time series data.
[20] The role of the information set for forecasting, with applications to risk management.
[21] A brief survey of bandwidth selection for density estimation.
[22] A multivariate Kolmogorov-Smirnov test of goodness of fit.
[23] Multivariate density forecast evaluation: a modified approach.
[24] Multivariate online kernel density estimation with Gaussian kernels.
[25] Statistiques d'ordre supérieur pour le traitement du signal.
[26] Collective behavior of equity returns and market volatility.
[27] Testing multivariate uniformity and its applications.
[28] Simulated minimum Hellinger distance estimation for some continuous financial and actuarial models.
[29] Evaluating Kolmogorov's distribution.
[30] Parametric return density estimation for reinforcement learning. Working paper.
[31] Low-discrepancy and low-dispersion sequences.
[32] Recent constructions of low-discrepancy sequences. Mathematics and computers in simulation.
[33] Moments expansion densities for quantifying financial risk.
[34] Alternative models for conditional stock volatility.
[35] Statistical aspects of Wasserstein distances. Annual review of statistics and its application.
[36] Regression approach for modeling COVID-19 spread and its impact on stock market. Working paper.
[37] Remarks on a multivariate transformation.
[38] Density estimation using inverse and reciprocal inverse Gaussian kernels. Nonparametric statistics.
[39] A brief survey on the choice of parameters for
[40] Density estimation for statistics and data analysis.
[41] The stable non-Gaussian asset allocation: a comparison with the classical Gaussian approach.
[42] Introduction to nonparametric estimation.
[43] On the use of low discrepancy sequences in Monte Carlo methods. Monte Carlo methods and applications.
[44] Topics in optimal transportation.
[45] Kernel smoothing.
[46] Improved parameter estimation of time dependent kernel density by using artificial neural networks.
[47] Remarks on some recursive estimators of a probability density.
[48] Time-varying nonlinear regression models: nonparametric estimation and model selection.