key: cord-0586988-mr4oqzyt
authors: Bernardini, Davide; Paterlini, Sandra; Taufer, Emanuele
title: A 2-stage elastic net algorithm for estimation of sparse networks with heavy tailed data
date: 2021-08-24
journal: nan
DOI: nan
sha: 84ae12f0867d403a3654ef1ec5097f5b8f1052ba
doc_id: 586988
cord_uid: mr4oqzyt

We propose a new 2-stage procedure that relies on the elastic net penalty to estimate a network based on partial correlations when data are heavy-tailed. The new estimator includes the lasso penalty as a special case. Using Monte Carlo simulations, we test its performance on several underlying network structures and four different multivariate distributions: Gaussian, t-Student with 3 and 20 degrees of freedom, and contaminated Gaussian. The simulation analysis shows that the 2-stage estimator performs best for heavy-tailed data and is also robust to distribution misspecification, both in terms of identification of the sparsity patterns and numerical accuracy. Empirical results on real-world data focus on the estimation of the European banking network during the Covid-19 pandemic. We show that the new estimator can provide interesting insights both for the development of network indicators, such as network strength, to identify crisis periods and for the detection of banking network properties, such as centrality and level of interconnectedness, that might play a relevant role in setting up adequate risk management and mitigation tools.

An undirected graphical model consists of two elements: a joint probability distribution f and a graph G, which encodes the conditional dependence structure of a set of random variables (see Lauritzen [24], Koller and Friedman [22]). So far, most of the existing literature has focused on Gaussian graphical models, where the joint distribution is multivariate Gaussian $N_p(\mu, \Sigma)$. In such a set-up, the conditional dependence structure, and thus the graph of the Gaussian graphical model, can be retrieved by looking, for example, at the inverse $\Theta$ of the covariance matrix $\Sigma$. In particular, $\theta_{ij} = 0$ implies conditional independence between variables i and j (see Lauritzen [24]).

Several methods have been proposed to estimate Θ and thus the conditional dependence graph. Typically, a penalized estimator is used to recover the sparsity pattern and then reconstruct the graph of conditional dependencies (see, for example, Banerjee et al. [3], Cai et al. [8], Friedman et al. [17], Lee et al. [25], Mazumder and Hastie [27], Meinshausen and Bühlmann [28], Yuan [32], Zhou et al. [33]). A widely used penalty is the Least Absolute Shrinkage and Selection Operator (LASSO) penalty proposed by Tibshirani [29] and based on the $\ell_1$-norm.

For example, Meinshausen and Bühlmann [28] introduced a conditional LASSO-penalized regression approach.

Friedman et al. [17] proposed the graphical LASSO (glasso) algorithm. It adds an element-wise $\ell_1$-norm penalty to the multivariate Gaussian log-likelihood function in order to estimate a sparse Θ. More recently, Bernardini et al. [6] and Kovács et al. [23] independently introduced penalized estimators of the precision matrix based on the elastic net penalty (see Zou and Hastie [34]). Bernardini et al. [6] suggested three alternative approaches and investigated their performance through simulations, while Kovács et al. [23] focused on a direct modification of the glasso algorithm augmented with target matrices. The elastic net penalty extends the LASSO penalty by combining the $\ell_1$-norm (LASSO) and the $\ell_2$-norm. Thus, LASSO is a special case of the elastic net penalty when the weight given to the $\ell_2$-norm is zero.

When the distribution of data is not Gaussian, these estimators may lead to poor estimates because the assumption about the underlying distribution is violated. Thus, extensions to other multivariate distributions can be useful to address such situations. This paper extends the 2-stage graphical elastic net estimator (2Sgelnet) in Bernardini et al. [6] to the case where the joint distribution is a multivariate t-Student. The goal is to estimate the sparse precision matrix, from which the sparse partial correlation matrix is then retrieved. To achieve this, we propose a modification of the tlasso algorithm by Finegold and Drton [15], thereby introducing the 2-stage t-Student elastic net (2Stelnet) estimator, which relies on the Expectation-Maximization (EM) algorithm (see Dempster [12]) and exploits the scale-mixture representation of the multivariate t-Student. As for 2Sgelnet, the proposed estimator includes both the LASSO and elastic net cases, since the former is a special case of the latter.

Note that for the t-Student distribution, a zero element in Θ does not necessarily imply conditional independence, but only a zero partial correlation. Thus a partial correlation matrix is estimated. Monte Carlo simulations show that the proposed extension (2Stelnet) of the 2Sgelnet of Bernardini et al. [6] to the t-Student distribution leads, in general, to better estimates in a wide range of situations where the data exhibit heavier tails than in the normal case and in the presence of model misspecification. Fat tails are, for example, a well-known stylized fact of financial time series (see Cont [10]). Furthermore, economics and finance have recently begun to pay more attention to the importance of several network structures (see Jackson [20], Carvalho and Tahbaz-Salehi [9], Diebold and Yilmaz [14], Jackson and Pernoud [21], Bardoscia et al. [4]). There is, consequently, a growing interest in broadening the set of tools available to estimate network structures. For example, financial networks have received renewed attention in the aftermath of the recent global financial crisis. Several empirical studies have focused on the estimation and analysis of these networks; see, for example, Barigozzi and Brownlees [5], Billio et al. [7], Demirer et al. [11] and Torri et al. [30]. In this paper, we use 2Stelnet to estimate the relationships among a large set of important European banks for the period 2018-2020. In order to improve our understanding of the systemic risk of the European banking network and assess the impact of the Covid-19 pandemic, we track some common network statistics and the evolution of the average intensity of the connections. We also identify the most central banks by looking at different centrality measures. Finally, we carry out an exploratory simulation study to map the effects of a shock in the estimated network.

This paper is organized as follows. Section 2 describes the 2Stelnet estimator. Section 3.1 describes the setup of our simulation analysis and Section 3.2 reports the results obtained. Finally, in Section 4, we estimate the European banking network during the period 2018-2020 and analyze its characteristics.

Let $X = [X_1, ..., X_p]$ be a p-dimensional random vector with joint multivariate t-Student distribution $t_p(\mu, \Psi^{-1}, \nu)$,

where $\mu$ is the mean vector, $\Psi^{-1}$ is the positive definite scatter, or dispersion, matrix and $\nu$ is the number of degrees of freedom. The covariance matrix $\Sigma$ of X and its inverse, the precision matrix $\Theta$, are then equal to $\Sigma = \frac{\nu}{\nu-2}\Psi^{-1}$ and $\Theta = \frac{\nu-2}{\nu}\Psi$, with $\nu > 2$. Our goal is to estimate a sparse precision matrix $\Theta$, from which to retrieve a sparse graph whose weights are partial correlations between pairs of variables. Partial correlations are obtained by properly scaling the off-diagonal elements in $\Theta$. Let $\theta_{jk}$ be the element in the j-th row and k-th column of $\Theta$; the partial correlation $p_{jk}$ between components $X_j$ and $X_k$ is then equal to:

$$p_{jk} = -\frac{\theta_{jk}}{\sqrt{\theta_{jj}\,\theta_{kk}}} \qquad (1)$$

Thus, we build a graph G(V, E) with the set of nodes V = {1, ..., p} representing the elements in X and the set of edges E ⊆ V × V, where (j, k) ∈ E if $p_{jk} \neq 0$ and (j, k) represents the edge between elements $X_j$ and $X_k$.
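As a minimal sketch of the scaling and graph construction just described, the following Python snippet converts a precision matrix into partial correlations and reads off the edge set. The function names, the numerical tolerance `tol` and the toy 3 × 3 matrix are illustrative assumptions, not part of the original procedure.

```python
import numpy as np

def partial_correlations(theta):
    """Scale a precision matrix: p_jk = -theta_jk / sqrt(theta_jj * theta_kk)."""
    theta = np.asarray(theta, dtype=float)
    d = np.sqrt(np.diag(theta))
    pcor = -theta / np.outer(d, d)
    np.fill_diagonal(pcor, 0.0)  # convention used later in the text: zero diagonal
    return pcor

def edge_set(pcor, tol=1e-10):
    """Edges (j, k), j < k, whose partial correlation is numerically nonzero."""
    p = pcor.shape[0]
    return {(j, k) for j in range(p) for k in range(j + 1, p)
            if abs(pcor[j, k]) > tol}

# Toy precision matrix (hypothetical values): variables 0 and 2 are not linked.
theta = np.array([[2.0, -1.0, 0.0],
                  [-1.0, 2.0, 0.5],
                  [0.0, 0.5, 1.0]])
pcor = partial_correlations(theta)
edges = edge_set(pcor)
```

Note the sign flip: a negative off-diagonal element of Θ corresponds to a positive partial correlation.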

Note that, differently from Gaussian graphical models where the distribution of X is multivariate Gaussian, here $p_{jk} = 0$ does not necessarily imply that elements $X_j$ and $X_k$ are conditionally independent (see Baba et al. [2]). Nonetheless, it can be shown that, in this case, if nodes j and k are separated by a subset of nodes C ⊆ {h | h ∈ V ∧ h ≠ j, k} in the graph G, then the elements $X_j$ and $X_k$ are conditionally uncorrelated given the subset C (see Finegold and Drton [15]). For elliptical distributions, conditional and partial correlations are equivalent (see Baba et al. [2]).

Following the scale-mixture representation of a multivariate t-Student distribution as in Finegold and Drton [15], we have that:

$$X = \mu + \frac{Y}{\sqrt{\tau}} \qquad (2)$$

where $Y \sim N_p(0, \Psi^{-1})$ and $\tau \sim \Gamma(\nu/2, \nu/2)$. Thus, the multivariate p-dimensional t-Student distribution can be seen as a mixture of a multivariate p-dimensional Gaussian distribution with an independent univariate gamma distribution. By properly exploiting this representation, it is possible to rely on the EM algorithm (Dempster et al. [13]) to estimate the parameters of the multivariate t-Student distribution. Following closely the tlasso procedure proposed by Finegold and Drton [15], we suggest a similar EM algorithm to produce a sparse estimate of Θ.
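The scale-mixture representation also gives a direct way to simulate multivariate t-Student data. The sketch below assumes that the second gamma parameter is a rate (so that τ has mean 1), which is why NumPy's `scale` argument is set to 2/ν. The function name and the illustrative values (ν = 10, a 2 × 2 scatter matrix) are assumptions chosen only to make the example easy to check.

```python
import numpy as np

def sample_multivariate_t(n, mu, psi_inv, nu, rng):
    """Draw n samples X = mu + Y / sqrt(tau), with Y ~ N(0, psi_inv) and
    tau ~ Gamma(shape=nu/2, rate=nu/2), independent of Y."""
    mu = np.asarray(mu, dtype=float)
    p = mu.shape[0]
    y = rng.multivariate_normal(np.zeros(p), psi_inv, size=n)
    tau = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=n)  # rate nu/2 -> scale 2/nu
    return mu + y / np.sqrt(tau)[:, None], tau

rng = np.random.default_rng(0)
psi_inv = np.array([[1.0, 0.5],
                    [0.5, 1.0]])  # hypothetical scatter matrix
x, tau = sample_multivariate_t(20000, np.zeros(2), psi_inv, nu=10.0, rng=rng)

# The sample covariance should be close to nu / (nu - 2) * psi_inv.
sample_cov = np.cov(x, rowvar=False)
```

With ν = 10 the theoretical covariance is 1.25 times the scatter matrix; for small ν (such as ν = 3) the sample covariance converges much more slowly because of the heavy tails.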

Differently from the tlasso, which uses the LASSO, or $\ell_1$-norm, penalty (see Tibshirani [29]) to induce sparsity, we propose an approach that utilizes the elastic net penalty, a linear combination of the $\ell_1$-norm and the $\ell_2$-norm [34], to perform a penalized estimation of Θ. In fact, we use the 2Sgelnet by Bernardini et al. [6] instead of relying on the glasso by Friedman et al. [17]. The core idea is to estimate a sparse Ψ, the precision matrix of the multivariate Gaussian in the mixture, since its elements are proportional to the elements of Θ.

Let $x_1, ..., x_n$ be n p-dimensional vectors of observations drawn from the $t_p(\mu, \Psi^{-1}, \nu)$ distribution, realizations of X. The random variable τ in the mixture (2) is treated as the hidden, or latent, variable whose value is updated given the current estimate of the parameters and the observed data. Let also $\tau_i$ be the value of the latent variable τ associated with observation $x_i$. As in the tlasso of Finegold and Drton [15], we assume that the degrees of freedom ν are known in advance. This simplifies the procedure, but ν can also be treated as an unknown parameter and thus estimated (see Liu and Rubin [26]). The EM algorithm proceeds as follows [15]. At time step t + 1:

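• Expectation step (E-step)

- The expected value of τ, given a generic vector of observations x and parameters µ, Ψ and ν, is:

$$E[\tau \mid X = x] = \frac{\nu + p}{\nu + (x - \mu)^\top \Psi (x - \mu)} \qquad (3)$$

- Using the current estimates $\hat\mu^{(t)}$ and $\hat\Psi^{(t)}$ of the parameters, we update the estimated value $\hat\tau_i^{(t+1)}$ of $\tau_i$:

$$\hat\tau_i^{(t+1)} = \frac{\nu + p}{\nu + (x_i - \hat\mu^{(t)})^\top \hat\Psi^{(t)} (x_i - \hat\mu^{(t)})} \qquad (4)$$

for each i = 1, ..., n.

• Maximization step (M-step)

- Compute the updates of the parameters given the data and $\hat\tau_i^{(t+1)}$:

$$\hat\mu^{(t+1)} = \frac{\sum_{i=1}^{n} \hat\tau_i^{(t+1)} x_i}{\sum_{i=1}^{n} \hat\tau_i^{(t+1)}} \qquad (5)$$

$$\hat\Psi^{(t+1)} = \hat\Psi_{\alpha,\lambda}(\cdot) \qquad (6)$$

where $\hat\Psi_{\alpha,\lambda}(\cdot)$ is a penalized estimator of Ψ, whose penalty is controlled by the hyperparameters α and λ.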

The EM algorithm cycles sequentially through the E and M steps until a convergence criterion is satisfied.

Let $\hat\psi_{jk}$ be the element in the j-th row and k-th column of $\hat\Psi$. We stop the iterations of the algorithm when

$\max_{j,k}(|\hat\psi_{jk}^{(t+1)} - \hat\psi_{jk}^{(t)}|)$ is smaller than a given threshold value δ.
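The EM cycle and its stopping rule can be sketched in a few lines of Python. This is a hedged illustration, not the authors' implementation: the M-step estimator is passed in as a callable `estimate_psi` acting on the τ-weighted scatter matrix, and in the usage example it is replaced by a plain matrix inverse (an unpenalized stand-in for the 2-stage elastic net estimator). The function name, the initial values and the simulated data are assumptions.

```python
import numpy as np

def em_multivariate_t(x, nu, estimate_psi, delta=1e-6, max_iter=500):
    """EM for a multivariate t with known nu; estimate_psi maps the weighted
    scatter matrix to an estimate of Psi (hypothetical plug-in interface)."""
    n, p = x.shape
    mu = x.mean(axis=0)                           # assumed starting values
    psi = np.linalg.inv(np.cov(x, rowvar=False))
    for _ in range(max_iter):
        # E-step: expected latent weights given the current mu and psi.
        centered = x - mu
        maha = np.einsum("ij,jk,ik->i", centered, psi, centered)
        tau = (nu + p) / (nu + maha)
        # M-step: weighted mean, then Psi from the weighted scatter matrix.
        mu = tau @ x / tau.sum()
        centered = x - mu
        scatter = (centered * tau[:, None]).T @ centered / n
        psi_new = estimate_psi(scatter)
        change = np.max(np.abs(psi_new - psi))    # max_jk |psi^(t+1) - psi^(t)|
        psi = psi_new
        if change < delta:
            break
    return mu, psi

# Simulated check with nu = 3 and a tridiagonal Psi (illustrative values).
rng = np.random.default_rng(1)
psi_true = np.array([[2.0, -1.0, 0.0],
                     [-1.0, 2.0, -1.0],
                     [0.0, -1.0, 2.0]])
nu, n = 3.0, 5000
y = rng.multivariate_normal(np.zeros(3), np.linalg.inv(psi_true), size=n)
tau_true = rng.gamma(shape=nu / 2, scale=2 / nu, size=n)
x = y / np.sqrt(tau_true)[:, None]

mu_hat, psi_hat = em_multivariate_t(x, nu, estimate_psi=np.linalg.inv)
```

Keeping the Ψ estimator as a callable mirrors the structure of the text: only the M-step changes when moving from an unpenalized estimate to glasso (tlasso) or to the 2-stage elastic net procedure (2Stelnet).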

In the following, we discuss the estimator $\hat\Psi_{\alpha,\lambda}(\cdot)$ of Ψ for (6), based on the 2Sgelnet of Bernardini et al. [6].

This estimator consists of a 2-step procedure. First, it estimates the sparsity structure of $\hat\Psi^{(t+1)}$ by using conditional regressions with the elastic net penalty. This is inspired by the neighborhood selection approach of Meinshausen and Bühlmann [28], where the LASSO penalty is used. At each iteration, we first transform the observed data as follows (from (2)):

$$\tilde x_i = \sqrt{\hat\tau_i^{(t+1)}}\,\bigl(x_i - \hat\mu^{(t+1)}\bigr) \qquad (7)$$

Let $\tilde X$ be the n by p matrix of transformed observations, such that the i-th row is equal to $\tilde x_i$. Let also $\tilde X_k$ be the k-th column of $\tilde X$ and $\tilde X_{-k}$ be $\tilde X$ without the k-th column. We fit p elastic net penalized regressions using the k-th component of the transformed vectors as dependent variable and the remaining components as predictors:

with k = 1, ..., p. As in Meinshausen and Bühlmann [28], we reconstruct the graph representing the connections ($p_{jk} \neq 0$) among the components of X (and also of Y). We include node j in the neighborhood of node k if the coefficient of component j in $\hat b_k$ is different from 0. Then, through the reconstructed neighborhoods of all nodes, we produce an estimate $\hat E$ of the edge set E. This procedure can lead to the situation where an edge (j, k) is included in $\hat E$ according to the neighborhood of j, ne(j), but not according to the neighborhood of k, ne(k). To deal with such a situation, we can use two rules (see Meinshausen and Bühlmann [28]):

• AND rule: edge $(j, k) \in \hat E$ if $j \in ne(k) \wedge k \in ne(j)$
• OR rule: edge $(j, k) \in \hat E$ if $j \in ne(k) \vee k \in ne(j)$
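The neighborhood-selection stage can be illustrated with a small, self-contained sketch. It assumes a glmnet-style parametrization of the elastic net objective, (1/2n)‖y − Xb‖² + λ[α‖b‖₁ + ((1 − α)/2)‖b‖₂²], which is consistent with α = 1 giving the pure LASSO but whose scaling constants are an assumption, and it solves each regression with a plain coordinate-descent loop written for this example. All function names and the simulated data are hypothetical.

```python
import numpy as np

def soft_threshold(z, g):
    return np.sign(z) * max(abs(z) - g, 0.0)

def elastic_net_cd(x, y, lam, alpha, n_sweeps=100):
    """Coordinate descent for the assumed objective
    (1/2n)||y - x b||^2 + lam * (alpha*||b||_1 + (1-alpha)/2 * ||b||_2^2)."""
    n, q = x.shape
    b = np.zeros(q)
    col_sq = (x ** 2).sum(axis=0) / n
    resid = y.copy()
    for _ in range(n_sweeps):
        for j in range(q):
            resid += x[:, j] * b[j]               # partial residual without j
            rho = x[:, j] @ resid / n
            b[j] = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            resid -= x[:, j] * b[j]
    return b

def neighborhood_edges(x, lam, alpha, rule="and"):
    """Regress each column on the others and combine neighborhoods."""
    p = x.shape[1]
    nonzero = np.zeros((p, p), dtype=bool)        # nonzero[k, j]: j is in ne(k)
    for k in range(p):
        others = [j for j in range(p) if j != k]
        b = elastic_net_cd(x[:, others], x[:, k], lam, alpha)
        nonzero[k, others] = b != 0.0
    adj = nonzero & nonzero.T if rule == "and" else nonzero | nonzero.T
    return {(j, k) for j in range(p) for k in range(j + 1, p) if adj[j, k]}

# Gaussian toy data in which only variables 0 and 1 are linked.
rng = np.random.default_rng(0)
theta = np.eye(4)
theta[0, 1] = theta[1, 0] = -0.5
x = rng.multivariate_normal(np.zeros(4), np.linalg.inv(theta), size=4000)

edges_and = neighborhood_edges(x, lam=0.2, alpha=0.5, rule="and")
edges_or = neighborhood_edges(x, lam=0.2, alpha=0.5, rule="or")
```

By construction the AND rule can only return a subset of the edges returned by the OR rule, so it yields sparser or equal graphs for the same λ and α.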

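Once $\hat E$ is estimated, we use it to set the zero-element constraints in the current update $\hat\Psi^{(t+1)}$. In particular, the update $\hat\Psi^{(t+1)}$ is the maximizer of the following constrained optimization problem, with $\hat S^{(t+1)} = \frac{1}{n}\sum_{i=1}^{n} \hat\tau_i^{(t+1)} (x_i - \hat\mu^{(t+1)})(x_i - \hat\mu^{(t+1)})^\top$:

$$\max_{\Psi \succ 0} \; \log\det\Psi - \mathrm{trace}\bigl(\hat S^{(t+1)}\Psi\bigr) \quad \text{subject to } \psi_{ij} = 0 \text{ if } (i, j) \notin \hat E \qquad (9)$$

This problem can be rewritten in Lagrangian form as:

$$\max_{\Psi \succ 0} \; \log\det\Psi - \mathrm{trace}\bigl(\hat S^{(t+1)}\Psi\bigr) - \sum_{(i,j) \notin \hat E} \gamma_{ij}\,\psi_{ij}$$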

where $\gamma_{ij}$ are Lagrange multipliers having nonzero values for all $\psi_{ij}$ constrained to 0. To solve this optimization problem (9), we use the algorithm proposed by Hastie et al. [19] (Section 17.3.1, pp. 631-634) to maximize the constrained log-likelihood and produce an estimate $\hat\Psi$, given the edge set $\hat E$ estimated in the previous step. As with 2Sgelnet, the existence of this estimator of Ψ is not guaranteed in all situations. When the number n of observations is greater than the number p of nodes, the estimator always exists (see Lauritzen [24], Section 5.2.1). In the other situations, its existence depends on the structure of the estimated edge set $\hat E$ (see Lauritzen [24], Section 5.3.2). Appendix A reports pseudo-code for the 2Stelnet algorithm.
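For the second stage, the following sketch follows the iterative regression scheme described in Hastie et al. [19] (Section 17.3.1) for estimating a precision matrix with a known zero pattern. It is an illustrative reading of that algorithm rather than the authors' code: the function name, the convergence tolerance, the final symmetrization and the test matrices are assumptions.

```python
import numpy as np

def constrained_precision(s, edges, n_cycles=200, tol=1e-8):
    """Maximize log det(Psi) - trace(s Psi) with Psi_jk = 0 for (j, k) not in edges."""
    p = s.shape[0]
    adj = np.zeros((p, p), dtype=bool)
    for j, k in edges:
        adj[j, k] = adj[k, j] = True
    w = s.copy()                                   # working estimate of Psi^{-1}
    betas = np.zeros((p, p))
    for _ in range(n_cycles):
        w_old = w.copy()
        for j in range(p):
            rest = np.array([k for k in range(p) if k != j])
            nb = rest[adj[j, rest]]                # neighbors of node j
            beta = np.zeros(p)
            if nb.size:
                # Reduced system: only coefficients of neighbors are free.
                beta[nb] = np.linalg.solve(w[np.ix_(nb, nb)], s[nb, j])
            w12 = w[np.ix_(rest, rest)] @ beta[rest]
            w[rest, j] = w12
            w[j, rest] = w12
            betas[j] = beta
        if np.max(np.abs(w - w_old)) < tol:
            break
    psi = np.zeros((p, p))
    for j in range(p):
        rest = np.array([k for k in range(p) if k != j])
        psi[j, j] = 1.0 / (s[j, j] - w[rest, j] @ betas[j, rest])
        psi[rest, j] = -betas[j, rest] * psi[j, j]
    return (psi + psi.T) / 2.0                     # remove rounding asymmetry

chain = {(0, 1), (1, 2), (2, 3)}

# Case 1: s is the exact inverse of a chain-structured precision matrix.
theta0 = 2.0 * np.eye(4) - np.eye(4, k=1) - np.eye(4, k=-1)
psi_exact = constrained_precision(np.linalg.inv(theta0), chain)

# Case 2: a generic covariance matrix (hypothetical values).
s2 = np.array([[2.0, 0.6, 0.3, 0.1],
               [0.6, 1.5, 0.5, 0.2],
               [0.3, 0.5, 1.8, 0.4],
               [0.1, 0.2, 0.4, 1.2]])
psi2 = constrained_precision(s2, chain)
w2 = np.linalg.inv(psi2)
```

In the second case the estimate has exact zeros outside the edge set, while its inverse reproduces the input matrix on the diagonal and on the retained edges, which is the defining property of this constrained maximum likelihood estimate.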

We rely on simulations to assess the performance of 2Stelnet (with ν = 3 fixed a priori) and compare it with 2Sgelnet [6] and with the well-known glasso [17] and tlasso [15] algorithms. For the sake of brevity, we report only the results obtained with the AND rule, as they are qualitatively similar to the ones obtained using the OR rule. There are some specific situations where one rule is better than the other, but we do not find an overall winner. Results for the OR rule are available upon request.

We consider the values 0.5 and 1 for α, while for λ we consider 100 exponentially spaced values between $e^{-6}$ and 2 (6.5 for tlasso). We search for the optimal value of λ, given a value of α for 2Sgelnet and 2Stelnet, using the BIC (see [16]).

We randomly generate seven 50 × 50 precision matrices encoding different relationship structures among 50 variables. We consider the following seven network topologies: scale-free, small-world, core-periphery, random, band, cluster and hub. Precision matrices embedding the randomly generated structures are reported in Figure 8 of Appendix B. For scale-free, random, band, cluster and hub we use the R package huge (setting v = 0.3, u = 0.1), which directly generates precision matrices with a given sparsity structure. For small-world and core-periphery, we instead use the algorithms in [31] and [30], respectively, to generate the sparsity pattern, and then apply the procedure suggested in the huge package to produce the precision matrices. Given a precision matrix Θ, we use it to generate n vectors of observations from the following four multivariate distributions:

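• Multivariate normal: $N_{50}(0, \Theta^{-1})$.

• Multivariate t-Student: $t_{50}(0, \frac{\nu-2}{\nu}\Theta^{-1}, \nu)$, with ν = 3.

• Multivariate t-Student: $t_{50}(0, \frac{\nu-2}{\nu}\Theta^{-1}, \nu)$, with ν = 20.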

• Contaminated normal:

with Ber ∼ Bernoulli(pd = 0.85).

We consider three sample sizes, n = 100, 250, 500. Thus, for each of the 28 network-distribution pairs (seven topologies times four distributions), we have 100 × 50, 250 × 50 and 500 × 50 datasets, respectively. We set up a Monte Carlo experiment with 100 runs for each sample size n.

In order to compare the estimators in different settings, we test them on all network-distribution pairs, looking both at classification performance and numerical accuracy of the estimates for the optimal λ. We use the $F_1$-score as a measure of classification performance:

$$F_1\text{-score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

where:

• Precision = $\frac{TP}{TP + FP}$
• Recall = $\frac{TP}{TP + FN}$

and TP, FP and FN denote the numbers of true positives, false positives and false negatives, respectively. A true positive is a correctly identified edge, while a true negative is a missing edge which is correctly excluded from the edge set. A false positive is a missing edge which is erroneously included in the edge set, while a false negative is an edge which is wrongly excluded. The closer this measure is to 1, the better the classification. The $F_1$-score is a better measure than accuracy when there is imbalance among classes (here, existence or absence of an edge), as happens in our simulation set-up.
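A short Python sketch makes the edge-level definitions concrete. Edges are represented as sets of ordered pairs (j, k) with j < k; the function name and the convention of returning 0 when no edge is correctly identified (which covers the empty graph) are assumptions made for this illustration.

```python
def f1_score_edges(true_edges, est_edges):
    """F1-score of an estimated edge set against the true edge set."""
    tp = len(true_edges & est_edges)   # correctly identified edges
    fp = len(est_edges - true_edges)   # missing edges erroneously included
    fn = len(true_edges - est_edges)   # true edges wrongly excluded
    if tp == 0:
        return 0.0                     # assumed convention, e.g. for the null model
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

true_edges = {(0, 1), (1, 2), (2, 3)}
est_edges = {(0, 1), (1, 2), (0, 3)}
score = f1_score_edges(true_edges, est_edges)   # precision = recall = 2/3
null_score = f1_score_edges(true_edges, set())  # empty estimated graph
```

True negatives do not enter the score, which is why it is less sensitive than accuracy to the large number of absent edges in a sparse network.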

In order to assess the numerical accuracy of the estimates, we compute the Frobenius distance between the theoretical and estimated partial correlation matrices, $P$ and $\hat P$ respectively:

$$\|P - \hat P\|_F = \sqrt{\sum_{j=1}^{p}\sum_{k=1}^{p} |p_{jk} - \hat p_{jk}|^2}$$

These matrices can be easily obtained through a proper scaling of the theoretical and estimated precision matrices, $\Theta$ and $\hat\Theta$, as in (1). We follow a common convention and set the diagonal elements $p_{jj} = \hat p_{jj} = 0$, with j = 1, 2, ..., p. When the Frobenius distance is 0, the estimate is exactly equal to the true values. Thus, the smaller the Frobenius distance, the better the estimate from the numerical accuracy point of view.
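The accuracy measure can be computed directly from two precision matrices, as in the hedged sketch below. The helper repeats the scaling in (1) with the zero-diagonal convention; function names and the 2 × 2 example are illustrative assumptions.

```python
import numpy as np

def partial_correlations(theta):
    """p_jk = -theta_jk / sqrt(theta_jj * theta_kk), with zero diagonal."""
    theta = np.asarray(theta, dtype=float)
    d = np.sqrt(np.diag(theta))
    pcor = -theta / np.outer(d, d)
    np.fill_diagonal(pcor, 0.0)
    return pcor

def frobenius_distance(theta_true, theta_hat):
    """Frobenius distance between the implied partial correlation matrices."""
    diff = partial_correlations(theta_true) - partial_correlations(theta_hat)
    return np.linalg.norm(diff, ord="fro")

theta_true = np.array([[1.0, -0.5],
                       [-0.5, 1.0]])   # partial correlation 0.5
theta_null = np.eye(2)                 # diagonal estimate, i.e. the empty graph
dist = frobenius_distance(theta_true, theta_null)
```

Because both matrices are rescaled before the comparison, the distance is unaffected by a common multiplicative factor such as the (ν − 2)/ν constant linking Ψ and Θ.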

In the following, we report the box plots for the performance measures and compare the estimators. A box plot represents the distribution of the performance measures of the optimal models obtained in 100 Monte Carlo runs for each combination of distribution and network. We test the performance of glasso, tlasso, 2Sgelnet and 2Stelnet.

For the 2-stage estimators, we use the AND rule as a procedure to produce an estimate of the edge set. When α = 1, we have a pure LASSO penalty (see (8)). For the sake of brevity, when analyzing the simulation results, we refer to this situation as 2Sglasso and 2Stlasso, while 2Sgelnet and 2Stelnet refer to the case where α = 0.5.

[...] and from the performance of tlasso with n = 250, 500. This is due to the fact that the optimal model selected using the BIC with the smallest sample size is often the null model (or close to it), that is, the model with zero edges or a diagonal precision matrix. The observed behavior suggests that the BIC with tlasso and a small sample is problematic. This is not the case with the proposed estimator 2Stelnet (and 2Stlasso).

When the distribution of data is multivariate Gaussian (Figure 1), we observe that the 2-stage procedures compete with glasso when n = 100. Depending on the underlying network structure, there are situations in which they perform better in terms of median values. As n gets larger, the 2-stage estimators tend to have a noticeably higher performance than both glasso and tlasso. We also observe that the estimators based on the t-Student distribution (with ν = 3) perform quite close to the ones based on the multivariate Gaussian. Also, tlasso seems to be better than glasso in median terms when looking at classification performance. This is notable because it suggests that they are quite robust to this distributional misspecification.

If datasets are randomly generated from a multivariate t-Student with ν = 3 (Figure 2), the algorithms based on the t-Student outperform the others based on the Gaussian distribution, as expected. The only exception is tlasso when n = 100. The 2Stelnet and 2Stlasso estimators perform the best, followed by tlasso when n = 250, 500. We also observe that 2Sgelnet and 2Sglasso are better estimators than glasso in this case. The relative increase in performance depends on the underlying structure, but we notice that these algorithms based on the Gaussian distribution are not that robust when data are from a heavy-tailed distribution (i.e. t-Student distribution, ν = 3).

In general, when the degrees of freedom of the t-Student increase (ν = 20, see Figure 9 in Appendix D), the performances of glasso, 2Sgelnet and 2Sglasso are, in terms of median, only slightly worse than those of the estimators based on the t-Student. This is in line with the fact that the t-Student distribution tends to the Gaussian one as the degrees of freedom get larger. Indeed, with this setting, the behaviors of the considered estimators are similar to the ones in Figure 1. However, note that tlasso and 2Stelnet/2Stlasso are still more robust even if they assume

The last test case is reported in Figure 10 (Appendix D). Here the distribution is a mixture between two multivariate normal distributions with different variances and correlation structures. This is different from both the distributions assumed by all the algorithms considered. Simulation results suggest that tlasso, 2Stelnet and 2Stlasso are much more robust to this misspecification than the counterparts based on Gaussian distribution. The best estimators in this situation are 2Stelnet and 2Stlasso, followed by tlasso if we exclude the problematic case with n = 100 when it performs the worst. Instead, there is not a clear winner between 2Sgelnet/2Sglasso and glasso. In fact, it mostly depends on the underlying topology.

To sum up, we observe that 2Stelnet and 2Stlasso perform quite well in the situations analyzed through simulations, suggesting a good degree of robustness to distributional misspecifications. They also perform quite closely.

This finding indicates that the value of α, and thus the kind of penalty used, is not particularly relevant, at least in the simulation set-up considered. Depending on the network, one value of α can slightly outperform the other in median terms, but differences seem negligible and thus a particular winner does not emerge. Similar considerations hold true for the differences between 2Sgelnet and 2Sglasso. We notice also that some network structures are more difficult to extract. For example, when we look at the $F_1$-score, the core-periphery topology is the most difficult to estimate in the majority of the situations. Band and cluster structures are also more difficult to retrieve compared with the remaining ones.

[Figure: box plots of the performance measures; recoverable panel title: Cluster, t-Student (ν = 3)]

[...] and estimated partial correlation matrices. The closer it is to 0, the better the estimator's numerical accuracy. The observed behavior of tlasso is the consequence of the fact discussed previously: using the BIC, the optimal model selected is often the empty, or almost empty, graph. Thus the distribution of this measure tends to be quite skewed for tlasso. Figure 4 shows the box plots of the Frobenius distance for datasets from the t-Student distribution with ν = 3. In this case tlasso, 2Stelnet and 2Stlasso correctly assume the underlying distribution. As expected, we observe that these procedures in general outperform the corresponding ones based on the Gaussian distribution. This is not always the case with tlasso when n = 100, and also with n = 250 and the band topology. In almost all cases, 2Stelnet/2Stlasso perform the best. There are only two exceptions, with the hub structure and n = 100, 250. Instead, 2Sgelnet and 2Sglasso perform the worst in the majority of the situations considered. Only with the band network are they competitive with glasso and tlasso. Thus, the 2-step procedures based on the Gaussian distribution are not very robust to model misspecification.

Similarly to the situation of classification performance, as the number of degrees of freedom increases, the procedures based on the Gaussian distribution achieve performances similar to the ones based on the t-Student with ν = 3.

We observe that 2Stelnet and 2Stlasso tend to slightly outperform, in median terms, 2Sgelnet and 2Sglasso respectively, while tlasso is outperformed by glasso in most of the cases. Only with hub structure and n = 250, 500, it is slightly better if we look at median values.

Finally, in Figure 10 (Appendix D) we compare the performance of the estimators when data are from a contaminated normal distribution. The procedures that assume a multivariate t-Student distribution with ν = 3 are much more robust to the model misspecification than the counterparts based on the multivariate Gaussian distribution. There is only one exception with n = 100, where tlasso performs similarly to glasso. We observe that, for the largest sample size considered (n = 500), 2Stelnet/2Stlasso are always the most numerically accurate estimators. This is also true for n = 250, with only one exception with the hub structure. For the smallest sample size (n = 100), they are the best estimators in four out of seven networks (i.e. random, cluster, band, small-world). When comparing glasso and 2Sgelnet/2Sglasso, there is not a clear winner in the cases analyzed. We only observe that the 2-stage procedures tend to improve their relative performances as n grows. With n = 500, they outperform glasso in the majority of cases. Still, they show worse performance than tlasso in almost all the situations considered. Only with n = 100 and the band topology do they have comparable performance in terms of median.

In conclusion, similarly to classification performance, 2Stelnet/2Stlasso show relatively good performances, especially with larger sample sizes. They are robust to different distributional misspecifications. Again, a winner between the elastic net penalty and the pure LASSO penalty in the 2-stage estimators does not emerge from our simulations, neither for the ones based on the Gaussian distribution nor for the ones based on the t-Student. In general, the performances observed are quite similar; only in a few situations does one penalty slightly outperform the other in terms of median values, and which one is best depends on the sample size and on the network.

Inspired by the work of Torri et al. [30], we use the 2Stelnet estimator to reconstruct the European banking network in the period 2018-2020 by using daily stock prices of 36 large European banks. The period considered includes the Covid-19 pandemic, thus it is also possible to assess its impact on the evolution of this banking network and extend [30] to include the recent pandemic crisis. To deal with autocorrelation and heteroskedasticity, two common characteristics of financial time series, we fit an AR(1) [...] We select its best value using the BIC. Figure 13 in Appendix E shows the estimated networks using the Fruchterman-Reingold layout [18]. Common network measures (mean values) are reported in Table 1, along with the total number of edges detected.

We observe an increase in all the measures from 2018 to 2020. The mean values of eccentricity and distance give us an indication of the speed of transmission of a shock in the network. The former is the minimum number of steps needed to reach the farthest node from a specific node, while the latter is the length of the shortest path between two different nodes in the graph. Both values in all three years are quite small, suggesting that a shock could potentially diffuse quite quickly in the system. The average values of the individual clustering coefficients (see [31]) show a tendency of nodes to cluster together and are higher than in the case of random graphs. Short distances and high clustering are two core characteristics of small-world graphs [31]. The mean value of degree is the average number of banks connected to a bank. In our case we detect, on average, seven banks linked to each bank. Figure 14 in Appendix E shows the degree distributions in all three years, which give us a more detailed picture of individual degrees.
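The degree, distance and eccentricity summaries discussed above can be computed from an edge set with a breadth-first search. The following is a minimal sketch on an unweighted graph; the function names, the choice to ignore unreachable pairs, and the 4-node path graph used as an example are assumptions, and the graph is assumed to contain at least one edge.

```python
from collections import deque

def shortest_path_lengths(adjacency, source):
    """Breadth-first search: number of steps from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def network_summary(p, edges):
    """Mean degree, mean eccentricity and mean distance of an undirected graph."""
    adjacency = {v: set() for v in range(p)}
    for j, k in edges:
        adjacency[j].add(k)
        adjacency[k].add(j)
    degrees = [len(adjacency[v]) for v in range(p)]
    eccentricities, distances = [], []
    for v in range(p):
        dist = shortest_path_lengths(adjacency, v)
        others = [d for u, d in dist.items() if u != v]  # reachable nodes only
        if others:
            eccentricities.append(max(others))           # steps to farthest node
            distances.extend(others)
    return {
        "mean_degree": sum(degrees) / p,
        "mean_eccentricity": sum(eccentricities) / len(eccentricities),
        "mean_distance": sum(distances) / len(distances),
    }

# Path graph 0 - 1 - 2 - 3 as a toy example.
summary = network_summary(4, {(0, 1), (1, 2), (2, 3)})
```

For an estimated partial correlation network, the edge set would be the nonzero off-diagonal entries of the estimated matrix.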

In 2018 the distribution is more symmetric, while in 2019 and 2020 it becomes multi-modal. In fact, there is a large number of banks with a number of connections below and above the average, but few banks with a number of connections around the mean value. Also, in the last column of Table 1

where $e_i$ is the initial vector of shocks and $s_i^{\infty}$ is the final steady state induced by $e_i$. In our simulation, $e_i$ is a vector of 0s with 1 in the i-th position, corresponding to a positive unitary shock in the i-th bank (from Table 6, $e_{Nordea} = e_{31}$ and $e_{Mt.Paschi} = e_{24}$). The convergence of the sum in (11) is guaranteed when the spectral radius of $\hat P_{2020}$ is smaller than 1 (see Anufriev and Panchenko [1]). This is verified for our estimated partial correlation matrix $\hat P_{2020}$. In [...] respectively. That is, the overall effects of a shock are more than four times larger if it hits the most central bank.
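A hedged sketch of this shock-propagation exercise is given below. It assumes that the sum in (11) is the geometric series $s_i^{\infty} = \sum_{k=0}^{\infty} \hat P^k e_i$, which equals $(I - \hat P)^{-1} e_i$ when the spectral radius of $\hat P$ is below 1; this form is inferred from the convergence condition stated in the text, not taken from the original equation. The function name and the 2 × 2 matrix are illustrative.

```python
import numpy as np

def steady_state(pcor, i):
    """Steady state induced by a unit shock to node i, assuming
    s = sum_k pcor^k e_i = (I - pcor)^{-1} e_i."""
    p = pcor.shape[0]
    radius = np.max(np.abs(np.linalg.eigvals(pcor)))
    if radius >= 1.0:
        raise ValueError("spectral radius must be smaller than 1 for convergence")
    e = np.zeros(p)
    e[i] = 1.0
    return np.linalg.solve(np.eye(p) - pcor, e)

pcor = np.array([[0.0, 0.5],
                 [0.5, 0.0]])          # toy partial correlation matrix
s = steady_state(pcor, 0)

# Truncated series as a cross-check of the closed form.
e0 = np.array([1.0, 0.0])
series = sum(np.linalg.matrix_power(pcor, k) @ e0 for k in range(200))
total_effect = s.sum()
```

Summing the entries of the steady-state vector gives one possible scalar measure of the overall effect of a shock, which can then be compared across the banks that are hit.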

We introduce a 2-stage estimator that brings together the tlasso of Finegold and Drton [15] and the 2Sgelnet of Bernardini et al. [6]. This procedure is more flexible than 2Sgelnet and is more suitable for situations where data exhibit heavy tails and model misspecification. The proposed estimator relies on the elastic net penalty and includes the LASSO penalty as a special case. Exploiting the scale-mixture representation of the multivariate t-Student distribution, we use the EM algorithm to estimate the precision matrix Θ. A sparse estimate is produced in the M-step through a 2-stage procedure using the elastic net penalty. By running Monte Carlo simulations, the proposed estimator is compared with glasso, tlasso and 2Sgelnet by looking both at classification performance and numerical accuracy of the optimal models selected using the BIC. Seven network topologies and four multivariate distributions have been considered. We observe that 2Stelnet performs quite well with respect to the other estimators considered, especially with the largest sample sizes. The results also suggest that 2Stelnet performs remarkably well even if there is a mismatch between the real distribution of data and the one assumed by the estimator (i.e. t-Student with ν = 3).

Despite the good behavior of 2Stelnet in low-dimensional settings (n > p), severe limitations can arise in high-dimensional situations (n < p) where the existence and uniqueness of the estimator in the second stage (9) is not guaranteed.

Finally, an empirical application is proposed and 2Stelnet is used to estimate the European banking network from the share prices of a large set of European banks. We show the impact of the Covid-19 pandemic on network strength, which is an indicator of potential crisis periods. Different centrality measures are also used to detect the most central banks in the network. To conclude our empirical analysis, we evaluate the effects of a shock in the most and least central banks, according to strength, by using the 2020 partial correlation network; not surprisingly, we find much larger effects if the shock hits the most central bank, suggesting that the degree of interconnectedness should play an important role in setting up adequate risk management and risk mitigation tools.

A Pseudo-code 2Stelnet

Algorithm 1: 2Stelnet (Section 2.2)

- Transform each observation $x_i$, for i = 1, ..., n, as in (7)
- Estimate $\hat b_k$ using p elastic net regressions as in (8)
- Using each $\hat b_k$, reconstruct an estimated edge set $\hat E$
- Given $\hat E$, compute the updated estimate $\hat\Psi^{(t+1)}$ as the maximizer of (9)

[1] Connecting the dots: Econometric methods for uncovering networks with an application to the Australian financial institutions
[2] Partial correlation and conditional correlation as measures of conditional independence
[3] Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data
[4] The physics of financial networks
[5] Nets: Network estimation for time series
[6] New estimation approaches for graphical models with elastic net penalty
[7] Econometric measures of connectedness and systemic risk in the finance and insurance sectors
[8] A constrained l1 minimization approach to sparse precision matrix estimation
[9] Production networks: A primer
[10] Empirical properties of asset returns: stylized facts and statistical issues
[11] Estimating global bank network connectedness
[12] Covariance selection
[13] Maximum likelihood from incomplete data via the EM algorithm
[14] Financial and Macroeconomic Connectedness: A Network Approach to Measurement and Monitoring
[15] Robust graphical modeling of gene networks using classical and alternative t-distributions
[16] Extended Bayesian information criteria for Gaussian graphical models
[17] Sparse inverse covariance estimation with the graphical lasso
[18] Graph drawing by force-directed placement. Software: Practice and Experience
[19] The Elements of Statistical Learning: Data Mining, Inference, and Prediction
[20] Networks and economic behavior
[21] Systemic risk in financial networks: A survey
[22] Probabilistic Graphical Models: Principles and Techniques
[23] Graphical elastic net and target matrices: Fast algorithms and software for sparse precision matrix estimation
[24] Graphical Models
[25] Structure learning of Gaussian Markov random fields with false discovery rate control. Symmetry, 11
[26] ML estimation of the t distribution using EM and its extensions, ECM and ECME
[27] The graphical lasso: New insights and alternatives
[28] High-dimensional graphs and variable selection with the lasso
[29] Regression shrinkage and selection via the lasso
[30] Robust and sparse banking network estimation
[31] Collective dynamics of 'small-world' networks
[32] High dimensional inverse covariance matrix estimation via linear programming
[33] High-dimensional covariance estimation based on Gaussian graphical models
[34] Regularization and variable selection via the elastic-net

Figure 13: Estimated partial correlation networks