key: cord-0649472-mck834u0
authors: Kobayashi, Ken; Takano, Yuichi; Nakata, Kazuhide
title: Cardinality-constrained Distributionally Robust Portfolio Optimization
date: 2021-12-23
journal: nan
DOI: nan
sha: 9c5ab235fccd3eff9d95e5fcfd3e4408757790dc
doc_id: 649472
cord_uid: mck834u0

This paper studies a distributionally robust portfolio optimization model with a cardinality constraint for limiting the number of invested assets. We formulate this model as a mixed-integer semidefinite optimization (MISDO) problem by means of the moment-based uncertainty set of probability distributions of asset returns. To exactly solve large-scale problems, we propose a specialized cutting-plane algorithm that is based on bilevel optimization reformulation. We prove the finite convergence of the algorithm. We also apply a matrix completion technique to lower-level SDO problems to make their problem sizes much smaller. Numerical experiments demonstrate that our cutting-plane algorithm is significantly faster than the state-of-the-art MISDO solver SCIP-SDP. We also show that our portfolio optimization model can achieve good investment performance compared with the conventional mean-variance model.

Portfolio optimization models, originating from the mean-variance portfolio selection pioneered by Markowitz [39] , have been actively studied by academic researchers and institutional investors. In real-world situations of portfolio optimization, we must depend on inaccurate estimates of asset returns, which can lead to lower investment performance. Accordingly, robust optimization [5, 6] , which copes with uncertainty about input data, has played an important role in practical portfolio optimization [20, 21, 30] . This paper focuses on the distributionally robust optimization approach to portfolio optimization.

In distributionally robust optimization, optimal solutions are evaluated under the worst-case expectation with respect to a set of probability distributions of uncertain parameters. This optimization model was first introduced by Scarf [47] for an inventory problem. Distributionally robust optimization can be viewed as a framework unifying traditional stochastic optimization based on sample average approximation [33, 48, 49] and conventional robust optimization based on uncertainty sets of possible realizations of random variables [5, 6] . Theoretical analyses also support the effectiveness of distributionally robust optimization in data-driven decision-making problems under uncertainty [27, 43] .

Moment-based uncertainty sets, which contain probability distributions of asset returns whose moments satisfy certain conditions, are commonly adopted in distributionally robust portfolio optimization [24, 26, 37, 41, 45, 51] . El Ghaoi et al. [24] considered the worst-case value-at-risk over an uncertainty set of probability distributions such that the mean and covariance matrix are elementwise bounded. Tütüncü and Koeing [51] used a similar uncertainty set for solving robust mean-variance portfolio optimization problems. Goldfarb and Iyengar [26] studied a distributionally robust portfolio optimization problem in which asset returns are formed using a linear factor model. Popescu [45] assumed that the mean and covariance matrix are exactly known and used an uncertainty set of probability distributions whose first and second moments are equal to them, respectively. Similar assumptions were also made in other distributionally robust optimization models [41, 56] . Goh and Sim [25] studied an uncertainty set of probability distributions whose mean belongs to a convex set and covariance matrix is given. As an extension of the elementwise uncertainty sets [24, 51] , Lotfi and Zenios [37] considered an ellipsoidal uncertainty set for the tuple of the mean and covariance matrix when minimizing worst-case value-at-risk and conditional value-at-risk (see Rahimian and Mehrotra [46] for a comprehensive survey on distributionally robust optimization models).

Unlike these models that require prior knowledge of the region including the mean and covariance matrix, Delage and Ye [19] proposed a data-driven method for defining a moment-based uncertainty set and gave its probabilistic guarantee. They also formulated the worst-case loss minimization problem based on this uncertainty set as a semidefinite optimization (SDO) problem, which can be solved in polynomial time with interior-point methods [42] . Bertsimas et al. [11] proposed a practical bootstrap method to calibrate an uncertainty set of probability distributions from historical data.

We focus on the distributionally robust portfolio optimization model that is based on the moment-based uncertainty set developed by Delage and Ye [19] with a cardinality constraint for limiting the number of invested assets. When the number of invested assets is large, it is difficult for investors to keep track of each asset, and substantial transaction costs are also required [38, 44] . Thus, there is a need to control the number of invested assets with the cardinality constraint. Due to the moment-based uncertainty set and cardinality constraint, our portfolio optimization model is formulated as a mixed-integer semidefinite optimization (MISDO) problem, which is a computationally challenging optimization problem.

Conventional methods for solving MISDO problems are mainly classified into two categories: branch-and-bound and cutting-plane algorithms. Regarding branch-and-bound algorithms, Gally et al. [23] developed a generalpurpose MISDO solver called SCIP-SDP, which combines the branch-andbound framework of SCIP [1] with SDO solvers that use interior-point methods. Branch-and-bound algorithms have been used for specific applications of MISDO models [3, 16, 53] . Regarding cutting-plane algorithms, Coey et al. [18] developed a general-purpose solver called Pajarito for mixed-integer conic optimization problems. Cutting-plane algorithms for solving MISDO problems have been applied to eliminating multicollinearity from linear regression models [50] and allocating surgery blocks to operating rooms [54] . Kobayashi and Takano [34] devised a branch-and-cut algorithm for solving MISDO problems. As shown above, several algorithms have been proposed to solve MISDO problems; however, a specialized algorithm is required to solve large-scale portfolio optimization problems.

Bertsimas et al. [9] proposed a general framework of cutting-plane algorithms to exactly solve mixed-integer convex optimization problems with logical constraints. They reformulated this problem as a bilevel optimization problem composed of lower-and upper-level problems. To solve the upper-level problem, they devised a cutting-plane algorithm, which iteratively approximates the objective function by generating cutting planes from the solution to the lower-level problem on the basis of the strong duality theory. Bertsimas and Cory-Wright [7] applied this algorithm to cardinalityconstrained mean-variance portfolio optimization. They demonstrated that their algorithm was much faster than state-of-the-art MIO methods when solving large-scale problem instances. Kobayashi et al. [35] developed a bilevel cutting-plane algorithm to solve cardinality-constrained mean-CVaR portfolio optimization problems. The cutting-plane algorithm [9] was also applied to sparse principal component analysis [8] and sparse inverse covariance estimation [13] .

Motivated by these prior studies, we propose a specialized cutting-plane algorithm to exactly solve the cardinality-constrained distributionally robust portfolio optimization problem. Each iteration of this cutting-plane algorithm solves an SDO problem, which is the dual of the lower-level problem for generating cutting planes. However, the size of this SDO problem depends on the number of all investable assets, so solving it at each iteration is computationally prohibitive when handling a large number of assets. To overcome this difficulty, we reduce the size of the lower-level SDO problem through the application of the matrix completion technique [22, 40] . Notably, the size of the reduced SDO problem depends not on the number of all investable assets but on the cardinality parameter, which is usually set to a small number. To the best of our knowledge, we are the first to develop an effective algorithm for exactly solving the cardinality-constrained distributionally robust portfolio optimization problem. Numerical experiments using real-world datasets [4, 17, 32, 36] demonstrate the effectiveness of our method in terms of both computational efficiency and out-of-sample investment performance.

The main contributions of the present paper are summarized as follows:

• We formulate the cardinality-constrained distributionally robust portfolio optimization model with the moment-based uncertainty set as an MISDO problem.

• We develop the cutting-plane algorithm for solving the cardinalityconstrained distributionally robust portfolio optimization problem. We also prove that our algorithm outputs a solution with guaranteed global optimality in a finite number of iterations.

• By means of the matrix completion technique [22, 40] , we reduce the size of lower-level SDO problems in our cutting-plane algorithm. Numerical results indicate that our algorithm with the problem reduction is faster than the state-of-the-art MISDO solver SCIP-SDP [23] .

• We show through numerical experiments that our portfolio optimization model is superior to the conventional mean-variance model in terms of out-of-sample investment performance.

The remainder of this paper is organized as follows. In Section 2, we give an MISDO formulation of the cardinality-constrained distributionally robust portfolio optimization model. In Section 3, we present our cutting-plane algorithm for solving the problem. In Section 4, we describe the reduction of the lower-level SDO problem. We report the numerical results in Section 5 and conclude in Section 6.

Notation. The set of consecutive integers ranging from 1 to N is denoted as [N] := {1, 2, . . . , N}. The zero and all-ones vectors of appropriate sizes are written as 0 and 1, respectively. The zero and identity matrices of appropriate sizes are written as O and I, respectively. The Diag(·) operator maps the vector to the diagonal matrix. The set of all N × N real symmetric matrices is denoted as S N . For X, Y ∈ S N , we write X ≻ Y and X Y if the matrix X −Y is positive definite and positive semidefinite, respectively. The standard inner product of matrices A := (A nm ) ∈ S N and B := (B nm ) ∈ S N is defined as A • B := n∈[N ] m∈[N ] A nm B nm . The Hadamard product of vectors a := (a n ) ∈ R N and b := (b n ) ∈ R N is defined as a•b := (a n b n ) ∈ R N .

In this section, we formulate the cardinality-constrained distributionally robust portfolio optimization model that we consider in this paper as an MISDO problem.

Let x := (x 1 , x 2 , . . . , x N ) ⊤ be a portfolio, where x n is the investment weight of the nth asset. Throughout this paper, we consider the set of feasible portfolios:

The nonnegativity constraint on x prohibits short selling. Let k ∈ [N] be a user-defined parameter for limiting the cardinality (i.e., the number of assets to be held). We then impose the following cardinality constraint [10, 14, 44] on portfolio x:

where · 0 is the ℓ 0 -norm (i.e., the number of nonzero entries). In practice, this constraint is required by investors to reduce their portfolio monitoring and transaction costs. Let z := (z 1 , z 2 , . . . , z N ) ⊤ be a vector composed of binary decision variables for selecting assets, that is, z n = 1 if the nth asset is selected, and z n = 0 otherwise. We also introduce the feasible set corresponding to the cardinality constraint (1):

The cardinality constraint (1) is then represented by the logical implication:

We consider a measurable space (Ω, F ). Letξ : Ω → R N be an Fmeasurable function (N-dimensional random vector) representing the rate of random return of each asset. Suppose that ξ m ∈ R N is an observed historical return for each period m ∈ [M]. The sample mean vector and sample covariance matrix are then calculated aŝ

Let M be the set of all probability measures in the measurable space (Ω, F ), and E F [ · ] be the expectation under the probability measure F ∈ M. Delage and Ye [19] considered the moment-based uncertainty set of probability distributions ofξ. When the support of probability distributions is R N , this uncertainty set is given by

where κ 1 ≥ 0 and κ 2 ≥ 1 are user-defined uncertainty parameters aboutμ andΣ, respectively. The first condition in Eq. (3) ensures that the expectation ofξ lies in an ellipsoid of size κ 1 centered atμ. The second condition in Eq. (3) indicates that the second central-moment matrix ofξ is bounded above by κ 2Σ in the sense of the matrix inequality.

We make the following mild assumption about the moment-based uncertainty set (3).

Let ξ ∈ R N be a realization of the random returnξ. As in Delage and Ye [19] , we use the following piecewise-linear concave utility function in the portfolio net return y = ξ ⊤ x:

where a (ℓ) , b (ℓ) ∈ R are respectively the slope and intercept of the ℓth linear function for ℓ ∈ [L]. The loss function is then defined as the negative of the utility function (4):

Our objective is to minimize the following worst-case expected loss with respect to an underlying probability distribution F ∈ D(μ,Σ, κ 1 , κ 2 ) ofξ:

The ℓ 2 -regularization term x ⊤ x is also incorporated into the objective from the perspective of robust optimization [7, 20, 28, 29] . We formulate the cardinality-constrained distributionally robust portfolio optimization model as follows:

where γ > 0 is a user-defined regularization parameter. By deriving the dual of the inner maximization problem (5) as in Delage and Ye [19] , Problem (6) can equivalently be reformulated as the MISDO problem:

where P , Q ∈ S N , p, q ∈ R N , and r, s ∈ R are dual decision variables of the inner maximization problem (5).

In this section, we present our cutting-plane algorithm for solving the cardinality-constrained distributionally robust portfolio optimization problem (7).

We extend the method of bilevel optimization reformulation [7] to our portfolio optimization problem (7) . Let us denote by Z := Diag(z) a diagonal matrix whose diagonal entries are given by z. The logical implication (7e) then amounts to replacing x with Zx in Problem (7). We now can reformulate Problem (7) as a bilevel optimization problem. Specifically, the upper-level problem is written as the integer optimization problem:

and the lower-level problem for calculating the objective function is expressed as the semidefinite optimization problem:

Note that (Zx) ⊤ Zx has been replaced by x ⊤ x in Eq. (9a) because x = Zx holds through minimizing x ⊤ x.

The following theorem yields the dual formulation of Problem (9).

Theorem 2. For all z ∈ Z k N , the strong duality holds for Problem (9) , and the dual formulation of Problem (9) is represented as follows:

where ω ∈ R N , B := (B (ℓ) ) ∈ R N ×N ×L , β := (β (ℓ) ) ∈ R N ×L , η := (η (ℓ) ) ∈ R L , λ ∈ R N , and π ∈ R are dual decision variables.

Proof. See Appendix A.

In accordance with Theorem 2, we can extend the definition of f (z) to the optimal objective value of Problem (10) for real-valued z ∈ [0, 1] N . As in Bertsimas and Cory-Wright [7] , the function f (z) is convex in z ∈ [0, 1] N , and its subgradient for z ∈ {0, 1} N is given by

where ω ⋆ (z) is an optimal solution of ω to Problem (10) . Therefore, Problem (10) withẑ ∈ {0, 1} N provides the following linear underestimator of

We extend the cutting-plane algorithm [7] with the aim of solving the upper-level problem (8) . Let θ LB be a lower bound of the optimal objective value of Problem (8) ; this bound can easily be calculated by solving a continuous relaxation version of Problem (7). Our cutting-plane algorithm starts with the initial feasible region:

where θ is an auxiliary decision variable that corresponds to a lower estimate of f (z). At the tth iteration (t ≥ 1), our algorithm solves the surrogate upper-level problem: minimize

where F t is a feasible region at the tth iteration such that F t ⊆ F 1 . Eq. (13) ensures that the objective value of Problem (14) is bounded below; thus, there exists an optimal solution (z t , θ t ) to Problem (14) . We next solve the dual lower-level problem (10) with z = z t . We thus obtain the function value f (z t ) and its subgradient g(z t ) through Eq. (11) . If f (z t ) ≤ θ t + ε holds with sufficiently small ε ≥ 0, then z t is an ε-optimal solution to Problem (8) , which means that

where f ⋆ is the optimal objective value of Problem (8) . In this case, we terminate the algorithm with the ε-optimal solution z t . Otherwise, to obtain a closer estimate of f (z), we add the constraint (12) withẑ = z t to the feasible region:

Note that this update cuts off the solution (z t , θ t ) because θ t < f (z t ). We then set t ← t + 1 and solve the surrogate upper-level problem (14) again with the updated feasible region (15) . We repeat this procedure until we find an ε-optimal solutionẑ. After termination of the algorithm, we can compute the corresponding portfolio by solving the lower-level problem (9) with z =ẑ.

Our cutting-plane algorithm is summarized in Algorithm 1. Following Kobayashi et al. [35] , we can prove the finite convergence of the algorithm.

Algorithm 1 Cutting-plane algorithm for solving Problem (8) Step 0 (Initialization) Let ε ≥ 0 be a tolerance for optimality. Define the feasible region F 1 as in Eq. (13) . Set t ← 1 and UB 0 ← ∞.

Step 1 (Surrogate Upper-level Problem) Solve Problem (14) . Let (z t , θ t ) be an optimal solution, and set LB t ← θ t .

Step 2 (Dual Lower-level Problem) Solve Problem (10) 

Step 3 (Termination Condition) If UB t − LB t ≤ ε, terminate the algorithm with the ε-optimal solutionẑ.

Step 4 (Cut Generation) Calculate g(z t ) as in Eq. (11) and update the feasible region as in Eq. (15) . Set t ← t + 1 and return to Step 1.

Theorem 3. Algorithm 1 terminates in a finite number of iterations and outputs an ε-optimal solution to Problem (8) .

} be a sequence of solutions generated by Algorithm 1. Suppose that there exists t < T such that z t = z T . Since (z T , θ T ) ∈ F t+1 , it follows from Eq. (15) that

which verifies that z T is an optimal solution to Problem (8) . Since there are at most a finite number of possible solutions z ∈ Z k N , the algorithm terminates with an ε-optimal solution after a finite number of iterations.

Recall that Step 2 of Algorithm 1 solves Problem (10), which is an SDO problem including positive semidefinite constraints (10f) and (10g) on (N + 1) × (N + 1) symmetric matrices. It is clearly difficult to directly solve Problem (10) when N is very large. To remedy this situation, we reduce its problem size by applying the technique of positive semidefinite matrix completion [22, 40] to the lower-level SDO problem (10).

For any vector v := (v n ) ∈ R N and matrix M := (M nn ′ ) ∈ R N ×N , we write the subvector and submatrix corresponding to z, z ′ ∈ {0, 1} N as

Then ω ∈ R N can be replaced with its subvector ω z ∈ R k in the objective (10a) as follows:

The cardinality parameter k is usually much smaller than N. We exploit this problem structure to reduce the size of Problem (10). A reduced version of Problem (10) for z ∈ Z k N is formulated as

where

z ) ∈ R k×L , and λ z ∈ R k are reduced versions of decision variables.

In the next subsection, we verify that the reduced problem (17) is equivalent to the original problem (10) in the sense that an optimal solution to the original problem (10) can be recovered from an optimal solution to the reduced problem (17) . We also prove that f ′ (z) defined by the reduced problem (17) is equal to f (z) defined by the original problem (10).

We first focus on the following lemma.

It then follows that

Proof. See Appendix B.

We next prove that feasible B and β for the original problem (10) can be completed from a feasible solution to the reduced problem (17).

Lemma 5. Let (ω z , B z,z , β z , η, λ z , π) be a feasible solution to Problem (17) for z ∈ Z k N , andβ be defined by Eq. (18) . There then existsB := (B (ℓ) ) ∈ R N ×N ×L such that (B, β) = (B,β) satisfies Eqs. (10c) and (10f).

Proof. For ℓ ∈ [L] such that η (ℓ) = 0, we have β (ℓ) z = 0 from the constraint (17f), thusβ (ℓ) = 0 from the definition (18) . In this case, the solution (B (ℓ) ,β (ℓ) , η (ℓ) ) = (O, 0, 0) satisfies the constraint (10f) and can be left out of Eq. (10c). Therefore, we can assume without loss of generality that η (ℓ) > 0 for all ℓ ∈ [L] in Eqs. (10c) and (10f).

From Eqs. (17e) and (19) , we can see that

where

Then we have

From Eqs. (20) and (21), we can see thatB satisfying Eqs. (10c) and (10f) is obtained by adding appropriate positive semidefinite matrices to B (ℓ) for ℓ ∈ [L].

We are now in a position to prove our main theorem.

Theorem 6. Let (ω z , B z,z , β z , η, λ z , π) be an optimal solution to Problem (17) for z ∈ Z k N , and (ω,β,λ) be defined by Eq. (18) and

λ z := λ z ,

where [v] + := (max{0, v n }) ∈ R N for v := (v n ) ∈ R N . There then exists B such that (ω,B,β, η,λ, π) is an optimal solution to Problem (10). In addition, we have f (z) = f ′ (z) for all z ∈ Z k N . Proof. We assume without loss of generality that

For notational simplicity, we partition vector v ∈ R N and symmetric matrix M ∈ S N in accordance with z as follows:

Suppose that (ω, B, β, η, λ, π) is an optimal solution to Problem (10). Then (ω 1 , B 11 , β 1 , η, λ 1 , π) is a feasible solution to Problem (17) , and its objective value is equal to f (z) due to Eq. (16) . This proves that f (z) ≤ f ′ (z).

Next, we show that f (z) ≥ f ′ (z). Let (ω 1 , B 11 , β 1 , η, λ 1 , π) be an optimal solution to Problem (17) . From definition (22), the constraint (10b) is satisfied by (ω,β, π). The constraint (10d) is satisfied by (β,λ) as

Since the constraint (17g) is satisfied, we can use the following Schur complement property (see, e.g., Section A.5.5 in Boyd and Vandenberghe [15] ):

From Eq. (23), we have

Therefore, the constraint (10g) is satisfied byλ.

From Lemma 5, there existsB satisfying constraints (10c) and (10f). Therefore, (ω,B,β, η,λ, π) is a feasible solution to Problem (10), and its objective value is equal to f ′ (z) due to Eq. (16) . This implies that f (z) ≥ f ′ (z). As a result, we have f (z) = f ′ (z); thus, (ω,B,β, η,λ, π) is an optimal solution to Problem (10).

Remark 1. Note that Eq. (22) defines a minimum-norm solution to Problem (10) . This solution is known to generate strong cutting planes [7] . Remark 2. When κ 1 = 0, unlike in Assumption 1, we can set E F [ξ] =μ and delete the first condition in Eq. (3). In this case, we can also prove the theorem corresponding to Theorem 6.

From Theorem 6, we can revise Step 2 of Algorithm 1 as follows:

Step 2 (Reduced Dual Lower-level Problem) Solve Problem (17) with z = z t . Let (ω z , B z,z , β z , η, λ z , π) be an optimal solution. Calculate f (z t ) = f ′ (z t ) and ω ⋆ (z t ) =ω from Eqs. (18) and (22).

Remark 3. Our cutting-plane algorithm can be extended to a feasible set of portfolios with additional linear constraints:

where C ∈ R J×N and d ∈ R J are given constants (see, e.g., Bertsimas and Cory-Wright [7] and Kobayashi et al. [35] for details).

In this section, we report on the numerical results from evaluating the efficacy of our method for distributionally robust portfolio optimization with the cardinality constraint. We first examine the computational efficiency of our cutting-plane algorithm then demonstrate the out-of-sample investment performance of our portfolio optimization model. All experiments were conducted on a Windows 10 PC with an Intel Xeon E-2274G CPU (4.00 GHz) and 32 GB of memory.

We evaluate the computational efficiency of our cutting-plane algorithm by comparing it with other MISDO algorithms. Table 1 lists the datasets used in our experiments, where N is the number of assets. From the data library on the website of Kenneth R. French [36] , we downloaded two historical datasets (i.e., ind49 and sbm100) of US stock returns. We used monthly data from January 2010 to December 2019 to calculate the sample estimatesμ andΣ of asset returns (2) . From the OR-Library [4, 17] , we downloaded two datasets (i.e., port2 and port5) of the sample estimatesμ andΣ, which were respectively multiplied by 100 and 10,000 to be consistent with the other datasets. From the Kaggle datasets [32] , we downloaded a historical dataset (i.e., s&p500) of US stock returns. To compute the sample estimatesμ andΣ, we used daily data from February 8th, 2013 to February 7th, 2018, in which 32 companies including missing values were omitted from the S&P500 index. [32] As the utility function (4), we used a piecewise-linear approximation of the normalized exponential utility function [31] :

where µ max is the maximum entry of the sample mean vectorμ, and α > 0 is a risk-aversion parameter. As Figure 1 illustrates, we set α = 10 and used three tangent lines (i.e., L = 3) at y ∈ {0, µ max /2, µ max } for piecewise-linear approximation.

We compare the computational performance of the following methods for solving the MISDO problem (7):

µmax ( These methods used Mosek 2 9.2.40 to solve SDO problems. CPA and CPA + were implemented in Python 3.7 with Gurobi Optimizer 3 8.1.11 to solve the surrogate upper-level problem (14) , where the lazy constraint callback was used to add cutting planes during the branch-and-bound procedure. We set ε = 10 −5 as the tolerance for optimality. The computation of each method was terminated if it did not finish within 3,600 s. In these cases, the results obtained within 3,600 s were taken as the final outcome.

The following column labels are used in Table 2 . Note that the best values of "Time" are indicated in bold for each problem instance, and those of "Obj" are also indicated in bold for the port2 and s&p500 datasets. Also, "OM" in the Obj column indicates that the computation did not start due to the out-of-memory condition. Table 2 gives the numerical results of each method for the cardinality parameter k ∈ {5, 10, 15}. We set γ = 10/ √ N for the ℓ 2 -regularization term and (κ 1 , κ 2 ) = (1, 4) for the uncertainty set. Additional results of the computational performance are given in Appendix C.

We first focus on our cutting-plane algorithms (i.e., CPA and CPA + ). Both CPA and CPA + failed to finish solving problem instances for the port2 and s&p500 datasets; however, CPA + solved other problem instances to optimality much faster than did CPA, especially for large N. For the sbm100 dataset with k = 5, although CPA was terminated due to the time limit, CPA + solved the problem instance completely in 1.9 s. These results verify the effectiveness of our matrix-completion-based problem reduction of the dual lower-level SDO problem.

Next, we compare our cutting-plane algorithms with the MISDO solver SCIP. CPA + was always the fastest when it finished computations within the time limit. SCIP returned incorrect optimal objective values for the ind49 dataset with k ∈ {10, 15} due to numerical instabilities. For the port2 dataset, all three methods failed to complete the computations within the time limit, but CPA + found solutions of better quality than did SCIP and CPA for k ∈ {10, 15}.

For the port5 dataset, SCIP did not finish solving even a first continuous relaxation problem within the time limit, thus failed to provide a feasible solution for all k ∈ {5, 10, 15}. CPA did not start computations due to the out-of-memory condition. In contrast, CPA + succeeded in computing solutions with guaranteed optimality for all k ∈ {5, 10, 15}.

For the s&p500 dataset involving 468 investable assets, only CPA + worked normally to find feasible solutions. Optimality gaps attained with CPA + for k ∈ {5, 10, 15} were 21.0%, 19.1%, and 9.9%, respectively, and this gap will be smaller if longer time can be spent on the computation. These results support the potential of CPA + to give good-quality solutions to large problem instances with a limited memory capacity.

The computation time of CPA + tended to be shorter for larger k. For the port5 dataset, CPA + finished the computations in 316.2 s, 84.6 s, and 16.6 s for k ∈ {5, 10, 15}, respectively. The numbers of cutting-planes and nodes were much smaller for k = 15 than for k = 5, which is part of the reason why CPA + is faster with larger k.

We investigate the investment performance of our cardinality-constrained distributionally robust portfolio optimization model in a practical situation.

For the purpose of comparison, we consider the following cardinalityconstrained mean-variance portfolio optimization model:

wherer is a user-defined parameter of the required return level. We compare the out-of-sample investment performance of portfolios determined with the following optimization models:

MV: cardinality-constrained mean-variance optimization model (29) , which was solved using Gurobi Optimizer 8.1.11;

DR: cardinality-constrained distributionally robust optimization model (7), which was solved using our cutting-plane algorithm CPA + .

For the MV model, we set the required return levelr to the first quartile of entries of the sample meanμ. For the DR model, we tuned the uncertainty parameters (κ 1 , κ 2 ) on the basis of the 80% confidence region estimated using the tailored bootstrap method [11] with 100,000 bootstrap samples. We set γ = 10/ √ N for the ℓ 2 -regularization term, and the utility function (4) in the same way as in Section 5.1.1.

We evaluated the out-of-sample investment performance on the basis of a rolling horizon strategy. From Yahoo! Finance [52] , we downloaded weekly data of Japanese stock returns from January 2000 to December 2020, where the top 30 companies were selected (i.e., N = 30) from the Nikkei 225 index according to market capitalization as of December 2020. We solved portfolio 

optimization models with sample estimatesμ andΣ in the first training period (156 weeks). We then calculated weekly returns by applying the obtained portfolios to the subsequent testing period (52 weeks). We repeated this process at intervals of 52 weeks until the end of the entire data period. We evaluate the out-of-sample investment performance on the basis of the cumulative total return:

where R m is the rate of portfolio return in the mth week during the testing period. Figure 2 shows the evolution of (out-of-sample) cumulative total returns produced by each optimization model with the cardinality parameter k ∈ {5, 30}. The highest investment performance was achieved with the DR model with k = 5, which suggests that the out-of-sample investment performance can be improved by the interplay between distributionally robust optimization and the cardinality constraint. In contrast, the investment performance with the MV model was lower with k = 5 than with k = 30 probably because portfolio diversification is prevented by the cardinality constraint.

A notable result is the investment performance in 2020, when the stock market was significantly affected by the spread of COVID-19. In this year, while the cumulative return was not increased with the MV model with k ∈ {5, 30}, it improved with the DR model with k ∈ {5, 30}. This implies that distributionally robust optimization models are helpful in coping with an unexpected situation. These results indicate the practical effectiveness of our cardinality-constrained distributionally robust portfolio optimization model in terms of out-of-sample investment performance.

We considered the moment-based distributionally robust portfolio optimization model with the cardinality constraint. Due to the cardinality constraint, we formulated this model as an MISDO problem, which is very hard to solve exactly when the number of investable assets is large. We reformulated the problem as a bilevel optimization problem and devised a specialized cutting-plane algorithm for solving the upper-level problem. To efficiently generate cutting planes, we applied the positive semidefinite matrix completion [22, 40] to the dual lower-level SDO problem.

The computational results indicate that our cutting-plane algorithm was significantly accelerated by matrix-completion-based problem reduction. Our algorithm was also faster than the state-of-the-art MISDO solver SCIP-SDP when it finished computations within the time limit. Our algorithm succeeded in giving an optimal solution within 3,600 s to problem instances involving 225 assets. It also found a feasible solution of good quality to largesized problem instances involving 468 assets. In addition, the out-of-sample investment performance of our portfolio optimization model was better than that of the conventional mean-variance model.

A future direction of study is to extend our cutting-plane algorithm to other distributionally robust optimization models. For example, various robust portfolio optimization models have been proposed [21] , and discrepancyand kernel-based uncertainty sets have been used [12, 55] as an alternative to the moment-based uncertainty set of probability distributions. In addition, it is promising to integrate the matrix completion technique into general-purpose MISDO algorithms [18, 23, 34] for better computational performance.

[56] S. Zymler, D. Kuhn, and B. Rustem, Distributionally robust joint chance constraints with second-order moment information, Mathematical Programming, 137 (2011), pp. 167-198.

Appendix A. Proof of Theorem 2

The Lagrange function of Problem (9) is expressed as L(x, P , Q, p, q, r, s; α, B, β, η, Λ, λ, ν, π, ρ) Note that Problem (A.2) is an unconstrained convex quadratic optimization problem and its objective function is linear in (P , Q, p, q, r, s). Since Problem (A.2) must be bounded, the Lagrange multipliers are required to satisfy the following conditions:

The following optimality condition should also be satisfied:

According to conditions (A.3)-(A.9), the optimal objective value of Problem (A.2) is calculated as

a (ℓ) β (ℓ) + π1 + ρ.

Since ω ⊤ Z 2 ω = z ⊤ (ω • ω), the Lagrange dual (A.1) of Problem (9) is formulated as maximize ω,α,B,β,η,Λ,λ,ν,π,ρ

We then delete ρ, Λ α, and ν from Problem (A.10) by substituting Eqs. (A.10b), (A.10c), (A.10f), and (A.10h) into other constraints. We now obtain the desired formulation (10) .

To prove the strong duality [2] , we next show that both the primal problem (9) and the dual problem (10) are strictly feasible. Let us set x = z/(1 ⊤ z), q = 0, r > − min ℓ∈[L] {b (ℓ) }, and s > 0 in the primal problem (9) . Then there exists Q satisfying

We next set p = −Qμ, then there exists P satisfying

which implies that the primal problem (9) is strictly feasible. For the dual problem (10), we set λ = 0 and (B, β, η) as

such that constraints (10c)-(10e) are satisfied. We next set ω > ℓ∈[L] a (ℓ) β (ℓ) + π1. As for constraint (10f), we have

because the Schur complement [15] is given by

from Assumption 1. For constraint (10g), Assumption 1 also ensures that 13) which shows that the dual problem (10) is strictly feasible.

For notational simplicity, we assume η (ℓ) > 0 for all ℓ ∈ [L] without loss of generality. As in the proof of Theorem 6, we use the notations (25) and (26) under assumption (24) . We define φ (ℓ) :=β (ℓ) − η (ℓ)μ for ℓ ∈ [L], and due to definition (18), we have

Substituting them into the right side of Eq. (19), we obtain

From constraint (17f), we can use the Schur complement property [15] :

It then follows that

From Assumption 1, we can also use the Schur complement property [15] :

We now obtain the desired result:

Results for different values of uncertainty parameters (κ 1 , κ 2 ) Table C . 3 gives the numerical results of each method for pairs of uncertainty parameters (κ 1 , κ 2 ) ∈ {(0.5, 2), (1, 4) , (2, 8) }. We set k = 10 for the cardinality constraint and γ = 10/ √ N for the ℓ 2 -reguralization term. Table C .3 shows that CPA + consistently outperformed other methods regardless of values of the uncertainty parameters (κ 1 , κ 2 ). For the port2 dataset, all three methods failed to complete computations within the time limit for all (κ 1 , κ 2 ), but CPA + found solutions of better quality than did other methods for (κ 1 , κ 2 ) ∈ {(1, 4), (2, 8) }.

The computation time of CPA + was stable against the change in (κ 1 , κ 2 ). Indeed, for the port5 dataset, CPA + finished the computations in 60.9 s, 84.6 s, and 62.6 s for (κ 1 , κ 2 ) ∈ {(0.5, 2), (1, 4) , (2, 8) }, respectively. In addition, for the s&p500 dataset, optimality gaps given from CPA + were 17.9%, 19.1%, and 20.1% for (κ 1 , κ 2 ) ∈ {(0.5, 2), (1, 4), (2, 8)}, respectively. These results suggest that the computational performance of CPA + is not greatly affected by the uncertainty parameters (κ 1 , κ 2 ).

Results for different values of ℓ 2 -regularization parameter γ Table C . 4 gives the numerical results of each method for the ℓ 2 -regularization parameter γ ∈ {1/ √ N , 10/ √ N, 100/ √ N }. We set k = 10 for the cardinality constraint and (κ 1 , κ 2 ) = (1, 4) for the uncertainty set. Table C .4 shows that CPA + performed very well compared with the other methods. CPA + was always faster than other methods for the ind49, sbm100, and port5 datasets. For the port2 dataset, all three methods failed to solve problem instances to optimality, but CPA + found solutions of better quality than did other methods except for γ = 100/ √ N . For the s&p500 dataset, only CPA + provided feasible solutions without causing an out-of-memory condition.

The computation time of CPA + tended to be shorter for smaller γ. For the port5 dataset, CPA + finished the computations in 66.0 s, 84.6 s, and 417.1 s for γ ∈ {1/ √ N, 10/ √ N, 100/ √ N}, respectively. A similar tendency was also shown from the results for the s&p500 dataset, where optimality gaps obtained by CPA + were 4.2%, 19.1%, and 79.4% for γ ∈ {1/ √ N , 10/ √ N, 100/ √ N }, respectively. 

SCIP: solving constraint integer programs

Interior point methods in semidefinite programming with applications to combinatorial optimization

Martin, LP and SDP branch-and-cut algorithms for the minimum graph bisection problem: a computational comparison

OR-library: Distributing test problems by electronic mail

Robust Optimization

Robust optimization-methodology and applications

A scalable algorithm for sparse portfolio selection

A unified approach to mixed-integer optimization problems with logical constraints

Portfolio construction through mixed-integer programming at Grantham, Mayo, Van Otterloo and Company, Interfaces

Data-driven robust optimization, Mathematical Programming

From predictive to prescriptive analytics

Certifiably optimal sparse inverse covariance estimation

Computational study of a family of mixed-integer quadratic programming problems

Convex Optimization

A new branch and bound method for a discrete truss topology design problem

Heuristics for cardinality constrained portfolio optimisation, Computers & Operations Research

Outer approximation with conic certificates for mixed-integer convex problems

Distributionally robust optimization under moment uncertainty with application to data-driven problems

A generalized approach to portfolio optimization: Improving performance by constraining portfolio norms

Robust portfolios: contributions from operations research and finance

Exploiting sparsity in semidefinite programming via matrix completion I: General framework

A framework for solving mixed-integer semidefinite programs

Worst-case value-at-risk and robust portfolio optimization: A conic programming approach

Distributionally robust optimization and its tractable approximations

Robust portfolio selection problems

Calibration of distributionally robust empirical optimization models

Robust portfolio techniques for mitigating the fragility of CVaR minimization and generalization to coherent risk measures

On the role of norm constraints in portfolio selection

Robust optimization and portfolio selection: The cost of robustness

Theory of financial decision making, Rowman & Littlefield

S&P 500 stock data

The sample average approximation method for stochastic discrete optimization

A branch-and-cut algorithm for solving mixed-integer semidefinite optimization problems

Bilevel cutting-plane algorithm for cardinality-constrained mean-CVaR portfolio optimization

Robust VaR and CVaR optimization under joint ambiguity in distributions, means, and covariances

Twenty years of linear programming based portfolio optimization

Portfolio Selection

Exploiting sparsity in semidefinite programming via matrix completion II: implementation and numerical results

Tractable robust expected utility and risk models for portfolio optimization

Interior-Point Polynomial Algorithms in Convex Programming

From data to decisions: Distributionally robust optimization is optimal

Large-scale portfolio optimization

Robust mean-covariance solutions for stochastic optimization

Distributionally robust optimization: A review

A min-max solution of an inventory problem

Monte carlo sampling methods

Lectures on Stochastic Programming

Best subset selection for eliminating multicollinearity

Robust asset allocation

Yahoo Japan Corporation, Yahoo! Finance

Global optimization of robust truss topology via mixed integer semidefinite programming

Solving 0-1 semidefinite programs for distributionally robust allocation of surgery blocks

Data-driven risk-averse stochastic optimization with wasserstein metric