6. Modelling


In this chapter we will discuss the following modelling issues:

  • linear optimization (Section 6.1),
  • linear network flow problems (Section 6.2),
  • quadratic and quadratically constrained optimization (Section 6.3),
  • conic optimization (Section 6.4),
  • general recommendations for formulating optimization problems (Section 6.5), and
  • further modelling examples (Section 6.6).

6.1. Linear optimization

A linear optimization problem can be written as

\begin{math}\nonumber{}\begin{array}{lccccl}\nonumber{}\mbox{minimize} &  &  & c^{T}x+c^{f} &  & \\\nonumber{}\mbox{subject to} & l^{c} & \leq{} & Ax & \leq{} & u^{c},\\\nonumber{} & l^{x} & \leq{} & x & \leq{} & u^{x},\end{array}\end{math} (6.1.1)

where

  • $m$ is the number of constraints and $n$ is the number of variables,
  • $x\in{}{R}^{n}$ is the vector of decision variables,
  • $c\in{}{R}^{n}$ is the linear objective vector and $c^{f}\in{}{R}$ is a constant objective term,
  • $A\in{}{R}^{m\times n}$ is the constraint matrix,
  • $l^{c}\in{}{R}^{m}$ and $u^{c}\in{}{R}^{m}$ are the lower and upper bounds on the constraint activities $Ax$, and
  • $l^{x}\in{}{R}^{n}$ and $u^{x}\in{}{R}^{n}$ are the lower and upper bounds on the variables.

A primal solution $x$ is (primal) feasible if it satisfies all constraints in (6.1.1). If (6.1.1) has at least one primal feasible solution, then (6.1.1) is said to be (primal) feasible.

In case (6.1.1) does not have a feasible solution, the problem is said to be (primal) infeasible.
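As a concrete illustration, problems of the form (6.1.1) can be solved with any LP solver once the ranged constraints $l^{c}\leq{}Ax\leq{}u^{c}$ are split into two one-sided rows. The following Python sketch uses scipy.optimize.linprog on a small made-up instance; MOSEK's own APIs accept the ranged form (6.1.1) directly.

    import numpy as np
    from scipy.optimize import linprog

    # A tiny instance of (6.1.1): minimize c^T x + c^f
    # subject to 1 <= x1 + x2 <= 4 and 0 <= x <= 3.
    c, cf = np.array([-1.0, -2.0]), 0.0
    A = np.array([[1.0, 1.0]])
    lc, uc = np.array([1.0]), np.array([4.0])

    # Split each ranged row l^c <= Ax <= u^c into Ax <= u^c and -Ax <= -l^c.
    A_ub = np.vstack([A, -A])
    b_ub = np.concatenate([uc, -lc])

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0.0, 3.0)] * 2)
    print(res.status, res.x, res.fun + cf)   # 0 (optimal), [1. 3.], -7.0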

6.1.1. Duality for linear optimization

Corresponding to the primal problem (6.1.1), there is a dual problem

\begin{math}\nonumber{}\begin{array}{lccl}\nonumber{}\mbox{maximize} & (l^{c})^{T}s_{l}^{c}-(u^{c})^{T}s_{u}^{c} &  & \\\nonumber{} & +(l^{x})^{T}s_{l}^{x}-(u^{x})^{T}s_{u}^{x}+c^{f} &  & \\\nonumber{}\mbox{subject to} & A^{T}y+s_{l}^{x}-s_{u}^{x} & = & c,\\\nonumber{} & -y+s_{l}^{c}-s_{u}^{c} & = & 0,\\\nonumber{} & s_{l}^{c},s_{u}^{c},s_{l}^{x},s_{u}^{x}\geq{}0. &  &\end{array}\end{math} (6.1.2)

If a bound in the primal problem is plus or minus infinity, the corresponding dual variable is fixed at 0, and we use the convention that the product of the bound value and the corresponding dual variable is 0. For example

\begin{displaymath}\nonumber{}l_{j}^{x}=-\infty ~\Rightarrow ~(s_{l}^{x})_{j}=0\mbox{ and }l_{j}^{x}\cdot (s_{l}^{x})_{j}=0.\end{displaymath}

This is equivalent to removing the variable $(s_{l}^{x})_{j}$ from the dual problem.

A solution

\begin{displaymath}\nonumber{}(y,s_{l}^{c},s_{u}^{c},s_{l}^{x},s_{u}^{x})\end{displaymath}

to the dual problem is feasible if it satisfies all the constraints in (6.1.2). If (6.1.2) has at least one feasible solution, then (6.1.2) is (dual) feasible, otherwise the problem is (dual) infeasible.

We will denote a solution

\begin{displaymath}\nonumber{}(x,y,s_{l}^{c},s_{u}^{c},s_{l}^{x},s_{u}^{x})\end{displaymath}

where $x$ is a solution to the primal problem (6.1.1), and

\begin{displaymath}\nonumber{}(y,s_{l}^{c},s_{u}^{c},s_{l}^{x},s_{u}^{x})\end{displaymath}

is a solution to the corresponding dual problem (6.1.2). A solution which is both primal and dual feasible is called a primal-dual feasible solution.

6.1.1.1. A primal-dual feasible solution

Let

\begin{displaymath}\nonumber{}(x^{*},y^{*},(s_{l}^{c})^{*},(s_{u}^{c})^{*},(s_{l}^{x})^{*},(s_{u}^{x})^{*})\end{displaymath}

be a primal-dual feasible solution, and let

\begin{displaymath}\nonumber{}(x^{c})^{*}:=Ax^{*}.\end{displaymath}

For a primal-dual feasible solution we can bound the difference between the primal and the dual objective values as follows:

\begin{displaymath}\nonumber{}\begin{array}{c}\nonumber{}c^{T}x^{*}+c^{f}-((l^{c})^{T}(s_{l}^{c})^{*}-(u^{c})^{T}(s_{u}^{c})^{*}+(l^{x})^{T}(s_{l}^{x})^{*}-(u^{x})^{T}(s_{u}^{x})^{*}+c^{f})\\\nonumber{}=\sum \limits _{{i=1}}^{m}\left((s_{l}^{c})_{i}^{*}((x_{i}^{c})^{*}-l_{i}^{c})+(s_{u}^{c})_{i}^{*}(u_{i}^{c}-(x_{i}^{c})^{*})\right)+\sum \limits _{{j=1}}^{n}\left((s_{l}^{x})_{j}^{*}(x_{j}^{*}-l_{j}^{x})+(s_{u}^{x})_{j}^{*}(u_{j}^{x}-x_{j}^{*})\right)\\\nonumber{}\geq{}0,\end{array}\end{displaymath}

where the first relation is obtained by multiplying the two dual constraints in (6.1.2) by $x^{*}$ and $(x^{c})^{*}$ respectively and using $(x^{c})^{*}=Ax^{*}$, and the second relation comes from the fact that each term in each sum is nonnegative. It follows that the primal objective value is always greater than or equal to the dual objective value.

We then define the duality gap as this difference between the primal and the dual objective values, i.e.

\begin{displaymath}\nonumber{}c^{T}x^{*}+c^{f}-((l^{c})^{T}(s_{l}^{c})^{*}-(u^{c})^{T}(s_{u}^{c})^{*}+(l^{x})^{T}(s_{l}^{x})^{*}-(u^{x})^{T}(s_{u}^{x})^{*}+c^{f}).\end{displaymath}

Please note that for a primal-dual feasible solution the duality gap is always nonnegative.

6.1.1.2. An optimal solution

It is well-known that a linear optimization problem has an optimal solution if and only if there exist feasible primal and dual solutions such that the duality gap is zero, or, equivalently, that the complementarity conditions

\begin{displaymath}\nonumber{}\begin{array}{rcll}\nonumber{}(s_{l}^{c})_{i}^{*}((x_{i}^{c})^{*}-l_{i}^{c}) & = & 0, & i=1,\ldots ,m,\\\nonumber{}(s_{u}^{c})_{i}^{*}(u_{i}^{c}-(x_{i}^{c})^{*}) & = & 0, & i=1,\ldots ,m,\\\nonumber{}(s_{l}^{x})_{j}^{*}(x_{j}^{*}-l_{j}^{x}) & = & 0, & j=1,\ldots ,n,\\\nonumber{}(s_{u}^{x})_{j}^{*}(u_{j}^{x}-x_{j}^{*}) & = & 0, & j=1,\ldots ,n\end{array}\end{displaymath}

are satisfied.
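These conditions are easy to check numerically. The numpy sketch below does so for a tiny LP whose optimal primal-dual pair can be derived by hand; the instance and the solution values are constructed purely for illustration.

    import numpy as np

    # minimize x1 + 2*x2  subject to  x1 + x2 >= 1,  x >= 0,
    # i.e. l^c = 1, u^c = +inf, l^x = 0, u^x = +inf in the notation of (6.1.1).
    c = np.array([1.0, 2.0])
    A = np.array([[1.0, 1.0]])
    lc = np.array([1.0])
    lx = np.zeros(2)

    # Known optimal pair: x* = (1, 0); (s_l^c)* = 1, hence y* = 1 by the
    # second dual constraint, and (s_l^x)* = c - A^T y*; all s_u^* are zero
    # because the corresponding upper bounds are infinite.
    x = np.array([1.0, 0.0])
    slc = np.array([1.0])
    slx = c - A.T @ slc

    xc = A @ x
    assert np.allclose(slc * (xc - lc), 0.0)   # constraint complementarity
    assert np.allclose(slx * (x - lx), 0.0)    # variable complementarity
    gap = c @ x - (lc @ slc + lx @ slx)        # primal minus dual objective
    print("duality gap:", gap)                 # 0.0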

If (6.1.1) has an optimal solution and MOSEK solves the problem successfully, both the primal and dual solution are reported, including a status indicating the exact state of the solution.

6.1.1.3. Primal infeasible problems

If the problem (6.1.1) is infeasible (has no feasible solution), MOSEK will report a certificate of the primal infeasibility: the dual solution reported is the certificate of infeasibility, and the primal solution is undefined.

A primal certificate (certificate of primal infeasibility) is a feasible solution to the modified dual problem

\begin{math}\nonumber{}\begin{array}{lccl}\nonumber{}\mbox{maximize} & (l^{c})^{T}s_{l}^{c}-(u^{c})^{T}s_{u}^{c}+(l^{x})^{T}s_{l}^{x}-(u^{x})^{T}s_{u}^{x} &  & \\\nonumber{}\mbox{subject to} & A^{T}y+s_{l}^{x}-s_{u}^{x} & = & 0,\\\nonumber{} & -y+s_{l}^{c}-s_{u}^{c} & = & 0,\\\nonumber{} & s_{l}^{c},s_{u}^{c},s_{l}^{x},s_{u}^{x}\geq{}0. &  &\end{array}\end{math} (6.1.3)

so that the objective is strictly positive, i.e. a solution

\begin{displaymath}\nonumber{}(y^{*},(s_{l}^{c})^{*},(s_{u}^{c})^{*},(s_{l}^{x})^{*},(s_{u}^{x})^{*})\end{displaymath}

to (6.1.3) so that

\begin{displaymath}\nonumber{}(l^{c})^{T}(s_{l}^{c})^{*}-(u^{c})^{T}(s_{u}^{c})^{*}+(l^{x})^{T}(s_{l}^{x})^{*}-(u^{x})^{T}(s_{u}^{x})^{*}>0.\end{displaymath}

Such a solution implies that (6.1.3) is unbounded, and that its dual is infeasible.

We note that the dual of (6.1.3) is a problem whose constraints are identical to the constraints of the original primal problem (6.1.1): If the dual of (6.1.3) is infeasible, so is the original primal problem.

6.1.1.4. Dual infeasible problems

If the problem (6.1.2) is infeasible (has no feasible solution), MOSEK will report a certificate of the dual infeasibility: the primal solution reported is the certificate of infeasibility, and the dual solution is undefined.

A certificate of dual infeasibility is a feasible solution to the problem

\begin{math}\nonumber{}\begin{array}{lccccl}\nonumber{}\mbox{minimize} &  &  & c^{T}x &  & \\\nonumber{}\mbox{subject to} &  &  & Ax-x^{c} & = & 0,\\\nonumber{} & \bar{l}^{c} & \leq{} & x^{c} & \leq{} & \bar{u}^{c},\\\nonumber{} & \bar{l}^{x} & \leq{} & x & \leq{} & \bar{u}^{x}\end{array}\end{math} (6.1.4)

where

\begin{displaymath}\nonumber{}\bar{l}_{i}^{c}=\left\lbrace{}\begin{array}{ll}\nonumber{}0, & \mbox{if }l_{i}^{c}>-\infty ,\\\nonumber{}-\infty  & \mbox{otherwise}\end{array}\right.\quad{}\mbox{and}\quad{}\bar{u}_{i}^{c}:=\left\lbrace{}\begin{array}{ll}\nonumber{}0, & \mbox{if }u_{i}^{c}<\infty ,\\\nonumber{}\infty  & \mbox{otherwise}\end{array}\right.\end{displaymath}

and

\begin{displaymath}\nonumber{}\bar{l}_{j}^{x}=\left\lbrace{}\begin{array}{ll}\nonumber{}0, & \mbox{if }l_{j}^{x}>-\infty ,\\\nonumber{}-\infty  & \mbox{otherwise}\end{array}\right.\quad{}\mbox{and}\quad{}\bar{u}_{j}^{x}:=\left\lbrace{}\begin{array}{ll}\nonumber{}0, & \mbox{if }u_{j}^{x}<\infty ,\\\nonumber{}\infty  & \mbox{otherwise}\end{array}\right.\end{displaymath}

so that the objective value $c^{T}x$ is negative. Such a solution implies that (6.1.4) is unbounded, and that the dual of (6.1.4) is infeasible.

We note that the dual of (6.1.4) is a problem whose constraints are identical to the constraints of the original dual problem (6.1.2): If the dual of (6.1.4) is infeasible, so is the original dual problem.

6.1.2. Primal and dual infeasible case

In case both the primal problem (6.1.1) and the dual problem (6.1.2) are infeasible, MOSEK will report only one of the two possible certificates; which one is reported is not defined (MOSEK returns the first certificate found).

6.2. Linear network flow problems

Network flow problems are a special class of linear optimization problems with many applications. The class can be specified as follows. Let $\mathcal{G}=(\mathcal{N},\mathcal{A})$ be a directed network with node set $\mathcal{N}$ and arc set $\mathcal{A}$. Associated with every arc $(i,j)\in{}\mathcal{A}$ is a cost $c_{{ij}}$ and a capacity interval $[l^{x}_{{ij}},u^{x}_{{ij}}]$. Moreover, associated with each node $i\in{}\mathcal{N}$ is a lower limit $l^{c}_{i}$ and an upper limit $u^{c}_{i}$ on the supply (demand) of the node. The minimum cost network flow problem can now be stated as follows:

\begin{math}\nonumber{}\begin{array}{lcccccl}\nonumber{}\mbox{minimize} &  &  & \sum \limits _{{(i,j)\in{}\mathcal{A}}}c_{{ij}}x_{{ij}} &  &  & \\\nonumber{}\mbox{subject to} & l^{c}_{{i}} & \leq{} & \sum \limits _{{\lbrace{}j:(i,j)\in{}\mathcal{A}\rbrace{}}}x_{{ij}}-\sum \limits _{{\lbrace{}j:(j,i)\in{}\mathcal{A}\rbrace{}}}x_{{ji}} & \leq{} & u^{c}_{{i}} & \forall i\in{}\mathcal{N},\\\nonumber{} & l^{x}_{{ij}} & \leq{} & x_{{ij}} & \leq{} & u^{x}_{{ij}} & \forall (i,j)\in{}\mathcal{A}.\end{array}\end{math} (6.2.1)

A classical example of a network flow problem is the transportation problem, where the objective is to distribute goods from warehouses to customers at lowest possible total cost, see [8] for a detailed application reference.

It is well-known that problems with network flow structure can be solved efficiently with a specialized version of the simplex method. MOSEK includes a highly tuned network simplex implementation, see Section 7.3.1 for further details on how to invoke the network optimizer.
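As a sketch of (6.2.1), the snippet below builds the node-arc balance constraints for a made-up three-node, three-arc graph (with $l^{c}=u^{c}$, i.e. equality balances) and solves it as an ordinary LP with scipy; in practice such a problem would be handed to the network optimizer.

    import numpy as np
    from scipy.optimize import linprog

    # Arcs (0,1), (1,2), (0,2); node 0 supplies one unit, node 2 demands
    # one unit, node 1 is a transshipment node.
    arcs = [(0, 1), (1, 2), (0, 2)]
    cost = np.array([1.0, 1.0, 3.0])       # c_ij
    cap = [(0.0, 1.0)] * len(arcs)         # l^x_ij = 0, u^x_ij = 1

    # Node-arc incidence matrix: +1 for outflow, -1 for inflow.
    N = np.zeros((3, len(arcs)))
    for k, (i, j) in enumerate(arcs):
        N[i, k], N[j, k] = 1.0, -1.0
    supply = np.array([1.0, 0.0, -1.0])    # l^c = u^c

    res = linprog(cost, A_eq=N, b_eq=supply, bounds=cap)
    print(res.x, res.fun)   # [1. 1. 0.] 2.0: routing 0 -> 1 -> 2 is cheapest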

6.3. Quadratic and quadratically constrained optimization

A convex quadratic optimization problem is an optimization problem of the form

\begin{math}\nonumber{}\begin{array}{lrcccll}\nonumber{}\mbox{minimize} &  &  & \frac{1}{2}x^{T}Q^{o}x+c^{T}x+c^{f} &  &  & \\\nonumber{}\mbox{subject to} & l_{k}^{c} & \leq{} & \frac{1}{2}x^{T}Q^{k}x+\sum \limits _{{j=0}}^{{n-1}}a_{{k,j}}x_{j} & \leq{} & u_{k}^{c}, & k=0,\ldots ,m-1,\\\nonumber{} & l^{x} & \leq{} & x & \leq{} & u^{x}, &\end{array}\end{math} (6.3.1)

where the convexity requirement implies that $Q^{o}$ must be positive semi-definite, and that each $Q^{k}$ must be positive semi-definite if the $k$th constraint has a finite upper bound and negative semi-definite if it has a finite lower bound. (A constraint with a non-zero $Q^{k}$ and two finite bounds cannot be convex.)

The convexity requirement is very important and it is strongly recommended that MOSEK is applied to convex problems only.

6.3.1. A general recommendation

Any convex quadratic optimization problem can be reformulated as a conic optimization problem. It is our experience that for the majority of practical applications it is better to cast them as conic problems because

  • the resulting problem is convex by construction, and
  • the conic optimizer is more efficient than the optimizer for general quadratic problems.

See Section 6.4.4.1 for further details.

6.3.2. Reformulating as a separable quadratic problem

The simplest quadratic optimization problem is

\begin{math}\nonumber{}\begin{array}{lccl}\nonumber{}\mbox{minimize} & 1/2x^{T}Qx+c^{T}x &  & \\\nonumber{}\mbox{subject to} & Ax & = & b,\\\nonumber{} & x\geq{}0. &  &\end{array}\end{math} (6.3.2)

The problem (6.3.2) is said to be separable if Q is a diagonal matrix or, in other words, if all quadratic terms in the objective are of the form

\begin{displaymath}\nonumber{}x_{j}^{2}\end{displaymath}

instead of this form

\begin{displaymath}\nonumber{}x_{j}x_{i}.\end{displaymath}

The separable form has the following advantages:

  • It is very easy to check the convexity assumption, and
  • the simpler structure in a separable problem usually makes it easier to solve.

It is well-known that a positive semi-definite matrix Q can always be factorized, i.e. a matrix F exists so that

\begin{math}\nonumber{}Q=F^{T}F.\end{math} (6.3.3)

In many practical applications of quadratic optimization F is known explicitly; for example, if Q is a covariance matrix, F could be the matrix of observations producing it.

Using (6.3.3), the problem (6.3.2) can be reformulated as

\begin{math}\nonumber{}\begin{array}{lccl}\nonumber{}\mbox{minimize} & 1/2y^{T}Iy+c^{T}x &  & \\\nonumber{}\mbox{subject to} & Ax & = & b,\\\nonumber{} & Fx-y & = & 0,\\\nonumber{} & x\geq{}0. &  &\end{array}\end{math} (6.3.4)

The problem (6.3.4) is also a quadratic optimization problem and has more constraints and variables than (6.3.2). However, it is separable. Normally, if F has fewer rows than columns, it is worthwhile to reformulate as a separable problem: indeed, consider the extreme case where F has one dense row, in which case Q is a completely dense matrix.
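Both the objective identity behind (6.3.4) and the storage argument are easy to check numerically. A numpy sketch with made-up data:

    import numpy as np

    rng = np.random.default_rng(0)
    m, n = 5, 50                       # F has far fewer rows than columns
    F = rng.standard_normal((m, n))
    Q = F.T @ F                        # a dense n x n matrix

    # With y = F x, the objectives of (6.3.2) and (6.3.4) coincide.
    x = rng.standard_normal(n)
    y = F @ x
    assert np.isclose(0.5 * x @ Q @ x, 0.5 * y @ y)

    # Storage: Q needs ~n^2 non-zeros, while the separable form only needs
    # the non-zeros of F plus m extra variables and constraints.
    print(np.count_nonzero(Q), np.count_nonzero(F))   # 2500 versus 250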

The idea presented above is applicable to quadratic constraints too. Now consider the constraint

\begin{math}\nonumber{}1/2x^{T}(F^{T}F)x\leq{}b\end{math} (6.3.5)

where F is a matrix and b is a scalar. (6.3.5) can be reformulated as

\begin{displaymath}\nonumber{}\begin{array}{rcl}\nonumber{}1/2y^{T}Iy & \leq{} & b,\\\nonumber{}Fx-y & = & 0.\end{array}\end{displaymath}

It should be obvious how to generalize this idea to make any convex quadratic problem separable.

Next, consider the constraint

\begin{displaymath}\nonumber{}1/2x^{T}(D+F^{T}F)x\leq{}b\end{displaymath}

where D is a positive semi-definite matrix, F is a matrix, and b is a scalar. We assume that D has a simple structure, e.g. that D is a diagonal or a block diagonal matrix. If this is the case, then it may be worthwhile to perform the reformulation

\begin{displaymath}\nonumber{}\begin{array}{rcl}\nonumber{}1/2((x^{T}Dx)+y^{T}Iy) & \leq{} & b,\\\nonumber{}Fx-y & = & 0.\end{array}\end{displaymath}

Now the question may arise: When should a quadratic problem be reformulated to make it separable or near separable? The simplest rule of thumb is that it should be reformulated if the number of non-zeros used to represent the problem decreases when reformulating the problem.

6.4. Conic optimization

Conic optimization can be seen as a generalization of linear optimization. Indeed a conic optimization problem is a linear optimization problem plus a constraint of the form

\begin{displaymath}\nonumber{}x\in{}\mathcal{C}\end{displaymath}

where $\mathcal{C}$ is a convex cone. A complete conic problem has the form

\begin{math}\nonumber{}\begin{array}{lccccc}\nonumber{}\mbox{minimize} &  &  & c^{T}x+c^{f} &  & \\\nonumber{}\mbox{subject to} & l^{c} & \leq{} & Ax & \leq{} & u^{c},\\\nonumber{} & l^{x} & \leq{} & x & \leq{} & u^{x},\\\nonumber{} &  &  & x\in{}\mathcal{C}. &  &\end{array}\end{math} (6.4.1)

The cone $\mathcal{C}$ can be a Cartesian product of $p$ convex cones, i.e.

\begin{displaymath}\nonumber{}\mathcal{C}=\mathcal{C}_{1}\times \cdots \times \mathcal{C}_{p}\end{displaymath}

in which case $x\in{}\mathcal{C}$ can be written as

\begin{displaymath}\nonumber{}x=(x_{1},\ldots ,x_{p}),~x_{1}\in{}\mathcal{C}_{1},\ldots ,x_{p}\in{}\mathcal{C}_{p}\end{displaymath}

where each $x_{t}\in{}\mathcal{C}_{t}$. Please note that the $n$-dimensional Euclidean space ${R}^{n}$ is a cone itself, so simple linear variables are still allowed.

MOSEK supports only a limited number of cones, specifically

\begin{displaymath}\nonumber{}\mathcal{C}=\mathcal{C}_{1}\times \cdots \times \mathcal{C}_{p}\end{displaymath}

where each $\mathcal{C}_{t}$ has one of the following forms:

  • ${R}$ set (no restrictions):

    \begin{displaymath}\nonumber{}\mathcal{C}_{t}=\left\lbrace{}x\in{}{R}^{{n^{t}}}\right\rbrace{}.\end{displaymath}
  • Quadratic cone:

    \begin{displaymath}\nonumber{}\mathcal{C}_{t}=\left\lbrace{}x\in{}{R}^{{n^{t}}}:x_{1}\geq{}\sqrt{\sum \limits _{{j=2}}^{{n^{t}}}x_{j}^{2}}\right\rbrace{}.\end{displaymath}
  • Rotated quadratic cone:

    \begin{displaymath}\nonumber{}\mathcal{C}_{t}=\left\lbrace{}x\in{}{R}^{{n^{t}}}:2x_{1}x_{2}\geq{}\sum \limits _{{j=3}}^{{n^{t}}}x_{j}^{2},~x_{1},x_{2}\geq{}0\right\rbrace{}.\end{displaymath}

Although these cones may seem to provide only limited expressive power they can be used to model a large range of problems as demonstrated in Section 6.4.4.

6.4.1. Duality for conic optimization

The dual problem corresponding to the conic optimization problem (6.4.1) is given by

\begin{math}\nonumber{}\begin{array}{lccccc}\nonumber{}\mbox{maximize} & (l^{c})^{T}s_{l}^{c}-(u^{c})^{T}s_{u}^{c} &  & \\\nonumber{} & +(l^{x})^{T}s_{l}^{x}-(u^{x})^{T}s_{u}^{x}+c^{f} &  & \\\nonumber{}\mbox{subject to} & A^{T}y+s_{l}^{x}-s_{u}^{x}+s_{n}^{x} & = & c,\\\nonumber{} & -y+s_{l}^{c}-s_{u}^{c} & = & 0,\\\nonumber{} & s_{l}^{c},s_{u}^{c},s_{l}^{x},s_{u}^{x} & \geq{} & 0,\\\nonumber{} & s_{n}^{x}\in{}\mathcal{C}^{*} &  &\end{array}\end{math} (6.4.2)

where the dual cone $\mathcal{C}^{*}$ is a product of cones

\begin{displaymath}\nonumber{}\mathcal{C}^{*}=\mathcal{C}^{*}_{1}\times \cdots \times \mathcal{C}^{*}_{p}\end{displaymath}

where each $\mathcal{C}^{*}_{t}$ is the dual cone of $\mathcal{C}_{t}$. For the cone types MOSEK can handle, the relation between the primal and the dual cone is given as follows (a small numerical membership check follows the list):

  • ${R}^{{n^{t}}}$ set:

    \begin{displaymath}\nonumber{}\mathcal{C}_{t}=\left\lbrace{}x\in{}{R}^{{n^{t}}}\right\rbrace{}\quad{}\Leftrightarrow \quad{}\mathcal{C}^{*}_{t}:=\left\lbrace{}s\in{}{R}^{{n^{t}}}:~s=0\right\rbrace{}.\end{displaymath}
  • Quadratic cone:

    \begin{displaymath}\nonumber{}\mathcal{C}_{t}:=\left\lbrace{}x\in{}{R}^{{n^{t}}}:x_{1}\geq{}\sqrt{\sum \limits _{{j=2}}^{{n^{t}}}x_{j}^{2}}\right\rbrace{}\quad{}\Leftrightarrow \quad{}\mathcal{C}^{*}_{t}=\mathcal{C}_{t}.\end{displaymath}
  • Rotated quadratic cone:

    \begin{displaymath}\nonumber{}\mathcal{C}_{t}:=\left\lbrace{}x\in{}{R}^{{n^{t}}}:2x_{1}x_{2}\geq{}\sum \limits _{{j=3}}^{{n^{t}}}x_{j}^{2},~x_{1},x_{2}\geq{}0\right\rbrace{}\quad{}\Leftrightarrow \quad{}\mathcal{C}^{*}_{t}=\mathcal{C}_{t}.\end{displaymath}
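The membership tests for the two quadratic cone types are spelled out in the following numpy sketch (the sample points are made up):

    import numpy as np

    def in_quadratic_cone(x):
        # x_1 >= sqrt(x_2^2 + ... + x_n^2)
        return x[0] >= np.linalg.norm(x[1:])

    def in_rotated_quadratic_cone(x):
        # 2 x_1 x_2 >= x_3^2 + ... + x_n^2 and x_1, x_2 >= 0
        return x[0] >= 0 and x[1] >= 0 and 2 * x[0] * x[1] >= x[2:] @ x[2:]

    print(in_quadratic_cone(np.array([5.0, 3.0, 4.0])))          # True: 5 >= 5
    print(in_rotated_quadratic_cone(np.array([1.0, 2.0, 2.0])))  # True: 4 >= 4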

6.4.2. The dual of the dual

The dual problem corresponding to the dual problem is the primal problem.

6.4.3. Infeasibility

In case MOSEK finds a problem to be infeasible it will report a certificate of the infeasibility. This works exactly as for linear problems (see sections 6.1.1.3 and 6.1.1.4).

6.4.4. Examples

This section contains several examples of inequalities and problems that can be cast as conic optimization problems.

6.4.4.1. Quadratic objective and constraints

From Section 6.3.2 we know that any convex quadratic problem can be stated in the form

\begin{math}\nonumber{}\begin{array}{lccl}\nonumber{}\mbox{minimize} & 0.5\left\|Fx\right\|^{2}+c^{T}x, &  & \\\nonumber{}\mbox{subject to} & 0.5\left\|Gx\right\|^{2}+a^{T}x & \leq{} & b,\end{array}\end{math} (6.4.3)

where F and G are matrices and c and a are vectors. For simplicity we assume that there is only one constraint, but it should be obvious how to generalize the methods to an arbitrary number of constraints.

Problem (6.4.3) can be reformulated as

\begin{math}\nonumber{}\begin{array}{lccl}\nonumber{}\mbox{minimize} & 0.5\left\|t\right\|^{2}+c^{T}x, &  & \\\nonumber{}\mbox{subject to} & 0.5\left\|z\right\|^{2}+a^{T}x & \leq{} & b,\\\nonumber{} & Fx-t & = & 0,\\\nonumber{} & Gx-z & = & 0\end{array}\end{math} (6.4.4)

after the introduction of the new variables t and z. It is easy to convert this problem to a conic quadratic optimization problem, i.e.

\begin{math}\nonumber{}\begin{array}{lccll}\nonumber{}\mbox{minimize} & v+c^{T}x, &  &  & \\\nonumber{}\mbox{subject to} & p+a^{T}x & = & b, & \\\nonumber{} & Fx-t & = & 0, & \\\nonumber{} & Gx-z & = & 0, & \\\nonumber{} & w & = & 1, & \\\nonumber{} & q & = & 1, & \\\nonumber{} & \left\|t\right\|^{2} & \leq{} & 2vw, & v,w\geq{}0,\\\nonumber{} & \left\|z\right\|^{2} & \leq{} & 2pq, & p,q\geq{}0.\end{array}\end{math} (6.4.5)

In this case we can model the last two inequalities using rotated quadratic cones.
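The equivalence of (6.4.3) and (6.4.5) can be verified numerically. The sketch below uses cvxpy as a generic conic modelling front end with random illustrative data; since $w=q=1$, the two rotated cones are entered as $\left\|t\right\|^{2}\leq{}2v$ and $\left\|z\right\|^{2}\leq{}2p$.

    import cvxpy as cp
    import numpy as np

    rng = np.random.default_rng(1)
    n = 4
    F, G = rng.standard_normal((3, n)), rng.standard_normal((2, n))
    c, a, b = rng.standard_normal(n), rng.standard_normal(n), 10.0

    # Direct quadratic formulation (6.4.3).
    x = cp.Variable(n)
    quad = cp.Problem(
        cp.Minimize(0.5 * cp.sum_squares(F @ x) + c @ x),
        [0.5 * cp.sum_squares(G @ x) + a @ x <= b],
    )

    # Conic reformulation in the spirit of (6.4.5), with w = q = 1 eliminated.
    x2, t, z = cp.Variable(n), cp.Variable(3), cp.Variable(2)
    v, p = cp.Variable(), cp.Variable()
    conic = cp.Problem(
        cp.Minimize(v + c @ x2),
        [p + a @ x2 == b, F @ x2 - t == 0, G @ x2 - z == 0,
         cp.sum_squares(t) <= 2 * v, cp.sum_squares(z) <= 2 * p],
    )

    print(quad.solve(), conic.solve())   # the two optimal values coincide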

If we assume that F is a non-singular matrix, for instance a diagonal matrix, then

\begin{displaymath}\nonumber{}x=F^{{-1}}t.\end{displaymath}

and hence we can eliminate x from the problem to obtain:

\begin{math}\nonumber{}\begin{array}{lccll}\nonumber{}\mbox{minimize} & v+c^{T}F^{{-1}}t, &  &  & \\\nonumber{}\mbox{subject to} & p+a^{T}F^{{-1}}t & = & b, & \\\nonumber{} & GF^{{-1}}t-z & = & 0, & \\\nonumber{} & w & = & 1, & \\\nonumber{} & q & = & 1, & \\\nonumber{} & \left\|t\right\|^{2} & \leq{} & 2vw, & v,w\geq{}0,\\\nonumber{} & \left\|z\right\|^{2} & \leq{} & 2pq, & p,q\geq{}0.\end{array}\end{math} (6.4.6)

In most cases MOSEK will perform this reduction automatically during the presolve phase before the optimization is performed.

6.4.4.2. Minimizing a sum of norms

The next example is the problem of minimizing a sum of norms i.e. the problem

\begin{math}\nonumber{}\begin{array}{lccl}\nonumber{}\mbox{minimize} & \sum \limits _{{i=1}}^{{k}}\left\|x^{i}\right\| &  & \\\nonumber{}\mbox{subject to} & Ax & = & b,\end{array}\end{math} (6.4.7)

where

\begin{displaymath}\nonumber{}x:=\left[\begin{array}{c}\nonumber{}x^{1}\\\nonumber{}\vdots \\\nonumber{}x^{k}\end{array}\right].\end{displaymath}

This problem is equivalent to

\begin{math}\nonumber{}\begin{array}{lccll}\nonumber{}\mbox{minimize} & \sum \limits _{{i=1}}^{{k}}z_{i} &  &  & \\\nonumber{}\mbox{subject to} & Ax & = & b, & \\\nonumber{} & \left\|x^{i}\right\| & \leq{} & z_{i}, & i=1,\ldots ,k,\end{array}\end{math} (6.4.8)

which in turn is equivalent to

\begin{math}\nonumber{}\begin{array}{lccl}\nonumber{}\mbox{minimize} & \sum \limits _{{i=1}}^{{k}}z_{i} &  & \\\nonumber{}\mbox{subject to} & Ax & = & b,\\\nonumber{} & (z_{i},x^{i})\in{}\mathcal{C}_{i}, &  & i=1,\ldots ,k\end{array}\end{math} (6.4.9)

where all $\mathcal{C}_{i}$ are of the quadratic type, i.e.

\begin{displaymath}\nonumber{}\mathcal{C}_{i}:=\left\lbrace{}(z_{i},x^{i}):~z_{i}\geq{}\left\|x^{i}\right\|\right\rbrace{}.\end{displaymath}

The dual problem corresponding to (6.4.9) is

\begin{math}\nonumber{}\begin{array}{lccll}\nonumber{}\mbox{maximize} & b^{T}y &  &  & \\\nonumber{}\mbox{subject to} & A^{T}y+s & = & c, & \\\nonumber{} & t_{i} & = & 1, & i=1,\ldots ,k,\\\nonumber{} & (t_{i},s^{i})\in{}\mathcal{C}_{i}, &  &  & i=1,\ldots ,k\end{array}\end{math} (6.4.10)

where

\begin{displaymath}\nonumber{}s:=\left[\begin{array}{c}\nonumber{}s^{1}\\\nonumber{}\vdots \\\nonumber{}s^{k}\end{array}\right].\end{displaymath}

This problem is equivalent to

\begin{math}\nonumber{}\begin{array}{lccll}\nonumber{}\mbox{maximize} & b^{T}y &  &  & \\\nonumber{}\mbox{subject to} & A^{T}y+s & = & c, & \\\nonumber{} & \left\|s^{i}\right\|_{2}^{2} & \leq{} & 1, & i=1,\ldots ,k.\end{array}\end{math} (6.4.11)

Please note that the dual problem can be reduced to an “ordinary” convex quadratically constrained optimization problem in this case due to the special structure of the primal problem. In some cases it turns out that it is much better to solve the dual problem (6.4.10) rather than the primal problem (6.4.9).
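A cvxpy sketch of the primal formulation (6.4.7), with made-up data and $k$ norm blocks of dimension $d$; the modelling layer turns each norm bound into a quadratic cone exactly as in (6.4.9).

    import cvxpy as cp
    import numpy as np

    rng = np.random.default_rng(2)
    k, d = 4, 3                        # k norm blocks of dimension d
    n = k * d
    A = rng.standard_normal((2, n))
    b = rng.standard_normal(2)

    x = cp.Variable(n)
    obj = sum(cp.norm(x[i * d:(i + 1) * d]) for i in range(k))
    prob = cp.Problem(cp.Minimize(obj), [A @ x == b])
    prob.solve()
    print(prob.value, x.value)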

6.4.4.3. Modelling polynomial terms using conic optimization

Generally an arbitrary polynomial term of the form

\begin{displaymath}\nonumber{}fx^{g}\end{displaymath}

cannot in general be represented with conic quadratic constraints. In the following, however, we demonstrate some special cases where it is possible.

A particularly simple polynomial term is the reciprocal, i.e.

\begin{displaymath}\nonumber{}\frac{1}{x}.\end{displaymath}

Now, a constraint of the form

\begin{displaymath}\nonumber{}\frac{1}{x}\leq{}y\end{displaymath}

where it is required that $x>0$, is equivalent to

\begin{displaymath}\nonumber{}1\leq{}xy\mbox{ and }x>0\end{displaymath}

which in turn is equivalent to

\begin{displaymath}\nonumber{}\begin{array}{rcl}\nonumber{}z & = & \sqrt{2},\\\nonumber{}z^{2} & \leq{} & 2xy.\end{array}\end{displaymath}

The last formulation is a conic constraint plus a simple linear equality.
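The equivalence is easily spot-checked with numpy on random positive samples (purely illustrative):

    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.uniform(0.1, 10.0, size=1000)
    y = rng.uniform(0.0, 10.0, size=1000)

    z = np.sqrt(2.0)
    # For x > 0:  1/x <= y  holds exactly when  z^2 <= 2*x*y with z = sqrt(2).
    lhs = 1.0 / x <= y
    rhs = z ** 2 <= 2.0 * x * y
    print(np.all(lhs == rhs))   # True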

For example, consider the problem

\begin{displaymath}\nonumber{}\begin{array}{lccl}\nonumber{}\mbox{minimize} & c^{T}x &  & \\\nonumber{}\mbox{subject to} & \sum \limits _{{j=1}}^{n}\frac{f_{j}}{x_{j}} & \leq{} & b,\\\nonumber{} & x\geq{}0, &  &\end{array}\end{displaymath}

where it is assumed that $f\geq{}0$ and $b>0$. This problem is equivalent to

\begin{math}\nonumber{}\begin{array}{lccll}\nonumber{}\mbox{minimize} & c^{T}x &  &  & \\\nonumber{}\mbox{subject to} & \sum \limits _{{j=1}}^{n}f_{j}z_{j} & = & b, & \\\nonumber{} & v_{j} & = & \sqrt{2}, & j=1,\ldots ,n,\\\nonumber{} & v_{j}^{2} & \leq{} & 2z_{j}x_{j}, & j=1,\ldots ,n,\\\nonumber{} & x,z\geq{}0, &  &  &\end{array}\end{math} (6.4.12)

because

\begin{displaymath}\nonumber{}v_{j}^{2}=2\leq{}2z_{j}x_{j}\end{displaymath}

implies that

\begin{displaymath}\nonumber{}\frac{1}{x_{j}}\leq{}z_{j}\mbox{ and }\sum \limits _{{j=1}}^{n}\frac{f_{j}}{x_{j}}\leq{}\sum \limits _{{j=1}}^{n}f_{j}z_{j}=b.\end{displaymath}

The problem (6.4.12) is a conic quadratic optimization problem with $n$ three-dimensional rotated quadratic cones.

The next example is the constraint

\begin{displaymath}\nonumber{}\begin{array}{rcl}\nonumber{}\sqrt{x} & \geq{} & |t|,\\\nonumber{}x & \geq{} & 0,\end{array}\end{displaymath}

where both t and x are variables. This set is identical to the set

\begin{math}\nonumber{}\begin{array}{rcl}\nonumber{}t^{2} & \leq{} & 2xz,\\\nonumber{}z & = & 0.5,\\\nonumber{}x,z, & \geq{} & 0.\end{array}\end{math} (6.4.13)

Occasionally when modelling the market impact term in portfolio optimization, the polynomial term [[MathCmd 216]] occurs. Therefore, consider the set defined by the inequalities

\begin{math}\nonumber{}\begin{array}{rcl}\nonumber{}x^{{1.5}} & \leq{} & t,\\\nonumber{}0 & \leq{} & x.\end{array}\end{math} (6.4.14)

We will exploit that $x^{{1.5}}=\frac{x^{2}}{\sqrt{x}}$ for $x>0$. First define the set

\begin{math}\nonumber{}\begin{array}{rcl}\nonumber{}x^{2} & \leq{} & 2st,\\\nonumber{}s,t & \geq{} & 0.\end{array}\end{math} (6.4.15)

Now, if we can make sure that

\begin{displaymath}\nonumber{}2s\leq{}\sqrt{x},\end{displaymath}

then we have the desired result since this implies that

\begin{displaymath}\nonumber{}x^{{1.5}}=\frac{x^{2}}{\sqrt{x}}\leq{}\frac{x^{2}}{2 s}\leq{}t.\end{displaymath}

Please note that $s$ can be chosen freely and that $2s=\sqrt{x}$ is a valid choice.

Let

\begin{math}\nonumber{}\begin{array}{rcl}\nonumber{}x^{2} & \leq{} & 2st,\\\nonumber{}w^{2} & \leq{} & 2vr,\\\nonumber{}x & = & v,\\\nonumber{}s & = & w,\\\nonumber{}r & = & \frac{1}{8},\\\nonumber{}s,t,v,r & \geq{} & 0,\end{array}\end{math} (6.4.16)

then

\begin{displaymath}\nonumber{}\begin{array}{rcl}\nonumber{}s^{2} & = & w^{2}\\\nonumber{} & \leq{} & 2vr\\\nonumber{} & \leq{} & \frac{v}{4}\\\nonumber{} & = & \frac{x}{4}.\end{array}\end{displaymath}

Moreover,

\begin{displaymath}\nonumber{}\begin{array}{rcl}\nonumber{}x^{2} & \leq{} & 2st,\\\nonumber{} & \leq{} & 2\sqrt{\frac{x}{4}}t\end{array}\end{displaymath}

leading to the conclusion

\begin{displaymath}\nonumber{}x^{{1.5}}\leq{}t.\end{displaymath}

(6.4.16) is a conic reformulation which is equivalent to (6.4.14). Please note that the $x\geq{}0$ constraint does not appear explicitly in (6.4.15) and (6.4.16), but holds implicitly since $x=v$ and $v\geq{}0$.
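The construction can be sanity-checked numerically: for any pair with $x^{{1.5}}\leq{}t$, the certificate $2s=\sqrt{x}$, $v=x$, $w=s$, $r=\frac{1}{8}$ satisfies every inequality in (6.4.16). A numpy sketch:

    import numpy as np

    x = np.linspace(0.0, 4.0, 101)
    t = x ** 1.5 + 1e-9                # any t with x^1.5 <= t

    # Certificate suggested above: 2s = sqrt(x), v = x, w = s, r = 1/8.
    s = np.sqrt(x) / 2.0
    v, w, r = x, s, 1.0 / 8.0

    assert np.all(x ** 2 <= 2.0 * s * t + 1e-12)   # x^2 <= 2st
    assert np.all(w ** 2 <= 2.0 * v * r + 1e-12)   # w^2 <= 2vr
    print("all points with x^1.5 <= t satisfy (6.4.16)")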

Finally, it should be mentioned that any polynomial term of the form $x^{g}$, where $g$ is a positive rational number, can be represented using conic quadratic constraints [11, pp. 12-13].

6.4.4.4. Further reading

If you want to know more about what can be modelled as a conic optimization problem we recommend the references [18, 5, 11].

6.4.5. Potential pitfalls in conic optimization

For linear optimization, a feasible problem with a bounded objective always attains its optimal value, and the duality gap is zero. The conic case is not as simple, as the following examples demonstrate.

6.4.5.1. Non-attainment in the primal problem

Consider the example

\begin{math}\nonumber{}\begin{array}{lccl}\nonumber{}\mbox{minimize} & z &  & \\\nonumber{}\mbox{subject to} & 2yz & \geq{} & x^{2},\\\nonumber{} & x & = & \sqrt{2},\\\nonumber{} & y,z & \geq{} & 0.\end{array}\end{math} (6.4.17)

which corresponds to the problem

\begin{math}\nonumber{}\begin{array}{lccl}\nonumber{}\mbox{minimize} & \frac{1}{y} &  & \\\nonumber{}\mbox{subject to} & y & \geq{} & 0.\end{array}\end{math} (6.4.18)

Clearly, the optimal objective value is zero, but it is not attained by any finite $y$; the infimum is only approached as $y$ tends to infinity.

6.4.5.2. Non-attainment in the dual problem

Next, consider the example

\begin{math}\nonumber{}\begin{array}{lccl}\nonumber{}\mbox{minimize} & x_{4} &  & \\\nonumber{}\mbox{subject to} & x_{3}+x_{4} & = & 1,\\\nonumber{} & x_{1} & = & 0,\\\nonumber{} & x_{2} & = & 1,\\\nonumber{} & 2x_{1}x_{2} & \geq{} & x_{3}^{2},\\\nonumber{} & x_{1},x_{2} & \geq{} & 0.\end{array}\end{math} (6.4.19)

which has the optimal solution

\begin{displaymath}\nonumber{}x_{1}^{*}=0,~x_{2}^{*}=1,~x_{3}^{*}=0\mbox{ and }x_{4}^{*}=1\end{displaymath}

implying that the optimal primal objective value is 1.

Now, the dual problem corresponding to (6.4.19) is

\begin{math}\nonumber{}\begin{array}{lccl}\nonumber{}\mbox{maximize} & y_{1}+y_{3} &  & \\\nonumber{}\mbox{subject to} & y_{2}+s_{1} & = & 0,\\\nonumber{} & y_{3}+s_{2} & = & 0,\\\nonumber{} & y_{1}+s_{3} & = & 0,\\\nonumber{} & y_{1} & = & 1,\\\nonumber{} & 2s_{1}s_{2} & \geq{} & s_{3}^{2},\\\nonumber{} & s_{1},s_{2} & \geq{} & 0.\end{array}\end{math} (6.4.20)

Therefore,

\begin{displaymath}\nonumber{}y_{1}^{*}=1\end{displaymath}

and

\begin{displaymath}\nonumber{}s_{3}^{*}=-1.\end{displaymath}

This implies that

\begin{displaymath}\nonumber{}2s_{1}^{*}s_{2}^{*}\geq{}(s_{3}^{*})^{2}=1\end{displaymath}

and hence $s_{1}^{*},s_{2}^{*}>0$. Given this fact we can conclude that

\begin{displaymath}\nonumber{}\begin{array}{rcl}\nonumber{}y_{1}^{*}+y_{3}^{*} & = & 1-s_{2}^{*}\\\nonumber{} & < & 1\end{array}\end{displaymath}

implying that the supremum of the dual objective value is 1, but it is never attained. Hence, no bounded primal-dual optimal solution with zero duality gap exists. Of course it is possible to find a primal-dual feasible solution such that the duality gap is close to zero, but then $s_{1}^{*}$ will be very large (unless a large duality gap is allowed). This is likely to make the problem (6.4.19) hard to solve.

An inspection of problem (6.4.19) reveals the constraint $x_{1}=0$, which combined with the cone constraint implies that $x_{3}=0$. If we either add the redundant constraint

\begin{displaymath}\nonumber{}x_{3}=0\end{displaymath}

to the problem (6.4.19) or eliminate the variables $x_{1}$ and $x_{3}$ from the problem, it becomes easy to solve.

6.5. Recommendations

Often an optimization problem can be formulated in several different ways, and the exact formulation used may have a significant impact on the solution time and the quality of the solution. In some cases the difference between a “good” and a “bad” formulation means the ability to solve the problem or not.

Below is a list of several issues that you should be aware of when developing a good formulation.

  1. Sparsity is very important. The constraint matrix A is assumed to be a sparse matrix, where sparse means that it contains many zeros (typically less than 10% non-zeros). Normally, when A is sparser, less memory is required to store the problem and it can be solved faster.
  2. Avoid large bounds, as these can introduce all sorts of numerical problems. Assume that a variable $x_{j}$ has the bounds

    \begin{displaymath}\nonumber{}0.0\leq{}x_{j}\leq{}1.0e16.\end{displaymath}

    The number 1.0e16 is large, and it is very likely that the constraint $x_{j}\leq{}1.0e16$ is non-binding at the optimum, so one might assume the bound will cause no problems. Unfortunately, this is a naïve assumption, because the bound 1.0e16 may actually affect the presolve, the scaling, the computation of the dual objective value, etc. In this case the constraint $0\leq{}x_{j}$ alone is likely to be sufficient, i.e. 1.0e16 is just a way of representing infinity.

  3. Avoid large penalty terms in the objective, i.e. do not have large terms in the linear part of the objective function. They will most likely cause numerical problems.
  4. On a computer all computations are performed in finite precision, which implies that

    \begin{displaymath}\nonumber{}1=1+\varepsilon\end{displaymath}

    where $\varepsilon $ is about $10^{{-16}}$. This means that the results of all computations are truncated, which introduces rounding errors. The upshot is that very small and very large numbers should be avoided; e.g. it is preferable that all non-zero elements in $A$, in the bounds, and in the linear objective are of moderate magnitude, say within a few orders of magnitude of 1. (A short demonstration follows this list.)

  5. Decreasing the number of variables or constraints does not necessarily make it easier to solve a problem. In certain cases, i.e. in nonlinear optimization, it might be a good idea to introduce more constraints and variables if it makes the model separable. Also a big but sparse problem might be advantageous compared to a smaller but denser problem.
  6. Try to avoid linearly dependent rows among the linear constraints. Network flow problems and multi-commodity network flow problems, for example, often contain one or more linearly dependent rows.
  7. Finally, it is recommended to consult some of the papers about preprocessing to get some ideas about efficient formulations. See e.g. [10, 6, 24, 22].
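Regarding item 4, the effect of finite precision is easy to demonstrate in Python:

    import numpy as np

    print(1.0 + 1e-17 == 1.0)     # True: the small term is rounded away
    print(np.finfo(float).eps)    # about 2.22e-16, the machine precision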

6.5.1. Avoid nearly infeasible models

Consider the linear optimization problem

\begin{math}\nonumber{}\begin{array}{lccl}\nonumber{}\mbox{minimize} &  &  & 0\\\nonumber{}\mbox{subject to} & x+y & \leq{} & 10^{{-10}}+\alpha ,\\\nonumber{} & 10^{4}x+2\cdot 10^{4}y & \geq{} & 10^{{-6}},\\\nonumber{} & x,y\geq{}0. &  &\end{array}\end{math} (6.5.1)

Clearly, the problem is feasible for $\alpha =0$, while for $\alpha =-10^{{-10}}$ it is infeasible. In other words, an insignificant change in the right-hand side of the constraints makes the problem status switch from feasible to infeasible. Such a model should be avoided.
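The flip can be reproduced with any LP solver; below is a scipy sketch. With numbers this close to the solver's feasibility tolerances the reported status may even vary between solvers, which is precisely why such models should be avoided.

    from scipy.optimize import linprog

    def status(alpha):
        # Feasibility version of (6.5.1).
        res = linprog(c=[0.0, 0.0],
                      A_ub=[[1.0, 1.0], [-1.0e4, -2.0e4]],
                      b_ub=[1e-10 + alpha, -1e-6],
                      bounds=[(0.0, None)] * 2)
        return res.status   # 0 = optimal (feasible), 2 = infeasible

    print(status(0.0), status(-1e-10))   # nominally 0 and 2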

6.6. Examples continued

6.6.1. The absolute value

Assume we have a constraint of the form

\begin{math}\nonumber{}|f^{T}x+g|\leq{}b\end{math} (6.6.1)

where $x\in{}{R}^{n}$ is a vector of variables, $f\in{}{R}^{n}$ is a vector, and $g,b\in{}{R}$ are constants.

It is easy to verify that the constraint (6.6.1) is equivalent to

\begin{math}\nonumber{}-b\leq{}f^{T}x+g\leq{}b\end{math} (6.6.2)

which is a set of ordinary linear inequality constraints.

Please note that an equality involving an absolute value, such as

\begin{displaymath}\nonumber{}|x|=1\end{displaymath}

cannot be formulated as a linear or even a convex optimization problem. It requires integer optimization.

6.6.2. The Markowitz portfolio model

In this section we will show how to model several versions of the Markowitz portfolio model using conic optimization.

The Markowitz portfolio model deals with the problem of selecting a portfolio of assets, e.g. stocks and bonds. The goal is to find a portfolio such that, for a given expected return, the risk is minimized. The assumptions are:

  • A portfolio can consist of $n$ traded assets numbered $1,\ldots ,n$ held over a period of time.
  • $w_{j}^{0}$ is the initial holding of asset $j$, where $j=1,\ldots ,n$.
  • $r_{j}$ is the return on asset $j$ and is assumed to be a random variable. The vector $r$ has known mean $\bar{r}$ and covariance matrix $\Sigma $.

The variable $x_{j}$ denotes the amount of asset $j$ traded in the given period of time and has the following meaning:

  • If $x_{j}>0$, then the amount of asset $j$ is increased (by purchasing).
  • If $x_{j}<0$, then the amount of asset $j$ is decreased (by selling).

The model deals with two central quantities:

  • Expected return:

    \begin{displaymath}\nonumber{}E[r^{T}(w^{0}+x)]=\bar{r}^{T}(w^{0}+x).\end{displaymath}
  • Variance (Risk):

    \begin{displaymath}\nonumber{}V[r^{T}(w^{0}+x)]=(w^{0}+x)^{T}\Sigma (w^{0}+x).\end{displaymath}

By definition $\Sigma $ is positive semi-definite and

\begin{displaymath}\nonumber{}\begin{array}{rcl}\nonumber{}\mbox{Std. dev.} & = & \left\|\Sigma ^{{\frac{1}{2}}}(w^{0}+x)\right\|\\\nonumber{} & = & \left\|L^{T}(w^{0}+x)\right\|\end{array}\end{displaymath}

where L is any matrix such that

\begin{displaymath}\nonumber{}\Sigma =LL^{T}.\end{displaymath}

A low rank of $\Sigma $ is advantageous from a computational point of view. A valid $L$ can always be computed as the Cholesky factorization of $\Sigma $.

6.6.2.1. Minimizing variance for a given return

In our first model we want to minimize the variance while selecting a portfolio with a specified expected target return t. Additionally the portfolio must satisfy the budget (self-financing) constraint asserting that the total amount of assets sold must equal the total amount of assets purchased. This is expressed in the model

\begin{math}\nonumber{}\begin{array}{lccl}\nonumber{}\mbox{minimize} & V[r^{T}(w^{0}+x)] &  & \\\nonumber{}\mbox{subject to} & E[r^{T}(w^{0}+x)] & = & t,\\\nonumber{} & e^{T}x & = & 0,\end{array}\end{math} (6.6.3)

where $e:=(1,\ldots ,1)^{T}$. Using the definitions above this may be formulated as a quadratic optimization problem:

\begin{math}\nonumber{}\begin{array}{lccl}\nonumber{}\mbox{minimize} & (w^{0}+x)^{T}\Sigma (w^{0}+x) &  & \\\nonumber{}\mbox{subject to} & \bar{r}^{T}(w^{0}+x) & = & t,\\\nonumber{} & e^{T}x & = & 0,\end{array}\end{math} (6.6.4)

6.6.2.2. Conic quadratic reformulation.

An equivalent conic quadratic reformulation is given by:

\begin{math}\nonumber{}\begin{array}{lcccl}\nonumber{}\mbox{minimize} & f &  & \\\nonumber{}\mbox{subject to} & \Sigma ^{{\frac{1}{2}}}(w^{0}+x)-g & = & 0,\\\nonumber{} & \bar{r}^{T}(w^{0}+x) & = & t,\\\nonumber{} & e^{T}x & = & 0,\\\nonumber{} & f\geq{}\left\|g\right\|. &  &\end{array}\end{math} (6.6.5)

Here we minimize the standard deviation instead of the variance. Please note that $\Sigma ^{{\frac{1}{2}}}$ can be replaced by any matrix $L^{T}$ where $\Sigma =LL^{T}$; an $L$ of low rank is computationally advantageous.
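A cvxpy sketch of (6.6.5) with made-up data: three assets, an invented mean return vector and covariance matrix, and an arbitrary target return. The Cholesky factor plays the role of $L$.

    import cvxpy as cp
    import numpy as np

    rbar = np.array([0.05, 0.10, 0.07])           # invented mean returns
    Sigma = np.array([[0.0400, 0.0060, 0.0100],   # invented covariance
                      [0.0060, 0.0900, 0.0120],
                      [0.0100, 0.0120, 0.0625]])
    w0 = np.full(3, 1.0 / 3.0)                    # initial holdings
    t = 0.08                                      # target expected return

    L = np.linalg.cholesky(Sigma)                 # Sigma = L L^T
    x = cp.Variable(3)
    risk = cp.norm(L.T @ (w0 + x))                # std. dev., as in (6.6.5)
    prob = cp.Problem(cp.Minimize(risk),
                      [rbar @ (w0 + x) == t, cp.sum(x) == 0])
    prob.solve()
    print(prob.value, x.value)                    # minimal std. dev. and trades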

6.6.2.3. Transaction costs with market impact term

We will now expand our model to include transaction costs as a fraction of the traded volume. [14, pp. 445-475] argues that transaction costs can be modelled as follows

\begin{math}\nonumber{}\mbox{commission}+\mbox{bid/ask spread}+\theta \sqrt{\frac{\mbox{trade volume}}{\mbox{daily volume}}},\end{math} (6.6.6)

and that these are important to incorporate into the model.

In the following we deal with the last of these terms, known as the market impact term. If you sell (buy) a large amount of an asset, the price is likely to go down (up). This is captured in the market impact term

\begin{displaymath}\nonumber{}\theta \sqrt{\frac{\mbox{trade volume}}{\mbox{daily volume}}}\approx{}m_{j}\sqrt{|x_{j}|}.\end{displaymath}

The constant $\theta $ and the daily volume have to be estimated in some way, i.e.

\begin{displaymath}\nonumber{}m_{j}=\frac{\theta }{\sqrt{\mbox{daily volume}}}\end{displaymath}

has to be estimated. The market impact term gives the cost as a fraction of the traded amount ($m_{j}\sqrt{|x_{j}|}$). Therefore, the total cost incurred when trading an amount $x_{j}$ of asset $j$ is given by

\begin{displaymath}\nonumber{}|x_{j}|(m_{j}|x_{j}|^{{\frac{1}{2}}}).\end{displaymath}

This leads us to the model:

\begin{math}\nonumber{}\begin{array}{lcccl}\nonumber{}\mbox{minimize} & f &  & \\\nonumber{}\mbox{subject to} & \Sigma ^{{\frac{1}{2}}}(w^{0}+x)-g & = & 0,\\\nonumber{} & \bar{r}^{T}(w^{0}+x) & = & t,\\\nonumber{} & e^{T}x+e^{T}y & = & 0,\\\nonumber{} & |x_{j}|(m_{j}|x_{j}|^{{\frac{1}{2}}}) & \leq{} & y_{j},\\\nonumber{} & f\geq{}\left\|g\right\|. &  &\end{array}\end{math} (6.6.7)

Now, defining the variable transformation

\begin{displaymath}\nonumber{}y_{j}=m_{j}\bar{y}_{j}\end{displaymath}

we obtain

\begin{math}\nonumber{}\begin{array}{lcccl}\nonumber{}\mbox{minimize} & f &  & \\\nonumber{}\mbox{subject to} & \Sigma ^{{\frac{1}{2}}}(w^{0}+x)-g & = & 0,\\\nonumber{} & \bar{r}^{T}(w^{0}+x) & = & t,\\\nonumber{} & e^{T}x+m^{T}\bar{y} & = & 0,\\\nonumber{} & |x_{j}|^{{3/2}} & \leq{} & \bar{y}_{j},\\\nonumber{} & f\geq{}\left\|g\right\|. &  &\end{array}\end{math} (6.6.8)

As shown in Section 6.4.4.3 the set

\begin{displaymath}\nonumber{}|x_{j}|^{{3/2}}\leq{}\bar{y}_{j}\end{displaymath}

can be modelled by

\begin{math}\nonumber{}\begin{array}{rcl}\nonumber{}x_{j} & \leq{} & z_{j},\\\nonumber{}-x_{j} & \leq{} & z_{j},\\\nonumber{}z_{j}^{2} & \leq{} & 2s_{j}\bar{y}_{j},\\\nonumber{}u_{j}^{2} & \leq{} & 2v_{j}q_{j},\\\nonumber{}z_{j} & = & v_{j},\\\nonumber{}s_{j} & = & u_{j},\\\nonumber{}q_{j} & = & \frac{1}{8},\\\nonumber{}s_{j},\bar{y}_{j},v_{j},q_{j} & \geq{} & 0.\end{array}\end{math} (6.6.9)

6.6.2.4. Further reading

For further reading we refer the reader to [13] in particular; [3] and [14] also contain relevant material.
