10. The optimizers for continuous problems


The most essential parts of MOSEK are the optimizers. Each optimizer is designed to solve a particular class of problems, i.e. linear, conic, or general nonlinear problems. The purpose of the present chapter is to discuss which optimizers are available for the continuous problem classes and how the performance of an optimizer can be tuned, if needed.

This chapter deals with the optimizers for continuous problems, i.e. problems containing no integer variables.

10.1. How an optimizer works

When the optimizer is called, it roughly performs the following steps:

Presolve:

Preprocessing to reduce the size of the problem.

Dualizer:

Choosing whether to solve the primal or the dual form of the problem.

Scaling:

Scaling the problem for better numerical stability.

Optimize:

Solving the actual optimization.

The first three preprocessing steps are transparent to the user, but useful to know about for tuning purposes. In general, the purpose of the preprocessing steps is to make the actual optimization more efficient and robust.

10.1.1. Presolve

Before an optimizer actually performs the optimization the problem is normally preprocessed using the so-called presolve. The purpose of the presolve is to

  • remove redundant constraints,
  • eliminate fixed variables,
  • remove linear dependencies,
  • substitute out free variables, and
  • reduce the size of the optimization problem in general.

After the presolved problem has been optimized the solution is automatically postsolved so that the returned solution is valid for the original problem. Hence, the presolve is completely transparent. For further details about the presolve phase, please see [11, 7].

It is possible to fine-tune the behavior of the presolve or to turn it off entirely. If the presolve is known to be unable to reduce the size of a problem significantly, turning it off can be beneficial. This is done by setting the parameter MSK_IPAR_PRESOLVE_USE to MSK_PRESOLVE_MODE_OFF.
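For instance, assuming a task object that already holds the problem, the presolve can be switched off as follows (a minimal sketch; error handling is reduced to passing the return code on):

#include <mosek.h>

/* Turn the presolve off entirely, e.g. when it is known not to reduce
   the problem significantly. 'task' must hold the problem. */
static MSKrescodee disable_presolve(MSKtask_t task)
{
  return MSK_putintparam(task, MSK_IPAR_PRESOLVE_USE, MSK_PRESOLVE_MODE_OFF);
}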

The two most time-consuming steps of the presolve are usually

  • the eliminator, and
  • the linear dependency check.

Therefore, in some cases it is worthwhile to disable one or both of these.

The purpose of the eliminator is to eliminate free and implied free variables from the problem using substitution. For instance, given the constraints

\begin{displaymath}
\begin{array}{rcl}
y & = & \sum_{j} x_{j},\\
y, x & \geq & 0,
\end{array}
\end{displaymath}

y is an implied free variable that can be substituted out of the problem, if deemed worthwhile. By an implied free variable we mean that the constraint y ≥ 0 is redundant, and hence y can be treated as a free variable.

For large-scale problems the eliminator usually removes many constraints and variables. However, in some cases few or no eliminations can be performed, and moreover the eliminator may consume a lot of memory and time. If this is the case, it is worthwhile to disable the eliminator by setting the parameter MSK_IPAR_PRESOLVE_ELIMINATOR_USE to MSK_OFF.

The purpose of the linear dependency check is to remove linear dependencies among the linear equalities. For instance, the three linear equalities

\begin{displaymath}
\begin{array}{rcl}
x_{1} + x_{2} + x_{3} & = & 1,\\
x_{1} + 0.5 x_{2} & = & 0.5,\\
0.5 x_{2} + x_{3} & = & 0.5
\end{array}
\end{displaymath}

contain exactly one linear dependency. This implies that one of the constraints can be dropped without changing the set of feasible solutions, i.e. one of the constraints is redundant. Removing linear dependencies is in general a good idea since it reduces the size of the problem. Moreover, linear dependencies are likely to introduce numerical problems in the optimization phase, so it is strongly recommended to build models without them. If linear dependencies have already been removed at the modelling stage, the linear dependency check can safely be disabled by setting the parameter MSK_IPAR_PRESOLVE_LINDEP_USE to MSK_OFF.
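Assuming a task holding the problem, both of these expensive sub-steps can be skipped while keeping the rest of the presolve active; a minimal sketch:

#include <mosek.h>

/* Keep the presolve on, but skip its two most expensive sub-steps. */
static MSKrescodee trim_presolve(MSKtask_t task)
{
  MSKrescodee r;
  /* Skip the eliminator. */
  r = MSK_putintparam(task, MSK_IPAR_PRESOLVE_ELIMINATOR_USE, MSK_OFF);
  /* Skip the linear dependency check. */
  if (r == MSK_RES_OK)
    r = MSK_putintparam(task, MSK_IPAR_PRESOLVE_LINDEP_USE, MSK_OFF);
  return r;
}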

10.1.2. Dualizer

It is well-known that all linear, conic, and convex optimization problems have an associated dual problem. Moreover, even if the dual problem is solved instead of the primal, the solution to the original primal problem can be recovered.

In general, it is very hard to say whether it is easier to solve the primal or the dual problem, but MOSEK has some heuristics for deciding which of the two to solve. Which form of the problem (primal or dual) is solved is displayed in the MOSEK log. Please note that the dualizer is transparent, and all solution values returned by the optimizer refer to the original primal problem.

The dualizer can also be controlled manually by setting the appropriate parameters, as sketched below.

Finally, please note that currently only linear problems may be dualized.
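As a sketch of manual control, the following uses the parameters MSK_IPAR_INTPNT_SOLVE_FORM and MSK_IPAR_SIM_SOLVE_FORM with the value MSK_SOLVE_DUAL. These names are taken from the MOSEK parameter reference and are not listed above, so treat them as assumptions:

#include <mosek.h>

/* Force the dual form for both optimizer families; MSK_SOLVE_PRIMAL
   forces the primal form and MSK_SOLVE_FREE restores the automatic
   choice (names are assumptions, see the lead-in). */
static MSKrescodee force_dual_form(MSKtask_t task)
{
  MSKrescodee r;
  r = MSK_putintparam(task, MSK_IPAR_INTPNT_SOLVE_FORM, MSK_SOLVE_DUAL);
  if (r == MSK_RES_OK)
    r = MSK_putintparam(task, MSK_IPAR_SIM_SOLVE_FORM, MSK_SOLVE_DUAL);
  return r;
}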

10.1.3. Scaling

Problems containing data with large and/or small coefficients, say 1.0e+9 or 1.0e-7, are often hard to solve. Significant digits may be truncated in calculations with finite precision, which can cause the optimizer to rely on inaccurate calculations. Since computers work in finite precision, extreme coefficients should be avoided. In general, data around the same order of magnitude is preferred, and we will refer to a problem satisfying this loose property as being well-scaled. If the problem is not well-scaled, MOSEK will try to scale (multiply) constraints and variables by suitable constants, and solve the scaled problem to improve the numerical properties.

The scaling process is transparent, i.e. the solution to the original problem is reported. It is important to be aware that the optimizer terminates when the termination criterion is met on the scaled problem, therefore significant primal or dual infeasibilities may occur after unscaling for badly scaled problems. The best solution to this problem is to reformulate it, making it better scaled.

By default MOSEK heuristically chooses a suitable scaling. The scaling for interior-point and simplex optimizers can be controlled with the parameters MSK_IPAR_INTPNT_SCALING and MSK_IPAR_SIM_SCALING respectively.
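For example, if the model is known to be well-scaled already, scaling can be turned off for the interior-point optimizer. The value name MSK_SCALING_NONE comes from the MOSEK parameter reference rather than this text; MSK_SCALING_FREE restores the default heuristic choice. A minimal sketch:

#include <mosek.h>

/* Disable scaling in the interior-point optimizer for a model that is
   already well-scaled (value names are assumptions, see the lead-in). */
static MSKrescodee disable_ipm_scaling(MSKtask_t task)
{
  return MSK_putintparam(task, MSK_IPAR_INTPNT_SCALING, MSK_SCALING_NONE);
}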

10.1.4. Using multiple CPUs

The interior-point optimizers in MOSEK have been parallelized. This means that if you solve a linear, quadratic, conic, or general convex optimization problem using the interior-point optimizer, you can take advantage of multiple CPUs.

By default MOSEK uses one thread to solve the problem, but the number of threads (and thereby CPUs) employed can be changed by setting the parameter MSK_IPAR_INTPNT_NUM_THREADS. This should never exceed the number of CPUs on the machine.
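For example, to let the interior-point optimizer use four threads (assuming a machine with at least four CPUs):

#include <mosek.h>

/* Allow the interior-point optimizer to use four threads; the value
   should not exceed the number of CPUs in the machine. */
static MSKrescodee use_four_threads(MSKtask_t task)
{
  return MSK_putintparam(task, MSK_IPAR_INTPNT_NUM_THREADS, 4);
}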

The speed-up obtained when using multiple CPUs is highly problem and hardware dependent, and consequently it is advisable to compare single-threaded and multi-threaded performance for the given problem type to determine the optimal settings.

For small problems, using multiple threads will probably not be worthwhile.

10.2. Linear optimization

10.2.1. Optimizer selection

For linear optimization problems two different types of optimizers are available. The default is an interior-point optimizer; as an alternative, the simplex optimizer can be employed.
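The choice is made through the MSK_IPAR_OPTIMIZER parameter; the value names below are taken from the MOSEK parameter reference. A minimal sketch:

#include <mosek.h>

/* Select the dual simplex optimizer instead of the default
   interior-point optimizer; MSK_OPTIMIZER_INTPNT and
   MSK_OPTIMIZER_PRIMAL_SIMPLEX select the other variants. */
static MSKrescodee use_dual_simplex(MSKtask_t task)
{
  return MSK_putintparam(task, MSK_IPAR_OPTIMIZER, MSK_OPTIMIZER_DUAL_SIMPLEX);
}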

The curious reader can consult [27] for a discussion about interior-point and simplex algorithms.

10.2.2. The interior-point optimizer

The MOSEK interior-point optimizer is an implementation of the homogeneous and self-dual algorithm. For a detailed description of the algorithm, please see [5].

10.2.2.1. Basis identification

It is well-known that an interior-point optimizer does not return an optimal basic solution unless the problem has a unique primal and dual optimal solution. Therefore, the interior-point optimizer has an optional post-processing step that computes an optimal basic solution starting from the optimal interior-point solution. More information about the basis identification procedure is found in [8].

Please note that a basic solution is often more accurate than an interior-point solution.

By default MOSEK performs a basis identification; however, if a basic solution is not needed, the procedure can be turned off. A set of parameters controls when basis identification is performed, as sketched below.
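As a sketch, basis identification can be switched off via MSK_IPAR_INTPNT_BASIS; this parameter name and the value MSK_BI_NEVER are taken from the MOSEK parameter reference, not from this text, so treat them as assumptions:

#include <mosek.h>

/* Skip the basis identification post-processing step when no basic
   solution is needed (parameter and value names are assumptions,
   see the lead-in). MSK_BI_ALWAYS restores the default behavior. */
static MSKrescodee skip_basis_identification(MSKtask_t task)
{
  return MSK_putintparam(task, MSK_IPAR_INTPNT_BASIS, MSK_BI_NEVER);
}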

10.2.2.2. Interior-point termination criterion

The parameters in Table 10.1 control when the interior-point optimizer terminates.

Parameter name                 Purpose
MSK_DPAR_INTPNT_TOL_PFEAS      Controls primal feasibility.
MSK_DPAR_INTPNT_TOL_DFEAS      Controls dual feasibility.
MSK_DPAR_INTPNT_TOL_REL_GAP    Controls the relative gap.
MSK_DPAR_INTPNT_TOL_INFEAS     Controls when the problem is declared primal or dual infeasible.
MSK_DPAR_INTPNT_TOL_MU_RED     Controls when the complementarity is reduced enough.

Table 10.1: Parameters employed in the termination criterion.
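For example, the tolerances can be tightened before optimizing; the values below are illustrative, and the defaults are usually adequate:

#include <mosek.h>

/* Tighten the interior-point termination tolerances; 'task' must hold
   the problem, and the values shown are illustrative only. */
static MSKrescodee tighten_ipm_tolerances(MSKtask_t task)
{
  MSKrescodee r;
  r = MSK_putdouparam(task, MSK_DPAR_INTPNT_TOL_PFEAS, 1.0e-9);
  if (r == MSK_RES_OK)
    r = MSK_putdouparam(task, MSK_DPAR_INTPNT_TOL_DFEAS, 1.0e-9);
  if (r == MSK_RES_OK)
    r = MSK_putdouparam(task, MSK_DPAR_INTPNT_TOL_REL_GAP, 1.0e-9);
  return r;
}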

10.2.3. The simplex based optimizer

An alternative to the interior-point optimizer is the simplex optimizer. The simplex optimizer employs a different approach than the interior-point optimizer when solving a problem. Contrary to the interior-point optimizer, the simplex optimizer can exploit a guess for the optimal solution to reduce solution time; depending on the problem, exploiting such a guess may make the solution faster or slower. See Section 10.2.4 for a discussion.

MOSEK provides both a primal and a dual variant of the simplex optimizer — we will return to this later.

10.2.3.1. Simplex termination criterion

The simplex optimizer terminates when it finds an optimal basic solution or an infeasibility certificate. A basic solution is optimal when it is primal and dual feasible; see (9.1.1) and (9.1.2) for a definition of the primal and dual problem. Because computations are performed in finite precision, MOSEK allows violation of primal and dual feasibility within certain tolerances. The user can control the allowed primal and dual infeasibility with the parameters MSK_DPAR_BASIS_TOL_X and MSK_DPAR_BASIS_TOL_S.
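For example, to relax both tolerances (the values are illustrative; the defaults are usually adequate):

#include <mosek.h>

/* Relax the allowed primal and dual infeasibility for the simplex
   optimizers; the values shown are illustrative only. */
static MSKrescodee relax_basis_tolerances(MSKtask_t task)
{
  MSKrescodee r;
  r = MSK_putdouparam(task, MSK_DPAR_BASIS_TOL_X, 1.0e-5);
  if (r == MSK_RES_OK)
    r = MSK_putdouparam(task, MSK_DPAR_BASIS_TOL_S, 1.0e-5);
  return r;
}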

10.2.3.2. Starting from an existing solution

When using the simplex optimizer it may be possible to reuse an existing solution and thereby reduce the solution time significantly. When a simplex optimizer starts from an existing solution it is said to perform a “hot-start”. If the user is solving a sequence of optimization problems by solving the problem, making modifications, and solving again, MOSEK will hot-start automatically.

Setting the parameter MSK_IPAR_OPTIMIZER to MSK_OPTIMIZER_FREE_SIMPLEX instructs MOSEK to select automatically between the primal and the dual simplex optimizers. Hence, MOSEK tries to choose the best optimizer given the problem and the available solution.

By default MOSEK uses presolve when performing a hot-start. If the optimizer only needs very few iterations to find the optimal solution it may be better to turn off the presolve.
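A sketch of such a sequence follows; the bound modification uses MSK_putbound (the bound-changing call of this API generation), and the variable index and bound values are illustrative:

#include <mosek.h>

/* Solve, modify a bound, and re-solve; the second MSK_optimize call
   hot-starts from the basic solution of the first. */
static MSKrescodee resolve_with_hotstart(MSKtask_t task)
{
  MSKrescodee r;
  /* Let MOSEK pick primal or dual simplex based on the available basis. */
  r = MSK_putintparam(task, MSK_IPAR_OPTIMIZER, MSK_OPTIMIZER_FREE_SIMPLEX);
  /* Presolve can be skipped when only a few iterations are expected. */
  if (r == MSK_RES_OK)
    r = MSK_putintparam(task, MSK_IPAR_PRESOLVE_USE, MSK_PRESOLVE_MODE_OFF);
  if (r == MSK_RES_OK)
    r = MSK_optimize(task);
  /* Change the upper bound of variable 0 (illustrative) and re-optimize. */
  if (r == MSK_RES_OK)
    r = MSK_putbound(task, MSK_ACC_VAR, 0, MSK_BK_RA, 0.0, 2.0);
  if (r == MSK_RES_OK)
    r = MSK_optimize(task);
  return r;
}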

10.2.3.3. Numerical difficulties in the simplex optimizers

MOSEK is designed to minimize numerical difficulties; however, in rare cases the optimizer may have a hard time solving a problem. MOSEK counts unexpected numerical behavior inside the optimizer as a “set-back”. The user can define how many set-backs the optimizer accepts, and if that number is exceeded, the optimization will be aborted. Set-backs are implemented to avoid long sequences where the optimizer tries to recover from an unstable situation.

What counts as a set-back? It is hard to say without getting very technical, but obvious cases are repeated singularities when factorizing the basis matrix, repeated loss of feasibility, degeneracy problems (no progress in the objective), or other events indicating numerical difficulties. If the simplex optimizer encounters a lot of set-backs, the problem is usually badly scaled; in such a situation, try to reformulate it into a better scaled problem. If a lot of set-backs still occur, raising the set-back limit or switching to another optimizer may be worthwhile, as sketched below.
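As a sketch, the set-back limit can be raised; the parameter name MSK_IPAR_SIM_MAX_NUM_SETBACKS is taken from the MOSEK parameter reference, not from this text, and the value is illustrative:

#include <mosek.h>

/* Allow the simplex optimizer more set-backs before it gives up
   (parameter name is an assumption, see the lead-in). */
static MSKrescodee raise_setback_limit(MSKtask_t task)
{
  return MSK_putintparam(task, MSK_IPAR_SIM_MAX_NUM_SETBACKS, 500);
}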

10.2.4. The interior-point or the simplex optimizer?

Given a linear optimization problem, which optimizer is the best: The primal simplex, the dual simplex or the interior-point optimizer?

It is impossible to provide a general answer to this question. However, the interior-point optimizer behaves more predictably: it tends to use between 20 and 100 iterations, almost independently of problem size, but it cannot perform hot-start. The simplex optimizers can take advantage of an initial solution but are less predictable from a cold start. The interior-point optimizer is used by default.

10.2.5. The primal or the dual simplex variant?

MOSEK provides both a primal and a dual simplex optimizer. Predicting which simplex optimizer will be faster is simply impossible; however, in recent years the dual optimizer has seen several algorithmic and computational improvements, which, in our experience, make it faster on average than the primal simplex optimizer. Still, it depends much on the problem structure and size.

Setting the MSK_IPAR_OPTIMIZER parameter to MSK_OPTIMIZER_FREE_SIMPLEX instructs MOSEK to choose which simplex optimizer to use automatically.

To summarize, if you want to know which optimizer is faster for a given problem type, you should try all the optimizers.

10.3. Linear network optimization

10.3.1. Network flow problems

Linear optimization problems with the network flow structure specified in Section 9.2 can in most cases be solved significantly faster with a specialized version of the simplex method [9], rather than with the general solvers.

MOSEK includes a network simplex solver, which usually solves network problems 10 to 100 times faster than the standard simplex optimizers implemented by MOSEK.

To use the network simplex optimizer, it must be selected explicitly, as sketched below.

MOSEK will automatically detect the network structure and apply the specialized simplex optimizer.
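A minimal sketch, assuming the optimizer value MSK_OPTIMIZER_NETWORK_PRIMAL_SIMPLEX; this value name is our assumption for the MOSEK generation documented here and should be checked against the parameter reference:

#include <mosek.h>

/* Select the network simplex optimizer and solve (the value name
   MSK_OPTIMIZER_NETWORK_PRIMAL_SIMPLEX is an assumption). */
static MSKrescodee use_network_simplex(MSKtask_t task)
{
  MSKrescodee r;
  r = MSK_putintparam(task, MSK_IPAR_OPTIMIZER,
                      MSK_OPTIMIZER_NETWORK_PRIMAL_SIMPLEX);
  if (r == MSK_RES_OK)
    r = MSK_optimize(task);
  return r;
}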

10.3.2. Embedded network problems

Often problems contain both large parts with network structure and some non-network constraints or variables; such problems are said to have embedded network structure. If the procedure described above is applied, MOSEK will try to exploit this structure to speed up the optimization.

This is done by heuristically detecting the largest network embedded in the problem, solving this using the network simplex optimizer, and using this solution to hot-start a normal simplex optimizer.

The MSK_IPAR_SIM_NETWORK_DETECT parameter defines how large a percentage of the problem should be a network before the specialized solver is applied. In general, it is recommended to use the network optimizer only on problems containing a substantial embedded network.
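For example, to require that at least 30% of the problem (an illustrative threshold) is detected as a network before the specialized solver is applied:

#include <mosek.h>

/* Only hand the problem to the network optimizer if a substantial
   part of it is a network; 30 is an illustrative percentage. */
static MSKrescodee require_substantial_network(MSKtask_t task)
{
  return MSK_putintparam(task, MSK_IPAR_SIM_NETWORK_DETECT, 30);
}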

10.4. Conic optimization

10.4.1. The interior-point optimizer

For conic optimization problems only an interior-point type optimizer is available. The interior-point optimizer is an implementation of the so-called homogeneous and self-dual algorithm. For a detailed description of the algorithm, please see [19].

10.4.1.1. Interior-point termination criteria

The parameters controlling when the conic interior-point optimizer terminates are shown in Table 10.2.

Parameter name                   Purpose
MSK_DPAR_INTPNT_CO_TOL_PFEAS     Controls primal feasibility.
MSK_DPAR_INTPNT_CO_TOL_DFEAS     Controls dual feasibility.
MSK_DPAR_INTPNT_CO_TOL_REL_GAP   Controls the relative gap.
MSK_DPAR_INTPNT_TOL_INFEAS       Controls when the problem is declared infeasible.
MSK_DPAR_INTPNT_CO_TOL_MU_RED    Controls when the complementarity is reduced enough.

Table 10.2: Parameters employed in the termination criterion.

10.5. Nonlinear convex optimization

10.5.1. The interior-point optimizer

For quadratic, quadratically constrained, and general convex optimization problems only an interior-point type optimizer is available. The interior-point optimizer is an implementation of the homogeneous and self-dual algorithm. For a detailed description of the algorithm, please see [24, 23].

10.5.1.1. Interior-point termination criteria

The parameters controlling when the general convex interior-point optimizer terminates are shown in Table 10.3.

Parameter name                   Purpose
MSK_DPAR_INTPNT_NL_TOL_PFEAS     Controls primal feasibility.
MSK_DPAR_INTPNT_NL_TOL_DFEAS     Controls dual feasibility.
MSK_DPAR_INTPNT_NL_TOL_REL_GAP   Controls the relative gap.
MSK_DPAR_INTPNT_TOL_INFEAS       Controls when the problem is declared infeasible.
MSK_DPAR_INTPNT_NL_TOL_MU_RED    Controls when the complementarity is reduced enough.

Table 10.3: Parameters employed in the termination criterion.

10.6. Solving problems in parallel

If a computer has multiple CPUs, or has a CPU with multiple cores, it is possible for MOSEK to take advantage of this to speed up solution times.

10.6.1. Thread safety

The MOSEK API is thread-safe provided that a task is only modified or accessed from one thread at any given time — accessing two separate tasks from two separate threads at the same time is safe. Sharing an environment between threads is safe.

10.6.2. The parallelized interior-point optimizer

The interior-point optimizer is capable of using multiple CPUs or cores. This implies that whenever the MOSEK interior-point optimizer solves an optimization problem, it will try to divide the work so that each CPU gets a share of the work. The user decides how many CPUs MOSEK should exploit.

It is not always possible to divide the work equally, and often parts of the computations and the coordination of the work are processed sequentially, even if several CPUs are present. Therefore, the speed-up obtained from multiple CPUs is highly problem dependent. As a rule of thumb, if the problem solves very quickly, i.e. in less than 60 seconds, using the parallel option is not advantageous.

The MSK_IPAR_INTPNT_NUM_THREADS parameter sets the number of threads (and therefore the number of CPUs) that the interior point optimizer will use.

10.6.3. The concurrent optimizer

An alternative to the parallel interior-point optimizer is the concurrent optimizer. The idea of the concurrent optimizer is to run multiple optimizers on the same problem concurrently; for instance, it allows you to apply the interior-point and the dual simplex optimizers to a linear optimization problem at the same time. The concurrent optimizer terminates when the first of the applied optimizers has terminated successfully, and it reports the solution of the fastest optimizer. In that way a new optimizer has been created which essentially performs as the fastest of the interior-point and the dual simplex optimizers. Hence, the concurrent optimizer is the best one to use if multiple optimizers are available in MOSEK for the problem and you cannot say beforehand which one will be fastest.

Note in particular that any solution present in the task will also be used for hot-starting the simplex algorithms. One possible scenario would therefore be running a hot-start dual simplex in parallel with interior point, taking advantage of both the stability of the interior-point method and the ability of the simplex method to use an initial solution.

The concurrent optimizer is chosen by setting the MSK_IPAR_OPTIMIZER parameter to MSK_OPTIMIZER_CONCURRENT.

The number of optimizers used in parallel is determined by the MSK_IPAR_CONCURRENT_NUM_OPTIMIZERS parameter. Moreover, the optimizers are selected according to a preassigned priority, with the optimizers having the highest priority being selected first. The default priority for each optimizer is shown in Table 10.4.

Table 10.4: Default priorities for optimizer selection in concurrent optimization.

For example, setting the MSK_IPAR_CONCURRENT_NUM_OPTIMIZERS parameter to 2 tells the concurrent optimizer to apply the two optimizers with the highest priorities: in the default case, that means the interior-point optimizer and one of the simplex optimizers.

10.6.3.1. Concurrent optimization through the API

The following example shows how to call the concurrent optimizer through the API.
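A minimal sketch, assuming a task that already holds the problem; error handling is reduced to passing return codes on:

#include <mosek.h>

/* Solve the problem in 'task' with the concurrent optimizer. */
static MSKrescodee solve_concurrently(MSKtask_t task)
{
  MSKrescodee r;

  /* Select the concurrent optimizer. */
  r = MSK_putintparam(task, MSK_IPAR_OPTIMIZER, MSK_OPTIMIZER_CONCURRENT);

  /* Run the two highest-priority optimizers in parallel. */
  if (r == MSK_RES_OK)
    r = MSK_putintparam(task, MSK_IPAR_CONCURRENT_NUM_OPTIMIZERS, 2);

  if (r == MSK_RES_OK)
    r = MSK_optimize(task);

  return r;
}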

10.6.4. A more flexible concurrent optimizer

MOSEK also provides a more flexible method of concurrent optimization through the function MSK_optimizeconcurrent. The main advantages of this function are that it allows the calling application to assign arbitrary parameter values to each task, and that call-back functions can be attached to each task. This may be useful in the following situation: assume that you know the primal simplex optimizer to be the best optimizer for your problem, but that you do not know which of the available selection strategies (as defined by the MSK_IPAR_SIM_PRIMAL_SELECTION parameter) is the best. In this case you can solve the problem with the primal simplex optimizer using several different selection strategies concurrently.

An example demonstrating the usage of the MSK_optimizeconcurrent function is included below. The example solves a single problem using the interior-point and primal simplex optimizers in parallel.
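A minimal sketch, assuming the signature MSK_optimizeconcurrent(task, taskarray, num) and that MSK_clonetask copies the problem data into a new task; both assumptions should be checked against the API reference:

#include <mosek.h>

/* Solve the problem with the interior-point optimizer in 'task' and,
   concurrently, with the primal simplex optimizer in a cloned task. */
static MSKrescodee solve_two_ways(MSKtask_t task)
{
  MSKrescodee r;
  MSKtask_t   clone = NULL;

  /* Duplicate the problem so each optimizer has its own task. */
  r = MSK_clonetask(task, &clone);
  if (r == MSK_RES_OK)
    r = MSK_putintparam(task, MSK_IPAR_OPTIMIZER, MSK_OPTIMIZER_INTPNT);
  if (r == MSK_RES_OK)
    r = MSK_putintparam(clone, MSK_IPAR_OPTIMIZER,
                        MSK_OPTIMIZER_PRIMAL_SIMPLEX);
  /* Run both tasks concurrently; the fastest solution wins. */
  if (r == MSK_RES_OK)
    r = MSK_optimizeconcurrent(task, &clone, 1);

  MSK_deletetask(&clone);
  return r;
}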
