YALMIP Example : Linear regression

Nonlinear operators

YALMIP supports modeling of nonlinear, often non-differentiable, operators that typically occur in convex programming. Nine simple operators are currently supported: min, max, abs, sqrt, norm, sumk, sumabsk, geomean and cpower, and users can easily add their own (see the end of this page). The operators can be used intuitively, and YALMIP will automatically try to find out if they are used in a way that enables a convex representation. Although this can simplify the modeling phase significantly in some cases, it is recommended not to use these operators unless you know how to model them by your self using epigraphs and composition rules of convex and concave functions, why and when it can be done etc. The text-book [S. Boyd and L. Vandenberghe] should be a suitable introduction for the beginner.

In addition to modeling convex and concave operators and perform automatic analysis and derivation of equivalent conic programs, YALMIP also uses the nonlinear operator framework for implementing logic expression involving or and and, and in the same vein but on a higher level, to handle piecewise functions in pwf.

The nonlinear operator framework was initially implemented for functions that can be modelled rigorously using conic constraints and additional variables. However, there are many functions that cannot be exactly modelled using conic constraints, such as exponential functions and logarithms, but are convex or concave, and of course can be analyzed in terms of convexity preserving operations. These function are supported in a framework called evaluation based nonlinear operators. The models using these general convex functions will be analysed for convexity, but the resulting model will be a problem that only can be solved using a general nonlinear solver, such as fmincon. See evaluation based nonlinear operators. Note that this extension still is experimental and not intended for large problems.

Convexity analysis in 10 lines

Without going into theoretical details, the convexity analysis is based on epi- and hypograph formulations, and composition rules. For the compound expression f = h(g(x)), it holds that (For simplicity, we write increasing, decreasing, convex and concave, but the correct notation would be nondecreasing, nonincreasing, convex or affine and concave or affine. This notation us used throughout this manual and inside YALMIP)

f is convex if h is convex and increasing and g is convex
f is convex if h is convex and decreasing and g is concave
f is concave if h is concave and increasing and g is concave
f is concave if h is concave and decreasing and g is convex

Based on this information, it is possible to recursively analyze convexity of a complex expression involving convex and concave functions. When solvesdp is called, YALMIP checks the convexity of objective function and constraints by using information about the properties of the operators. If YALMIP manage to prove convexity, graph formulations of the operators are automatically introduced. This means that the operator is replaced with a graph, i.e., a set of constraints.

epigraph: t represents convex function f(x) : replace with t≥f(x)
hypograph: t represents concave function f(x) : replace with t≤f(x)

Of course, in order for this to be useful, the epigraph representation has to be represented using standard constraints, such as conic constraints.

The operators

The operators defined in the current release are described in the table below. This information might be useful to understand how and when YALMIP can derive convexity.

Name	Convex/Concave	Monotinicity	Comments
abs	convex	none
min	concave	increasing
max	convex	increasing
norm	convex	none	All standard norms (1,2, inf and Frobenius) can be used (applicable on both vectors and matrices.)
sumk	convex	See comment	Defines the sum of the k largest elements of a vector, or the sum of the k largest eigenvalues of a symmetric matrix. Increasing for vector arguments, no monotinicity defined for eigenvalue operator.
sumabsk	convex	none	Defines the sum of the k largest absolute value elements of a vector, or the sum of the k largest absolute value eigenvalues of a symmetric matrix.
geomean	concave	See comment	For vector arguments, the operator is increasing. For symmetric matrix argument, the operator is defined as the geometric mean of the eigenvalues. No monotinicity defined for eigenvalue operator
cpower	See comment	See comment	Convexity-aware version of power. For negative powers, the operator is convex and decreasing. For positive powers less than one, the operator is concave and increasing. Positive powers larger than 1 gives a convex increasing operator.
sqrt	concave	increasing	Short for cpower(x,0.5)

Standard use

Consider once again the linear regression problem.

a = [1 2 3 4 5 6]';
t = (0:0.2:2*pi)';
x = [sin(t) sin(2*t) sin(3*t) sin(4*t) sin(5*t) sin(6*t)];
y = x*a+(-4+8*rand(length(x),1));
a_hat = sdpvar(6,1);
residuals = y-x*a_hat;

Using abs and max, we can easily solve the L₁ and the L_∞ problem (Note that the abs operator currently has performance issues and should be avoided for large arguments. Moreover, explicitly creating absolute values when minimizing the L_∞ error is unnecessarily complicated).

solvesdp([],sum(abs(residuals)));
a_L1 = double(a_hat)
solvesdp([],max(abs(residuals)));
a_Linf = double(a_hat)

YALMIP automatically concludes that the objective functions can be modeled using some additional linear inequalities, adds these, and solves the problems. We can simplify the code even more by using the norm operator (this is much faster for large-scale problems due to implementation issues in YALMIP). Here we also compute the least-squares solution (note that this norm will generate a second-order cone constraint).

solvesdp([],norm(residuals,1));
a_L1 = double(a_hat)
solvesdp([],norm(residuals,2));
a_L2 = double(a_hat)
solvesdp([],norm(residuals,inf));
a_Linf = double(a_hat)

The following piece of code shows how we easily can solve a regularized problem.

solvesdp([],1e-3*norm(a_hat,2)+norm(residuals,inf));
a_regLinf = double(a_hat)

The norm operator is used exactly as the built-in norm function in MATLAB, both for vectors and matrices. Hence it can be used also to minimize the largest singular value (2-norm in matrix case), or the Frobenious norm of a matrix.

The double command of-course applies also to the nonlinear operators (double(OPERATOR(X)) returns OPERATOR(double(X)).

double(1e-3*norm(a_hat,2)+norm(residuals,inf))
ans =
    3.1175

A construction useful for maximizing determinants of positive definite matrices is the function (det P)^1/m, for positive definite matrix P, where m is the dimension of P. This concave function, called geomean in YALMIP, is supported as an extended operator. Note that the positive semidefiniteness constraint on P is added automatically by YALMIP.

D = randn(5,5);
P = sdpvar(5,5);
solvesdp(set(P < D*D'),-geomean(P));

The command can be applied also on positive vectors, and will then model the geometric mean of the elements. We can use this to find the analytic center of a set of linear inequalities (note that this is absolutely not the recommended way to compute the analytic center.)

A = randn(15,2);
b = rand(15,1)*5;

x = sdpvar(2,1);
solvesdp([],-geomean(b-A*x)); % Maximize product of elements in b-Ax, s.t Ax < b

Rather advanced constructions are possible, and YALMIP will try derive an equivalent convex model.

sdpvar x y z
F = set(max(1,x)+max(y^2,z) < 3)+set(max(1,-min(x,y)) < 5)+set(norm([x;y],2) < z);
sol = solvesdp(F,max(x,z)-min(y,z)-z);

Polynomial and sigmonial expressions

By default, polynomial expressions (except quadratics) are not analyzed with respect to convexity and conversion to a conic model is not performed. Hence, if you add a constraint such as set(x^4 + y^8-x^0.5 < 10), YALMIP may complain about convexity, even though we can see that the expression is convex and can be represented using conic constraints. More importantly, YALMIP will not try to derive an equivalent conic model. However, by using the command cpower instead, rational powers can be used.

To illustrate this, first note the difference between a monomial generated using overloaded power and a variable generated cpower.

sdpvar x
x^4
Polynomial scalar (real, homogeneous, 1 variable)
cpower(x,4)
Linear scalar (real, derived, 1 variable)

The property derived indicates that YALMIP will try to replace the variable with its epigraph formulation when the problem is solved. Working with these convexity-aware monomials is no different than usual.

sdpvar x y
F = set(cpower(x,4) + cpower(y,4) < 10) + set(cpower(x,2/3) + cpower(y,2/3) > 1);
plot(F,[x y]);

Note that when you plot sets with constraints involving nonlinear operators and polynomials, it is recommended that you specify the variables of interest in the second argument (YALMIP may otherwise plot the set with respect to auxiliary variables introduced during the construction of the conic model.)

Do not use these operators unless you really need them. The conic representation of rational powers easily grow large.

Limitations

If the convexity propagation fails, an error will be issued (error code 14). Note that this does not imply that the model is nonconvex, but only means that the simple sufficient conditions used for checking convexity were violated. Failure is however typically an indication of a bad model, and most often due to an actual nonconvex part in the model. The problems above are all convex, but not this problem below, due to the way min enters in the constraint.

sdpvar x y z
F = set(max(1,x)+max(y^2,z) < 3)+set(max(1,min(x,y)) < 5)+set(norm([x;y],2) < z);
sol = solvesdp(F,max(x,z)-min(y,z)-z);
sol.info

ans =
 Convexity check failed (Expected convex function in constraint #2 at level 2)

In the same sense, this problem fails due to a nonconvex objective function.

sdpvar x y z
F = set(max(1,x)+max(y^2,z) < 3);
sol = solvesdp(F,-norm([x;y]));
sol.info

ans =
 Convexity check failed (Expected concave function in objective at level 1)

This following problem is however convex, but convexity propagation fails.

sdpvar x
sol = solvesdp([],norm(max([1 1-x 1+x])))
sol.info

ans =
 Convexity check failed (Monotonicity required at objective at level 1)

The described operators cannot be used in polynomial expressions in the current implementation. The following problem is trivially convex but fails.

sdpvar x y
sol = solvesdp([],norm([x;y])^2);
sol.info

ans =
 Convexity check failed (Operator in polynomial in objective)

Another limitation is that the operators not are allowed in cone and semidefinite constraints.

sdpvar x y
sol = solvesdp(set(cone(max(x,y,1),2)),x+y);
sol.info

ans =
 Convexity propagation failed (YALMIP)

In practice, these limitations should not pose a major problem. A better model is possible (and probably recommended) in most cases if these situations occur.

Mixed integer models

In some cases when the convexity analysis fails, it is possible to tell YALMIP to switch from a graph based approach to a mixed integer model based approach. In other words, if no graph model can be derived, YALMIP introduces integer variables to model the operators. This is currently implemented for min, max, abs and linear norm for real arguments. By default, this feature is not invoked, but can be activated by sdpsettings('allowmilp',1).

Consider the following simple example which fails due to the non-convex use of the convex abs operator

sdpvar x y
F = set(abs(abs(x+1)+3) > y)+set(0<x<3);
sol = solvesdp(F,-y);
sol.info
 Convexity check failed (Expected concave function in constraint #1 at level 1)

By turning on the mixed integer fall back model, a mixed integer LP is generated and the problem is easily solved.

sdpvar x y
F = set(abs(abs(x+1)+3) > y)+set(0<x<3);
sol = solvesdp(F,-y,sdpsettings('allowmilp',1));
double([x y])
ans =
    3.0000    7.0000

If you know that your model is non-convex and will require a mixed integer model, you can bypass the initial attempt to generate the graph model by using sdpsettings('allowmilp',2).

Evaluation based nonlinear operators

YALMIP now also supports experimental support for general convex/concave functions that cannot be modelled using conic representations. The main difference when working with these operators is that the problem always requires a general nonlinear solver to solved, such as fmincon. All convexity analysis is still performed tough.

sdpvar x
solvesdp(set(exp(2*x + 1) < 1),-x,sdpsettings('solver','fmincon'));
double(x)
ans =
   -0.5000

double(exp(2*x + 1))
ans =
     1

Note that this feature still is very limited and experimental. Too see how you can add our own function, see the example for scalar entropy.

As a word of caution, note that fmincon performs pretty bad on problems with functions that aren't defined everywhere, such as logarithms. Hence, solving problem involving these functions can easily lead to problems. It is highly recommended to at least provide a feasible solution, or even better, to use the inverse operator to formulate the problem. Consider the following trivial example of finding the analytic center of a unit cube centered at the point (3,3,3)

x = sdpvar(3,1);
p = [1-(x-3);(x-3)+1]

% Not recommended
solvesdp([],-sum(log(p)));

% Better
assign(x,[3.1;3.2;3.3]);
solvesdp([],-sum(log(p)),sdpsettings('usex0',1));

% Best (well, adding initials on x and t would be even better)
t = sdpvar(3,1);
solvesdp(exp(t) < p ,-sum(t));

Behind the scene

If you want to look at the model that YALMIP generates, you can use the two commands model and expandmodel. Please note that these expanded models never should be used manually. The commands described below should only be used for illustrating the process that goes on behind the scenes.

With the command model, the epi- or hypograph model of the variable is returned. As an example, to model the maximum of two scalars x and y, YALMIP generates two linear inequalities.

sdpvar x y
t = max([x y]);
F = model(t)
++++++++++++++++++++++++++++++++++++++++++++
|   ID|      Constraint|               Type|
++++++++++++++++++++++++++++++++++++++++++++
|   #1|   Numeric value|   Element-wise 1x2|
++++++++++++++++++++++++++++++++++++++++++++
sdisplay(sdpvar(F(1)))
ans =
   '-x+t'    '-y+t'

For more advanced models with recursively used nonlinear operators, the function model will not generate the complete model since it does not recursively expand the arguments. For this case, use the command expandmodel. This command takes two arguments, a set of constraints and an objective function. To expand an expression, just let the expression take the position as the objective function. Note that the command assumes that the expansion is performed in order to prove a convex function, hence if you expression is meant to be concave, you need to negate it. To illustrate this, let us expand the objective function in an extension of the geometric mean example above.

A = randn(15,2);
b = rand(15,1)*5;

x = sdpvar(2,1);
expandmodel([],-geomean([b-A*x;min(x)]))
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
|    ID|      Constraint|                               Type|
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
|    #1|   Numeric value|                   Element-wise 2x1|
|    #2|   Numeric value|   Second order cone constraint 3x1|
|    #3|   Numeric value|   Second order cone constraint 3x1|
|    #4|   Numeric value|   Second order cone constraint 3x1|
|    #5|   Numeric value|   Second order cone constraint 3x1|
|    #6|   Numeric value|   Second order cone constraint 3x1|
|    #7|   Numeric value|   Second order cone constraint 3x1|
|    #8|   Numeric value|   Second order cone constraint 3x1|
|    #9|   Numeric value|   Second order cone constraint 3x1|
|   #10|   Numeric value|   Second order cone constraint 3x1|
|   #11|   Numeric value|   Second order cone constraint 3x1|
|   #12|   Numeric value|   Second order cone constraint 3x1|
|   #13|   Numeric value|   Second order cone constraint 3x1|
|   #14|   Numeric value|   Second order cone constraint 3x1|
|   #15|   Numeric value|   Second order cone constraint 3x1|
|   #16|   Numeric value|   Second order cone constraint 3x1|
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

The result is two linear inequalities related to the min operator and 15 second order cone constraints used for the conic representation of the geometric mean.

Adding new operators

If you want to add your own operator, all you need to do is to create 1 file. This file should be able to return the numerical value of the operator for a numerical input, and return the epigraph (or hypograph) and a descriptive structure of the operator when the first input is 'graph'. As an example, the following file implements the nonlinear operator tracenorm. This convex operator returns sum(svd(X)) for matrices X. This value can also be described as the minimizing argument of the optimization problem min_t,A,B t subject to set([A X;X' B] > 0) + set(trace(A)+trace(B) < 2*t).

function varargout = tracenorm(varargin)

switch class(varargin{1})    

    case 'double' % What is the numerical value (needed for displays etc)
        varargout{1} = sum(svd(varargin{1}));

    case 'char'   % YALMIP send 'graph' when it wants the epigraph or hypograph
        switch varargin{1}
	 case 'graph'
            t = varargin{2}; % 2nd arg is the extended operator variable
            X = varargin{3}; % 3rd arg and above are args user used when defining t.
            A = sdpvar(size(X,1));
            B = sdpvar(size(X,2));
            F = set([A X;X' B] > 0) + set(trace(A)+trace(B) < 2*t);

            % Return epigraph model
            varargout{1} = F;
            % a description 
            properties.convexity    = 'convex';   % convex | none | concave
            properties.monotonicity = 'none';     % increasing | none | decreasing
            properties.definiteness = 'positive'; % negative | none | positive  
            varargout{2} = properties;
            % and the argument
            varargout{3} = X;

         case 'milp'
	    varargout{1} = [];
	    varargout{2} = [];
	    varargout{3} = [];

         otherwise
            error('Something is very wrong now...')
        end    

    case 'sdpvar' % Always the same. 
        varargout{1} = yalmip('addextendedvariable',mfilename,varargin{:});    

    otherwise
end

The function sumk.m in YALMIP is implemented using this framework and might serve as an additional fairly simple example. The overloaded operator norm.m is also defined using this method, but is a bit more involved, since it supports different norms. Note that we with a slight abuse of notation use the terms increasing and decreasing instead of nondecreasing and nonincreasing.

Adding new operators with mixed integer models

If the convexity analysis fails, it is possible to have fall back alternative models based on integer variables. If the operator is called with the flag milp, a mixed integer exact model can be returned. As an illustration, here is a stripped down version of the epigraph and MILP model of the absolute value of a real scalar.

function varargout = scalarrealabs(varargin)

switch class(varargin{1})    

    case 'double' 
        varargout{1} = abs(varargin{1});

    case 'char'   % YALMIP send 'graph' when it wants the epigraph or hypograph
        switch varargin{1}
	 case 'graph'
            t = varargin{2}; 
            X = varargin{3}; 
            varargout{1} = set(-t <= X <= t);
            properties.convexity    = 'convex';   
            properties.monotonicity = 'none';     
            properties.definiteness = 'positive'; 
            varargout{2} = properties;
            varargout{3} = X;

         case 'milp'
            t = varargin{2}; 
            X = varargin{3}; 
	    d = binvar(1,1); % d=1 means x>0, d=0 means x<0
	    F = set([]);
	    M = 1e4; % Big-M constant
	    F = F + set(x >= -M*(1-d))                     % d=1 means x >= 0
	    F = F + set(x <= M*d)                          % d=0 means x <= 0
	    F = F + set(-M*(1-d) <= t-x <= M*(1-d); % d=1 means t = X
	    F = F + set(-M*d     <= t+x <= M*d;     % d=0 means t = -X

	    varargout{1} = F;
	    properties.convexity    = 'milp';   
            properties.monotonicity = 'milp';     
            properties.definiteness = 'milp'; 
	    varargout{2} = properties;
	    varargout{3} = X;

         otherwise
            error('Something is very wrong now...')
        end    

    case 'sdpvar' % Always the same. 
        varargout{1} = yalmip('addextendedvariable',mfilename,varargin{:});    

    otherwise
end

MILP models are most often based on so called big-M models. For these methods to work well, it is important to have as small constants M as possible, but in the code above, we just picked a number. For the MILP models defined by default in YALMIP (min, max, abs and linear norms), more effort is spent on choosing the constants. To learn more about how you can do this for your model, please check out the code for these models.

Adding evaluation based nonlinear operators

General convex and concave functions are support in YALMIP by the evaluation based nonlinear operator framework. The definition of these operators are almost identical to the definition of standard nonlinear operators. The following code implements a (simplified version) of a scalar entropy measure -xlog(x).

function varargout = entropy(varargin)

switch class(varargin{1})

    case 'double' % What is the numerical value of this argument (needed for displays etc)        
	varargout{1} = -varargin{1}*log(varargin{1});

    case 'sdpvar' % Overloaded operator for SDPVAR objects.
         varargout{1} = yalmip('addEvalVariable',mfilename,varargin{1});

    case 'char' % YALMIP sends 'graph' when it wants the epigraph, hypograph or domain definition
        switch varargin{1}
            case 'graph'
                t = varargin{2};
                X = varargin{3};

                % This is different from standard extended operators.
                % Just do it!
		F = SetupEvaluationVariable(varargin{:});
                                
                % Now add your own code, typically domain constraints
                F = F + set(X > 0);
                
                % Let YALMIP know about convexity etc                
                varargout{1} = F;
                varargout{2} = struct('convexity','concave','monotonicity','none','definiteness','none');
                varargout{3} = X;

            case 'milp' % No MILP model available for entropy
                    varargout{1} = [];
                    varargout{2} = [];
                    varargout{3} = [];                
            otherwise
                error('ENTROPY called with CHAR argument?');
        end
    otherwise
        error('ENTROPY called with invalid argument.');
end

The evaluation based framework is primarily intended for scalar functions, but can be extended to support element-wise vector functions. See the implementation of the overloaded log operator for details.