This is a set of Matlab routines I wrote for the course CS542B: Non-linear Optimization,
taught by M. Friedlander. It implements a
variety of methods for solving 'LASSO' problems (least squares with a penalty on the
L1-norm of the parameters), that is, problems of the form:
min(w): ||Xw - y||^2 + v||w||_1
(the 'scaled norm' variant)
or:
min(w): ||Xw - y||^2, subject to ||w||_1 <= t
(the 'constrained norm' variant)
In the above, X is the 'n by p' design matrix, containing the p features for
each of the n instances. y is the 'n by 1' regression target, and
w is the 'p by 1' parameter vector that linearly transforms the rows of
X to match the targets y. v and t are parameters
that control the strength of the regularization. The goal is to find the
w that minimizes the squared error between Xw and y,
subject to the regularization.
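As a concrete illustration, here is a minimal sketch of setting up and solving a small scaled-norm instance in Matlab. The synthetic data and the value of lambda are invented for illustration; the solver call follows the interface shown in the usage example further below:

% Synthetic problem: n = 50 instances, p = 10 features, 3 of them relevant
n = 50; p = 10;
X = randn(n,p);
wTrue = [ones(3,1); zeros(p-3,1)];
y = X*wTrue + 0.1*randn(n,1);

lambda = 10;                                  % regularization strength (v above)
w = LassoPrimalDualLogBarrier(X,y,lambda);    % any of the scaled solvers could be used

f = sum((X*w - y).^2) + lambda*sum(abs(w));   % scaled-norm objective value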
The solutions to these types of problems are the subject of intense current research in the signal-processing community (among others, such as statistics and machine learning), since penalizing the L1-norm leads to sparsity in w (that is, only a subset of the elements of w are non-zero, so irrelevant features are effectively ignored). However, the non-differentiability of the objective has led to the proposal of a large (and confusing) variety of approaches for solving these problems. This course project surveyed these different methods, and implemented a variety of them in Matlab.
ActiveSet (constrained): An Active Set Quadratic Program solver designed for this problem.
BlockCoordinate (scaled): A method that optimizes blocks of parameters (uses another scaled solver as a sub-routine).
Constrained (constrained or scaled, depending on mode): Formulates the problem as a constrained optimization problem (4 different ways), and uses a generic constrained solver (3 solvers available).
GaussSeidel (scaled): A method that optimizes individual parameters, choosing the parameter that is furthest from the first-order optimality condition (bottom-up and top-down strategies implemented).
Grafting (scaled): A method that optimizes a set of working parameters with standard unconstrained optimization using sub-gradients, and introduces parameters incrementally (i.e., bottom-up).
IteratedRidge (scaled): An EM-like algorithm that solves a sequence of ridge-regression problems (4 strategies to deal with instability and 3 strategies to solve the ridge problem available).
IterativeSoftThresholding (scaled, requires norm(X) <= 1): A fixed-point algorithm that repeatedly applies the soft-threshold operator to the current weight vector moved in the direction of steepest descent (see the sketch after this list).
LARS (constrained)*: Uses a version of the LARS algorithm where we stop at the desired value of lambda.
NonNegativeSquared (scaled): Uses a reformulation in terms of non-negative squared variables, which can be solved with an unconstrained optimizer.
PrimalDualLogBarrier (scaled): An Interior Point Quadratic Program solver designed specifically for this problem.
Projection (scaled): Uses a bound-constrained formulation, solved with a second-order gradient-projection strategy.
Shooting (scaled): A method that optimizes individual parameters, cycling through them in order.
SignConstraints (constrained): The algorithm of Tibshirani, where a sequence of Quadratic Programs is solved, each having 1 additional constraint.
SubGradient (scaled): A sub-gradient strategy based on the first-order optimality conditions.
UnconstrainedApx (scaled): Uses a smooth approximation to ||w||_1, and solves with an unconstrained optimizer (3 approximations available). Also implements a strategy where the approximation is deformed into ||w||_1 to speed convergence.

*This method is not available in the on-line package. Contact me if you want this additional method, or use the nice implementation of LARS by Karl Sjöstrand available here (which can be used to compute the entire regularization path).
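To give the flavor of these methods, here is a minimal sketch of the iterative soft-thresholding fixed point described in the list above. This is a generic textbook version written for this page, not the package's implementation; it assumes norm(X) <= 1, so that a fixed step size of 1/2 is safe for the gradient 2*X'*(X*w - y):

% Generic iterative soft-thresholding sketch (not the package code).
% Solves min(w): ||Xw - y||^2 + lambda*||w||_1, assuming norm(X) <= 1.
w = zeros(size(X,2),1);
for iter = 1:500
    z = w - X'*(X*w - y);                      % gradient step with step size 1/2
    w = sign(z).*max(abs(z) - lambda/2, 0);    % soft-threshold operator
end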
Many of the methods accept optional parameters that can be passed as extra
arguments using {field,value} pairs. For example, to turn off verbosity and
use a different strategy for solving for the Newton direction in the
Primal-Dual method, use:
w = LassoPrimalDualLogBarrier(X,y,lambda,'verbose',0,'mode',1);
The complete set of .m files is available here. The report for the class project is available here.
Example of the effect of warm starting the active set method:

Running with cold start
 iter  n(w)       n(step)    f(w)       opt(wi)  free
    1  0.00e+000  0.00e+000  2.65e+004        0     0
    2  3.08e+000  3.08e+000  2.65e+004        1     1
    3  5.37e+000  2.32e+000  2.23e+004        2     2
    4  7.58e+000  2.22e+000  1.94e+004        3     3
    5  9.66e+000  2.16e+000  1.76e+004        4     4
    6  1.10e+001  2.15e+000  1.57e+004       18     5
    7  1.10e+001  2.54e+000  1.41e+004       56     6
    8  1.10e+001  2.90e+000  1.31e+004       77     7
    9  1.10e+001  1.99e+000  1.20e+004       85     8
   10  1.10e+001  1.82e+000  1.15e+004       87     9
   11  1.10e+001  1.37e+000  1.11e+004       88    10
   12  1.10e+001  8.46e-001  1.09e+004       90    11
   13  1.10e+001  8.94e-001  1.08e+004       93    12
   14  1.10e+001  4.25e-001  1.07e+004       94    13
   15  1.10e+001  2.99e-001  1.07e+004       95    14
   16  1.10e+001  2.24e-001  1.06e+004       96    15
   17  1.10e+001  2.02e-001  1.06e+004       98    16
   18  1.10e+001  1.30e-001  1.06e+004       99    17
   19  1.10e+001  4.25e-002  1.06e+004      100    18
All Components satisfy condition
Number of iterations: 19

Running with warm start
 iter  n(w)       n(step)    f(w)       opt(wi)  free
    1  1.10e+001  1.00e+000  1.16e+004       99    17
    2  1.10e+001  4.25e-002  1.06e+004      100    18
All Components satisfy condition
Number of iterations: 2

max_difference_between_cold_and_warm_start_solutions = 1.5543e-015

In this case, the good initialization of the active set method reduces the number of iterations from 19 down to 2.
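The same warm-starting pattern can be used to trace out solutions for a sequence of regularization levels, initializing each solve with the previous solution. A hypothetical sketch follows; the function name LassoActiveSet (matching the Lasso<MethodName> convention above) and the 'w0' initialization option are assumptions for illustration, not a documented interface:

% Hypothetical warm-start loop over a sequence of constraint values t.
% 'LassoActiveSet' and the 'w0' option are assumed names; check the
% actual {field,value} options supported by each solver.
ts = [1 2 4 8];
w = zeros(size(X,2),1);
for i = 1:numel(ts)
    w = LassoActiveSet(X,y,ts(i),'w0',w);   % re-use the previous solution
end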