| Date | Slides | Related Links |
|---|---|---|
| July 7 | How many iterations of gradient descent do we need? | Cauchy's 1847 paper, Lipschitz relationships, practical line-searches, PL condition |
| July 14 | Momentum, acceleration, and second-order methods | heavy-ball, CG, SSO, accelerated gradient, restarting, quadratic convergence, damped Newton (Section 9.5), cubic regularization |
| July 21 | Coordinate optimization and stochastic gradient descent | random coordinate descent, shuffle coordinate descent, Gauss-Southwell, block coordinate descent, accelerated coordinate descent |
| July 28 | SGD with Constant Step Sizes, Growing Batches, and Over-Parameterization | non-convex SGD, decreasing step SGD, constant step SGD, shuffle SGD, growing batch size, SGC, accelerated SGD, non-uniform SGD, SGD + Armijo |
| August 4 | No lecture | |
| August 11 | Variance reduction and 1.5-Order Methods | SAG, SVRG, non-uniform sampling, acceleration, loopless SVRG, SGD\*, SVRG for deep learning, diagonal approximation, Hessian-free Newton, mini-batch Hessian, Newton sketch, 2.5-order, Barzilai-Borwein, quasi-Newton (superlinear), L-BFGS, initialization, L-BFGS preconditioning, explicit superlinear |
| August 18+ | Baby break | |
| January 27 | Projected Gradient, Projected Newton, and Frank-Wolfe | Translation of original PG and PN paper, projection onto simple sets (Section 8.1), Dykstra's algorithm, active set identification and PG backtracking, spectral projected gradient, two-metric projection, projected quasi-Newton, projected coordinate descent, Frank-Wolfe |
| February 17 | Global Optimization, Subgradients, and Cutting Planes | Random search, Bayesian optimization, harmless global optimization, BO rate, subgradients, subgradient method, stochastic subgradient, suffix averaging, (k+1) averaging, weakly-convex rate, tame function convergence, smoothing, adaptive smoothing, cutting planes, randomized center of gravity, ignoring non-smoothness, bundle methods, orthant-projected min-norm subgradient (Chapter 2) |
| April 21 | Proximal-Gradient and Fenchel Duality | Proximal-gradient (and acceleration), active set complexity, proximal PL, group L1-regularization, structured sparsity, inexact proximal-gradient, proximal average, ADMM, coordinate-wise proximal-gradient, stochastic proximal-gradient, regularized dual averaging, proximal SVRG, proximal Newton, proximal point, convex conjugate and duality (Section 3.3 and Chapter 5), kernel methods, Lipschitz-smoothness and strong-convexity duality, Fenchel duality, SDCA, dual-free SDCA, gap safe screening, SVM safe screening |