There's a more recent version of this course.
Instructor: Danica Sutherland (she), dsuth@cs.ubc.ca, ICICS X563.
Canvas and Piazza (easiest registration if you follow the link from Canvas, but you can sign up directly here).
SSBD below refers to the book of Shalev-Shwartz and Ben-David; MRT to that of Mohri, Rostamizadeh, and Talwalkar. Italicized entries are tentative.
# | Day | Date | Topic | Reading |
---|---|---|---|---|
1 | Mon | Jan 10 | Intro / overview | SSBD 1-2; MRT 2 |
 | Mon | Jan 10 | Assignment 1 posted (and .tex) | |
2 | Wed | Jan 12 | PAC learning | SSBD 2-3; MRT 2 |
3 | Mon | Jan 17 | Probability / uniform convergence | Measure Theory Tutorial; SSBD 4; MRT 2 |
4 | Wed | Jan 19 | Finish uniform convergence + no free lunch + start of VC | SSBD 5-6 |
 | Thu | Jan 20 | Assignment 1 due, 11:59pm | |
 | Fri | Jan 21 | Drop deadline | |
5 | Mon | Jan 24 | More on VC dimension | SSBD 6; MRT 3 |
6 | Wed | Jan 26 | More VC + Rademacher | SSBD 9.1; MRT 3 |
7 | Mon | Jan 31 | More Rademacher (some fiddly issues with abs value; update on Wednesday) | MRT 3; SSBD 26 |
8 | Wed | Feb 2 | Even more Rademacher | MRT 3, 11; SSBD 26 |
 | Fri | Feb 4 | Assignment 2 posted (and .tex) | |
9 | Mon | Feb 7 | Structural risk minimization | SSBD 7; MRT 4 |
10 | Wed | Feb 9 | Modes of learnability + model selection (plus the long-awaited proof of Massart's lemma) | SSBD 7, 11; MRT 4 |
 | Mon | Feb 14 | Shift to hybrid mode (delayed by being sick) | |
11 | Mon | Feb 14 | Convex learning problems + gradient descent | SSBD 12, 14; Bubeck; Boyd/Vandenberghe |
12 | Wed | Feb 16 | SGD | SSBD 14 |
 | Fri | Feb 18 | Assignment 2 due, 11:59pm | |
 | Mon | Feb 21 | Midterm break | |
 | Wed | Feb 23 | Midterm break | |
13 | Mon | Feb 28 | Regularization + stability | SSBD 13; MRT 14 |
14 | Wed | Mar 2 | SVMs + margin bounds | SSBD 15, 26.3; MRT 5 |
15 | Mon | Mar 7 | SVM duality, kernel definitions | SSBD 15/16; MRT 5/6; more kernel material linked in slides |
16 | Wed | Mar 9 | More kernels: representer theorem, kernel ridge | |
 | Mon | Mar 14 | Assignment 3 posted (and .tex) | |
17 | Mon | Mar 14 | Some more kernels (universality, Gaussian processes) + deep learning (approximation, generalization) | Telgarsky section 2 |
18 | Wed | Mar 16 | More deep learning approximation + generalization | Telgarsky section 14 |
 | Wed | Mar 16 | Project proposals due | |
 | Mon | Mar 21 | Class canceled | |
19 | Wed | Mar 23 | Neural tangent kernels | Telgarsky sections 4, 8 |
20 | Mon | Mar 28 | “Does any of this stuff work at all?” Limits of NTK + interpolation and the limits of uniform convergence | |
 | Mon | Mar 28 | Assignment 3 due (extended), 11:59pm | |
 | Tue | Mar 29 | Assignment 4 posted (and .tex) | |
21 | Wed | Mar 30 | Double descent and implicit regularization + PAC-Bayes | BHMM / NKBYBS / Telgarsky 10; SSBD 31 / Guedj |
22 | Mon | Apr 4 | Online learning | |
 | Wed | Apr 6 | Project presentations | |
 | Fri | Apr 8 | Project writeups due | |
 | Fri | Apr 8 | Assignment 4 due, 11:59pm | |
 | | TBD | Take-home final (will have a significant window to do it during finals period) | |
The course meets in person in DMP 101 (since Feb 14th) and is also available on Zoom; the meeting link and recordings are posted on Canvas and Piazza.
Grading scheme: 70% assignments (including a small project), 30% final.
The lowest assignment grade (not including the project) will be dropped. The project counts as one assignment. Assignments should be written in LaTeX (not handwritten or in a word processor) and handed in on Gradescope, as described on Piazza.
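For concreteness, here is a small sketch of how that scheme could be computed; the equal weighting of the kept assignments and the project within the 70% component is an assumption for illustration, not a stated policy.

```python
# Hypothetical illustration of the grading scheme described above. Assumes the
# kept assignments and the project are weighted equally within the 70% component.
def course_grade(assignment_scores, project_score, final_score):
    """All scores are percentages in [0, 100]; returns the overall course percentage."""
    kept = sorted(assignment_scores)[1:]    # drop the single lowest assignment (not the project)
    items = kept + [project_score]          # the project counts as one assignment
    assignment_component = sum(items) / len(items)
    return 0.70 * assignment_component + 0.30 * final_score

# e.g. with four assignments: course_grade([82, 90, 68, 95], project_score=88, final_score=75)
```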
There will be one “big assignment” that serves as a (small) project: something on the scale of running experiments to explore a paper, doing a literature review of a particular area, or extending/unifying a few papers. A proposal will be due beforehand; details to come.
The final exam may be take-home, synchronous online, or in-person; TBD.
There may also be some paper presentations later in the course; if so, presenters will be able to use a presentation to replace part of an assignment grade. This depends on the COVID situation and other factors; TBD.
The brief idea of the course: when should we expect machine learning algorithms to work? What kinds of assumptions do we need to be able to rigorously prove that they will work?
Definitely covered: PAC learning, VC dimension, Rademacher complexity, concentration inequalities. Probably: PAC-Bayes, analysis of kernel methods, margin bounds, stability. Maybe: limitations of uniform convergence, analyzing deep nets via neural tangent kernels, provable gaps between kernel methods and deep learning, online learning, feasibility of private learning, compression-based bounds.
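As a taste of the flavour of result we'll prove (a minimal sketch in SSBD-style notation, not anything course-specific): for binary classification with 0-1 loss, a hypothesis class $\mathcal{H}$ of finite VC dimension $d$, and an i.i.d. sample $S$ of size $n$, with probability at least $1 - \delta$,

```latex
% Illustrative uniform convergence bound (agnostic case; C is a universal constant).
% L_D(h) is the population risk and L_S(h) the empirical risk on the sample S.
\[
  \sup_{h \in \mathcal{H}} \bigl| L_{\mathcal{D}}(h) - L_S(h) \bigr|
  \;\le\; C \sqrt{\frac{d + \log(1/\delta)}{n}} .
\]
```

In particular, empirical risk minimization over $\mathcal{H}$ then comes close to the best hypothesis in the class; much of the first half of the course builds the tools (concentration inequalities, VC dimension, Rademacher complexity) needed to prove statements like this.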
There will be some overlap with CPSC 531H: Machine Learning Theory (Nick Harvey's course, last taught in 2018), but if you've taken that course, you'll still get something out of this one. We'll cover less on optimization / online learning / bandits than that course did, and try to cover some more recent ideas used in contemporary deep learning theory.
(This course is unrelated to CPSC 532S: Multimodal Learning with Vision, Language, and Sound, from Leon Sigal.)
There are no formal prerequisites. I will roughly assume:
Learning theory textbooks and surveys:
If you need to refresh your linear algebra or other areas of math:
Resources on learning measure-theoretic probability (not required to know this stuff in detail, but you might find it helpful):
Similar courses: