There's a more recent version of this course.
Instructor: Danica Sutherland (she), dsuth@cs.ubc.ca, ICICS X563.
Canvas and Piazza (easiest registration if you follow the link from Canvas, but you can sign up directly here).
SSBD below refers to the book of Shalev-Shwartz and Ben-David; MRT to that of Mohri, Rostamizadeh, and Talwalkar. Italicized entries are tentative.
# | Day | Date | Topic | Reading |
---|---|---|---|---|
1 | Mon | Jan 10 | Intro / overview | SSBD 1-2; MRT 2 |
 | Mon | Jan 10 | Assignment 1 posted (and .tex) | |
2 | Wed | Jan 12 | PAC learning | SSBD 2-3; MRT 2 |
3 | Mon | Jan 17 | Probability / uniform convergence | Measure Theory Tutorial; SSBD 4; MRT 2 |
4 | Wed | Jan 19 | Finish uniform convergence + no free lunch + start of VC | SSBD 5-6 |
 | Thu | Jan 20 | Assignment 1 due, 11:59pm | |
 | Fri | Jan 21 | Drop deadline | |
5 | Mon | Jan 24 | More on VC dimension | SSBD 6; MRT 3 |
6 | Wed | Jan 26 | More VC + Rademacher | SSBD 9.1; MRT 3 |
7 | Mon | Jan 31 | More Rademacher (some fiddly issues with abs value; update on Wednesday) | MRT 3; SSBD 26 |
8 | Wed | Feb 2 | Even more Rademacher | MRT 3, 11; SSBD 26 |
 | Fri | Feb 4 | Assignment 2 posted (and .tex) | |
9 | Mon | Feb 7 | Structural risk minimization | SSBD 7; MRT 4 |
10 | Wed | Feb 9 | Modes of learnability + model selection (plus the long-awaited proof of Massart's lemma) | SSBD 7, 11; MRT 4 |
 | Mon | Feb 14 | Shift to hybrid mode (delayed by being sick) | |
11 | Mon | Feb 14 | Convex learning problems + gradient descent | SSBD 12, 14; Bubeck; Boyd/Vandenberghe |
12 | Wed | Feb 16 | SGD | SSBD 14 |
 | Fri | Feb 18 | Assignment 2 due, 11:59pm | |
 | Mon | Feb 21 | Midterm break | |
 | Wed | Feb 23 | Midterm break | |
13 | Mon | Feb 28 | Regularization + stability | SSBD 13; MRT 14 |
14 | Wed | Mar 2 | SVMs + margin bounds | SSBD 15, 26.3; MRT 5 |
15 | Mon | Mar 7 | SVM duality, kernel definitions | SSBD 15/16; MRT 5/6; more kernel material linked in slides |
16 | Wed | Mar 9 | More kernels: representer theorem, kernel ridge | |
 | Mon | Mar 14 | Assignment 3 posted (and .tex) | |
17 | Mon | Mar 14 | Some more kernels (universality, Gaussian processes) + deep learning (approximation, generalization) | Telgarsky section 2 |
18 | Wed | Mar 16 | More deep learning approximation + generalization | Telgarsky section 14 |
 | Wed | Mar 16 | Project proposals due | |
 | Mon | Mar 21 | Class canceled | |
19 | Wed | Mar 23 | Neural tangent kernels | Telgarsky sections 4, 8 |
20 | Mon | Mar 28 | “Does any of this stuff work at all?” Limits of NTK + interpolation and the limits of uniform convergence | |
 | Mon | Mar 28 | Assignment 3 due (extended), 11:59pm | |
 | Tue | Mar 29 | Assignment 4 posted (and .tex) | |
21 | Wed | Mar 30 | Double descent and implicit regularization + PAC-Bayes | BHMM / NKBYBS / Telgarsky 10; SSBD 31 / Guedj |
22 | Mon | Apr 4 | Online learning | |
 | Wed | Apr 6 | Project presentations | |
 | Fri | Apr 8 | Project writeups due | |
 | Fri | Apr 8 | Assignment 4 due, 11:59pm | |
 | | TBD | Take-home final (will have a significant window to do it during finals period) | |
The course meets in person in DMP 101 (since Feb 14th) and is also available on Zoom; the meeting link and recordings are posted on Canvas and Piazza.
Grading scheme: 70% assignments (including a small project), 30% final.
The lowest assignment grade (not including the project) will be dropped. The project counts as one assignment. Assignments should be written in LaTeX (not handwritten or in a word processor) and handed in on Gradescope, as described on Piazza.
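For concreteness, here is a small sketch of how that scheme could be computed; the equal weighting of the kept assignments and the project within the 70% component is an assumption for illustration, not a stated policy.

```python
# Hypothetical illustration of the grading scheme described above. Assumes the
# kept assignments and the project are weighted equally within the 70% component.
def course_grade(assignment_scores, project_score, final_score):
    """All scores are percentages in [0, 100]; returns the overall course percentage."""
    kept = sorted(assignment_scores)[1:]    # drop the single lowest assignment (not the project)
    items = kept + [project_score]          # the project counts as one assignment
    assignment_component = sum(items) / len(items)
    return 0.70 * assignment_component + 0.30 * final_score

# e.g. with four assignments: course_grade([82, 90, 68, 95], project_score=88, final_score=75)
```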
There will be one “big assignment” that serves as a (small) project: something on the scale of running experiments to explore a paper, doing a literature review of a particular area, or extending/unifying a few papers. A proposal will be due beforehand; details to come.
The final exam may be take-home, synchronous online, or in-person; TBD.
There may also be some paper presentations later in the course; if so, presenters will be able to use a presentation to replace part of an assignment grade. This depends on the COVID situation and other factors; TBD.
The brief idea of the course: when should we expect machine learning algorithms to work? What kinds of assumptions do we need to be able to rigorously prove that they will work?
Definitely covered: PAC learning, VC dimension, Rademacher complexity, concentration inequalities. Probably: PAC-Bayes, analysis of kernel methods, margin bounds, stability. Maybe: limitations of uniform convergence, analyzing deep nets via neural tangent kernels, provable gaps between kernel methods and deep learning, online learning, feasibility of private learning, compression-based bounds.
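As a taste of the flavour of result we'll prove (a minimal sketch in SSBD-style notation, not anything course-specific): for binary classification with 0-1 loss, a hypothesis class $\mathcal{H}$ of finite VC dimension $d$, and an i.i.d. sample $S$ of size $n$, with probability at least $1 - \delta$,

```latex
% Illustrative uniform convergence bound (agnostic case; C is a universal constant).
% L_D(h) is the population risk and L_S(h) the empirical risk on the sample S.
\[
  \sup_{h \in \mathcal{H}} \bigl| L_{\mathcal{D}}(h) - L_S(h) \bigr|
  \;\le\; C \sqrt{\frac{d + \log(1/\delta)}{n}} .
\]
```

In particular, empirical risk minimization over $\mathcal{H}$ then comes close to the best hypothesis in the class; much of the first half of the course builds the tools (concentration inequalities, VC dimension, Rademacher complexity) needed to prove statements like this.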
There will be some overlap with CPSC 531H: Machine Learning Theory (Nick Harvey's course, last taught in 2018), but if you've taken that course, you'll still get something out of this one. We'll cover less on optimization / online learning / bandits than that course did, and try to cover some more recent ideas used in contemporary deep learning theory.
(This course is unrelated to CPSC 532S: Multimodal Learning with Vision, Language, and Sound, from Leon Sigal.)
There are no formal prerequisites. I will roughly assume:
Learning theory textbooks and surveys:
If you need to refresh your linear algebra or other areas of math:
Resources on learning measure-theoretic probability (not required to know this stuff in detail, but you might find it helpful):
Similar courses: