There's a more recent version of this course.
Instructor: Danica Sutherland (she), dsuth@cs.ubc.ca, ICICS X563. TA: Milad Jalali. Previously offered in 2021W2 (under the name CPSC 532S); this instance will be broadly similar.
Italicized entries are tentative. The books SSBD, MRT, and Tel are described further below.
Lecture notes are here and are irregularly updated as we go. N2 below refers to section 2 of the notes. (I might split the files later, but for now it's all one pdf.)
Day | Date | Topic | Material | Supplements |
---|---|---|---|---|
W | Sep 7 | Course intro; start finite hypothesis classes | intro; N1-2 | SSBD 1-2; MRT 2 |
Th | Sep 8 | Assignment 1 posted: pdf, tex | ||
M | Sep 12 | Class canceled (sick) | ||
W | Sep 14 | PAC learning: definitions, finite hypothesis classes | slides; N2 | SSBD 2-4; MRT 2 |
M | Sep 19 | No class: National day of mourning | ||
T | Sep 20 | Assignment 1 due at noon | ||
T | Sep 20 | Drop deadline | ||
W | Sep 21 | Uniform convergence, concentration inequalities | N2-3 | SSBD 4, B; MRT 2, D; Tel 12; Wainwright 2 |
Sa | Sep 24 | Assignment 2 posted: pdf, tex | ||
M | Sep 26 | More uniform convergence: Rademacher complexity | N4 | MRT 3; SSBD 26; Tel 13 |
W | Sep 28 | More Rademacher | N4 | MRT 3; SSBD 26; Tel 13 |
M | Oct 3 | No Free Lunch Theorem; very start of VC dimension [guest lecturer: Nick Harvey] | N5 | SSBD 5; MRT 3 |
W | Oct 5 | VC dimension [guest lecturer: Nick Harvey] | N6.1-6.3 | SSBD 6; MRT 3 |
M | Oct 10 | No class: Thanksgiving | ||
W | Oct 12 | Assignment 2 due at noon | ||
W | Oct 12 | The fundamental theorem of statistical learning | N6.4-6.5 | SSBD 6, 28; MRT 3 |
M | Oct 17 | Structural risk minimization | N7 | SSBD 7; MRT 4 |
W | Oct 19 | MDL, consistency, start of margins | N7-8 | SSBD 7 |
M | Oct 24 | Margins and SVMs | N9 | MRT 5; SSBD 15, 26 |
W | Oct 26 | More SVMs | N9 | MRT 5; SSBD 15, 26 |
F | Oct 28 | Assignment 3 posted: pdf, tex | ||
F | Oct 28 | Withdrawal deadline | ||
M | Oct 31 | Kernels I: setup | | MRT 6; SSBD 16 |
W | Nov 2 | Kernels II: Moore-Aronszajn, representer theorems | | MRT 6; SSBD 16 |
M | Nov 7 | Kernels III: Examples, algorithms | | MRT 6; SSBD 16 |
T | Nov 8 | Assignment 3 due at 11:59pm | ||
W | Nov 9 | No class: midterm break | ||
M | Nov 14 | Kernels IV: regularization, operators | ||
W | Nov 16 | Stability + convex learning problems | | SSBD 12, 13 |
Su | Nov 20 | Assignment 4 posted: pdf, tex | ||
Su | Nov 20 | Assignment 5 posted: pdf, tex | ||
Su | Nov 20 | Paper Reading Assignment instructions posted | ||
M | Nov 21 | (Stochastic) gradient descent analysis | | SSBD 14 |
W | Nov 23 | Finish SGD analysis; start implicit regularization | | SSBD 14 |
M | Nov 28 | More implicit regularization / double descent; start NTK [Online: at NeurIPS] | slides | BHMM / NKBYBS / Tel 10 |
W | Nov 30 | NTK [Online: at NeurIPS] | slides | Tel 4, 8 |
M | Dec 5 | Universality and generalization in deep learning [Online: sick] | slides | Tel 2, 14 |
W | Dec 7 | Grab bag: failures of uniform convergence; PAC-Bayes; Online learning [Online: sick] | slides | NK; SSBD 31 / Guedj; SSBD 21 |
F | Dec 9 | Paper reading assignment: proposal due by noon | ||
Sa | Dec 10 | Assignments 4 and 5 due at 11:59pm | ||
W | Dec 14 | Final exam (in person in ICCS 246, handwritten) — 14:00 - 16:30 | ||
W | Dec 21 | Reading assignment due at noon |
The course meets in person in Orchard Commons 3004, with occasional exceptions.
Grading scheme: 70% assignments, 30% final.
The lowest assignment grade (not including the project) will be dropped. Assignments should be done in LaTeX – not handwritten or in a word processor. Hand-in will be on Gradescope, to be described in more detail soon.
One assignment, late in the term, will be a "mini-project" involving reading a paper and running a small exploratory experiment, doing a detailed exploration of its assumptions, or the like. Suggested papers and details to come later.
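For concreteness, here is a minimal sketch of how the grading scheme above might combine scores. Weighting all assignments equally and treating the mini-project as a separate score are assumptions for illustration; neither is specified above.

```python
# Sketch of the grading scheme: 70% assignments, 30% final,
# with the lowest non-project assignment dropped.
# Equal weighting of assignments is an assumption, not official policy.

def course_grade(assignment_scores, project_score, final_score):
    """All scores are percentages in [0, 100]."""
    kept = sorted(assignment_scores)[1:]  # drop the lowest regular assignment
    assignment_part = (sum(kept) + project_score) / (len(kept) + 1)
    return 0.70 * assignment_part + 0.30 * final_score

# Example: five regular assignments, the mini-project, and the final exam.
print(course_grade([82, 90, 75, 88, 95], project_score=85, final_score=80))
# -> 85.6 (up to floating-point rounding)
```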
The idea of the course, in brief: when should we expect machine learning algorithms to work? What kinds of assumptions do we need to be able to rigorously prove that they will work?
Definitely covered: PAC learning, VC dimension, Rademacher complexity, concentration inequalities. Probably: PAC-Bayes, analysis of kernel methods, margin bounds, stability. Maybe: limitations of uniform convergence, analyzing deep nets via neural tangent kernels, provable gaps between kernel methods and deep learning, online learning, feasibility of private learning, compression-based bounds.
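As a taste of the kind of guarantee we prove early on, here is the classic finite-class uniform convergence bound (covered in N2; see SSBD 4), written as a sketch in roughly the notation of the notes:

```latex
% For a finite hypothesis class H, a loss bounded in [0,1], and an i.i.d.
% sample S of size n, Hoeffding's inequality plus a union bound over H give,
% with probability at least 1 - \delta,
\[
  \bigl| L_S(h) - L_{\mathcal D}(h) \bigr|
  \;\le\; \sqrt{\frac{\log\bigl(2\lvert\mathcal H\rvert/\delta\bigr)}{2n}}
  \qquad \text{simultaneously for all } h \in \mathcal H .
\]
```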
There are no formal prerequisites. I will roughly assume:
Books where we'll use sections in some detail:
- SSBD: Shai Shalev-Shwartz and Shai Ben-David, Understanding Machine Learning: From Theory to Algorithms.
- MRT: Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar, Foundations of Machine Learning.
- Tel: Matus Telgarsky, Deep Learning Theory lecture notes.
Some other points of view you might like:
If you need to refresh your linear algebra or other areas of math:
Measure-theoretic probability is not required for this course, but there are instances and related areas where it could be helpful:
Similar courses: