CPSC 340 - Machine Learning and Data Mining (Fall 2017)
Lectures (beginning September 6): Mondays, Wednesdays, and Fridays 4-5 (Forest Sciences Centre 1005).
Instructor: Mark Schmidt.
Instructor office hours: Tuesdays at 3-4pm (ICICS 146).
Tutorials (beginning September 11):
- Mondays from 5-6 (DMP 101).
- Tuesdays from 3:30-4:30 and 4:30-5:30 (DMP 201).
- Wednesdays from 9-10 and 10-11 (DMP 201).
Teaching Assistants: Clement Fung, Hashemi Hooman, Siyuan He, Tanner Johnson, Angad Kalra, Aaron Mishkin, Xin Bei She, Sharan Vaswani, Nasim Zolaktaf, Zainab Zolaktaf
TA office hours (all in Demco Learning Centre):
- Mondays 1-2 (Siyuan at Table 3).
- Tuesdays 2-3 (Aaron at Table 1).
- Wednesdays 2-3 (Hooman at Table 2).
- Thursdays 2-3 (Clement at Table 4, with Aaron on weeks when assignments are due).
- Fridays 10-11 (Angad at Table 2).
Synopsis: We introduce basic principles and techniques in the fields of data mining and machine learning. These are some of the key tools behind the emerging field of data science and the popularity of the "big data" buzzword. These techniques are now running behind the scenes to discover patterns and make predictions in various applications in our daily lives. We'll focus on many of the core data mining and machine learning technologies, with motivating applications from a variety of disciplines.
Registration: Undergraduate and graduate students from any department are welcome to take the class. However, due to the high demand, only UBC computer science majors can register directly for the course. All other students need to sign up for the wait list (before September 14) to enroll. Note that last year all students on the wait list were ultimately accepted into the course (but we did not have room for auditors).
Prerequisites:
- Basic algorithms and data structures (CPSC 221, or both of CPSC 260 and EECE 320 as well as one of CPSC 210, EECE 201, or EECE 309).
- Linear algebra (one of MATH 152, 221, or 223).
- Probability (one of STAT 200, STAT 203, STAT 241, STAT 251, STAT 302, MATH 302, MATH 318, or BIOL 300).
- Multivariate calculus (one of MATH 200, 217, 226, 253, or 263).
Graduate students may receive a warning about prerequisites when registering and may need to follow additional steps described here.
Textbook: There is no required textbook for the class. An introductory book that covers many (but not all) of the topics we will discuss is the Artificial Intelligence book of Russell and Norvig (AI:AMA) or the Artificial Intelligence book of Poole and Mackworth (you may need these for other classes). More advanced books include The Elements of Statistical Learning (ESL) by Hastie et al., Murphy's Machine Learning: A Probabilistic Perspective (ML:APP), which can be accessed through the library here, and Bishop's Pattern Recognition and Machine Learning (PRML). For books with a bigger focus on data mining, see Introduction to Data Mining (IDM) and Mining of Massive Datasets (MMD).
Related Courses:
Related courses in statistics include: STAT 305, STAT 306, STAT 406, STAT 460, STAT 461 (as well as EOSC 510). A discussion of the differences between CPSC 340 and these STAT classes, written by a former student (Geoff Roeder), is available here.
Grading: Assignments 30%, Midterm 20%, Final 50%.
Piazza for course-related questions.
List of topics
We will roughly cover the following topics:
- Data representation and summarization.
- Supervised learning with frequencies and distances.
- Data clustering, outlier detection, and association rules.
- Linear prediction, regularization, and kernels.
- Latent-factor models and collaborative filtering.
- Neural networks and deep learning.
Timetable
Date | Slides | Related Readings and Links | Homework and Notes
Wed Sep 6 | Motivation and Syllabus | What is Machine Learning? Machine Learning Rise of the Machines Talking Machine Episode 1 | Assignment 0 a0.zip a0.tex
Fri Sep 8 | Exploratory Data Analysis | Gotta Catch'em all Why Not to Trust Statistics; Visualization Types Google Chart Gallery; Other Tools |
Mon Sep 11 | Decision Trees | A Visual Introduction to Machine Learning; Decision Trees; AI:AMA 18.2-3, ESL 9.2, ML:APP 16.2 | Big-O Notes; Julia Commands
Wed Sep 13 | Fundamentals of Learning | 7 Steps of Machine Learning; IID Cross-validation Bias-variance; No Free Lunch; AI:AMA 18.4-5, ESL 7.1-7.4, 7.10, ML:APP 1.4, 6.5 | Course Notation Guide; Tutorial 1
Fri Sep 15 | Probabilistic Classifiers | Conditional probability (demo); Naive Bayes; ESL 4.3, ML:APP 2.2, 3.5, 4.1-4.2 | Assignment 0 due; Probability Notes; Probability Slides
Mon Sep 18 | Non-Parametric Models | K-nearest neighbours; Decision Theory for Darts; Norms; AI:AMA 18.8, ESL 13.3, ML:APP 1.4 | Assignment 1 a1.zip a1.tex
Wed Sep 20 | Ensemble Methods | Ensemble Methods; Random Forests; Empirical Study; Kinect; AI:AMA 18.10, ESL 7.11, 8.2, 15, 16.3, ML:APP 6.2.1, 16.2.5, 16.6 | Tutorial 2
Fri Sep 22 | Clustering | Clustering; K-means clustering (demo); K-Means++ (demo); IDM 8.1-8.2, ESL 14.3 |
Mon Sep 25 | Density-based Clustering | DBSCAN (video, demo); IDM 8.4 | Tutorial 3
Wed Sep 27 | Hierarchical Clustering | Hierarchical Clustering Phylogenetic Trees; IDM 8.3, ESL 14.3.12, ML:APP 25.5 |
Fri Sep 29 | Finding Similar Items | MMD Chapter 3 | Assignment 1 due
Mon Oct 2 | Least Squares | Linear Regression (demo, 2D data, 2D video); Least Squares; ESL 3.1-2, ML:APP 7.1-3, AI:AMA 18.6, Essence of Calculus | Assignment 2 a2.zip a2.tex
Wed Oct 4 | Normal Equations | Why should one learn machine learning from scratch?; Essence of Linear Algebra; Convex Functions | Tutorial 4; Linear Algebra Notes; Linear/Quadratic Gradients
Fri Oct 6 | Numerical Optimization | |
Wed Oct 11 | Gradient Descent | Gradient Descent; ML:APP 7.4 | Tutorial 5
Fri Oct 13 | Nonlinear Regression | Fluid Simulation; ESL 5.1, 6.3 | Assignment 2 due
Mon Oct 16 | Feature Selection | Genome-Wide Association Studies; AIC, BIC; ESL 3.3, 7.5-7 |
Wed Oct 18 | Regularization | ESL 3.4, ML:APP 7.5, AI:AMA 18.4 |
Fri Oct 20 | Midterm | |
Mon Oct 23 | More Regularization | RBF video RBF and Regularization video; ESL 6.7, ML:APP 13.3-4 | Assignment 3 a3.zip a3.tex
Wed Oct 25 | Linear Classifiers | Perceptron; ESL 4.5, ML:APP 8.5 | Tutorial 6
Fri Oct 27 | More Linear Classifiers | Support Vector Machines; ESL 4.4, 12.1-2, ML:APP 8.1-3, 14.5, AI:AMA 18.9 |
Mon Oct 30 | Kernel Trick | ESL 12.3, ML:APP 14.1-4 | Assignment 4 a4.zip a4.tex
Wed Nov 1 | Stochastic Gradient | Stochastic Gradient; ML:APP 8.5 |
Fri Nov 3 | Multi-Class Classification | ESL 4.4, ML:APP 8.3.7, 9.5 | Assignment 3 due
Mon Nov 6 | MLE and MAP | Maximum Likelihood Estimation; ML:APP 9.3-4 | Max and Argmax Notes
Wed Nov 8 | Principal Component Analysis | Principal Component Analysis; ESL 14.5, IDM B.1, ML:APP 12.2 | Tutorial 8
Fri Nov 10 | More PCA | Making Sense of PCA; SVD Eigenfaces |
Wed Nov 15 | Sparse Matrix Factorization | Non-Negative Matrix Factorization; ESL 14.6, ML:APP 13.8 | Assignment 5 a5.zip a5.tex
Fri Nov 17 | Recommender Systems | Recommender Systems Netflix Prize | Assignment 4 due
Mon Nov 20 | Multi-Dimensional Scaling | Nonlinear Dimensionality Reduction; ESL 14.8-9, IDM B.2 |
Wed Nov 22 | Deep Learning | Google Video What is a Neural Network? Interactive Guide; ML:APP 16.5, ESL 11.1-4, AI:AMA 18.7 | Tutorial 9
Fri Nov 24 | More Deep Learning | Fortune Article Deep Learning References; ML:APP 28.3, ESL 11.5 |
Mon Nov 27 | Convolutional Neural Networks | Convolutional Neural Networks AlexNet; ML:APP 28.4, ESL 11.7 | Assignment 5 due
Wed Nov 29 | More CNNs | |
Fri Dec 1 | Guest Lecture: Siamak Ravanbakhsh | |
Related courses that have online notes