CPSC 340 - Machine Learning and Data Mining (Fall 2016)
Lectures: Mondays, Wednesdays, and Fridays (2-3 in West Mall Swing Space 122) beginning September 7
Tutorials: Mondays from 4-5 (MacLeod 214) and 5-6 (DMP 101), Tuesdays from 4:30-5:30 (DMP 201), and Wednesdays from 9-10 (CBEB 103) beginning September 12.
Office hours: Tuesdays at 2-3 (ICICS 104) and 3:30-4:30 (DLC Table 4), Wednesdays 4-5 (ICICS X337), Thursdays 4:30-5:30 (ICICS X836), or by appointment.
Instructor: Mark Schmidt
Teaching Assistants: Reza Babanezhad, Ricky Chen, Issam Laradji, Robbie Rolin, Alireza Shafaei, Moumita Roy Tora, Nasim Zolaktaf, Zainab Zolaktaf
Synopsis: We introduce basic principles and techniques in the fields of data mining and machine learning. These are some of the key tools behind the emerging field of data science and the popularity of the "big data" buzzword. These techniques now run behind the scenes to discover patterns and make predictions in many applications in our daily lives. We'll focus on many of the core data mining and machine learning technologies, with motivating applications from a variety of disciplines.
Prerequisites:
- Basic algorithms and data structures (CPSC 221, or both of CPSC 260 and EECE 320 as well as one of CPSC 210, EECE 201, or EECE 309).
- Linear algebra (one of MATH 152, 221, or 223).
- Probability (one of STAT 200, STAT 203, STAT 241, STAT 251, STAT 302, MATH 302, MATH 318, or BIOL 300).
- Multivariate calculus (one of MATH 200, 217, 226, 253, or 263).
Since multivariate calculus is a new prerequisite, for the 2016-17 year only we are allowing MATH 200 (or equivalent) to be taken as a co-requisite, provided that the average of the other MATH/STAT prerequisites is at least 76%. Other courses that are helpful but not required include scientific computing (CPSC 302), algorithms and complexity (CPSC 320), and statistical inference (STAT 305).
Registration: Undergraduate and graduate students from any department are welcome to take the class, provided that they satisfy the prerequisites. If you do not satisfy the exact prerequisites but would still like to enroll in the class, there are additional details available here and here.
The general seats available in this class usually fill up very quickly. Because of this, we have reserved a small number of restricted seats for CPSC graduate students. These seats will turn into general seats at the end of the first week of class.
Once the general seats are taken, the only way to register for the course is to sign up for the waiting list. You should sign up for the waiting list even if it is long; last year we were able to accommodate all students on the waiting list. Signing up for the waiting list also makes it more likely that we will open up extra sessions, expand class sizes, or offer additional courses on these topics. You may also want to consider taking related courses from statistics: STAT 305, STAT 306, STAT 406, STAT 460, STAT 461 (as well as EOSC 510). A discussion of the difference between CPSC 340 and these various STAT classes written by a former student (Geoff Roeder) is available here.
Textbook: There is no required textbook for the class. An introductory book that covers many (but not all) of the topics we will discuss is the Artificial Intelligence book of Russell and Norvig (AI:AMA) or the Artificial Intelligence book of Poole and Mackworth (you may need these for other classes). More advanced books include The Elements of Statistical Learning (ESL) by Hastie et al., Murphy's Machine Learning: A Probabilistic Perspective (ML:APP), which can be accessed through the library here, and Bishop's Pattern Recognition and Machine Learning (PRML). For books with a bigger focus on data mining, see Introduction to Data Mining (IDM) and Mining of Massive Datasets.
Grading: Assignments 25%, Midterm 30%, Final 45%.
Piazza for course-related questions.
List of topics
We will roughly cover the following topics:
- Data exploration, cleaning, and preprocessing.
- Supervised learning with frequencies and distances.
- Data clustering, outlier detection, and association rules.
- Linear prediction, regularization, and kernels.
- Latent-factor models and collaborative filtering.
- Neural networks and deep learning.
- Density estimation and Markov models.
Timetable
Date | Topic | Related Readings and Links | Homework and Notes
Wed Sep 7 | Syllabus | Machine Learning Rise of the Machines Talking Machine Episode 1 |
Fri Sep 9 | Data Exploration | Gotta Catch'em all Why Not to Trust Statistics; Visualization Types Google Chart Gallery Matlab demos Other Tools |
Mon Sep 12 | Decision Trees | A Visual Introduction to Machine Learning; Decision Trees; Entropy; What make Dr. Seuss so silly?; AI:AMA 18.2-3, ESL 9.2, ML:APP 16.2 | Assignment 1 a1.zip; Notes on big-O Getting Started with Matlab
Wed Sep 14 | Learning Theory | IID Cross-validation Bias-variance No Free Lunch; AI:AMA 18.4-5, ESL 7.1-7.4, 7.10, ML:APP 1.4, 6.5 | Tutorial 1 Matlab Commands
Fri Sep 16 | Generative Models | Conditional probability (demo) Naive Bayes Probabilities and Battleship ESL 4.3, ML:APP 2.2, 3.5, 4.1-4.2 | Notes on probability
Mon Sep 19 | Non-Parametric Models | K-nearest neighbours Decision Theory for Darts AI:AMA 18.8, ESL 13.3, ML:APP 1.4 |
Wed Sep 21 | Ensemble Methods | Ensemble Methods Random Forests Empirical Study Kinect AI:AMA 18.10, ESL 7.11, 8.2, 15, 16.3, ML:APP 6.2.1, 16.2.5, 16.6 | Tutorial 2 t2.zip
Fri Sep 23 | Clustering | Clustering K-means clustering (demo) K-Means++ (demo); IDM 8.1-8.2, ESL 14.3 | Assignment 1 due
Mon Sep 26 | Density-based Clustering | DBSCAN (video, demo) Norms IDM 8.4 | Assignment 2 a2.zip
Wed Sep 28 | Hierarchical Clustering | Hierarchical Clustering Phylogenetic Trees IDM 8.3, ESL 14.3.12, ML:APP 25.5 | Tutorial 3
Fri Sep 30 | Outlier Detection | Survey and Empirical Study IDM 10.1-5 |
Mon Oct 3 | Association Rules | Association Rule Learning Apriori Amazon Product Recommendation IDM 6.1-6.3, ESL 14.2 |
Wed Oct 5 | Linear Regression | Linear Regression (demo, 2D data, 2D video) Least Squares; Partial Derivatives Gradient; ESL 3.1-2, ML:APP 7.1-3, AI:AMA 18.6 | Tutorial 4 Notes on Linear Algebra
Fri Oct 7 | Non-Linear Regression | Fluid Simulation; ESL 5.1, 6.3, and 6.7 | Assignment 2 due Linear/Quadratic Gradients
Wed Oct 12 | Regularization | RBF video RBF and Regularization video; ESL 3.4, ML:APP 7.5, AI:AMA 18.4 | Assignment 3 a3.zip Tutorial 5
Fri Oct 14 | Gradient Descent | Gradient Descent ML:APP 7.4 |
Mon Oct 17 | Logistic Regression | Gmail Priority Inbox ESL 4.4, ML:APP 8.1-3, AI:AMA 18.9 |
Wed Oct 19 | Support Vector Machines | Support Vector Machines ESL 4.5 and 12.1-2, ML:APP 14.5 | Assignment 3 due Tutorial 6
Fri Oct 21 | Kernel Methods | ESL 12.3, ML:APP 14.1-4 |
Mon Oct 24 | Stochastic Gradient | Stochastic Gradient ML:APP 8.5 |
Wed Oct 26 | Feature Selection | ESL 3.3 |
Fri Oct 28 | Midterm | |
Mon Oct 31 | L1-Regularization | Maximum Likelihood Estimation ESL 3.4, ML:APP 13.3-4 | Assignment 4 a4.zip
Wed Nov 2 | Multi-Class Regression | ML:APP 8.3.7 and 9.3-5, ESL 4.4 | Tutorial 8
Fri Nov 4 | Principal Component Analysis | Principal Component Analysis ESL 14.5, IDM B.1, ML:APP 12.2 |
Mon Nov 7 | More PCA | SVD Eigenfaces |
Wed Nov 9 | Sparse Matrix Factorization | Non-Negative Matrix Factorization ESL 14.6, ML:APP 13.8 | Tutorial 9
Mon Nov 14 | Recommender Systems | Recommender Systems Netflix Prize | Assignment 5 a5.zip Assignment 4 due
Wed Nov 16 | Multi-Dimensional Scaling | Nonlinear Dimensionality Reduction ESL 14.8-9, IDM B.2 | Tutorial 10
Fri Nov 18 | Neural Networks | Google Video Fortune Article ML:APP 16.5, ESL 11.1-4, AI:AMA 18.7 | Assignment 6 a6.zip
Mon Nov 21 | Deep Learning | Web book ML:APP 28.3, ESL 11.5 |
Wed Nov 23 | Convolutional Neural Networks | Convolutional Neural Networks AlexNet ML:APP 28.4, ESL 11.7 | Tutorial 11
Fri Nov 25 | More CNNs | | Assignment 5 due
Mon Nov 28 | Ranking | PageRank Slides, PageRank math/code ESL 14.10, ML:APP 9.7, AI:AMA 22.3 |
Wed Nov 30 | Semi-Supervised Learning | Semi-Supervised Learning Label Propagation at Google | Tutorial 12
Fri Dec 2 | Course Review/Preview | | Assignment 6 due
Related courses that have online notes