CPSC 340 Machine Learning and Data Mining

Summer 2021

We introduce basic principles and techniques in the fields of data mining and machine learning. These are some of the key tools behind the emerging field of data science and the popularity of the "big data" buzzword. These techniques now run behind the scenes to discover patterns and make predictions in many applications in our daily lives. We'll focus on many of the core data mining and machine learning technologies, with motivating applications from a variety of disciplines.

Announcements

This Semester

Instructor

Nam Hee Gordon Kim

Lecture Sections (beginning May 10, 2021)

Office hours

Tutorials (beginning Monday, May 10, 2021)

Teaching assistants

Austin Beauchamp
Peyman Gholami
Lironne Kurzman
Farnoosh Hashemi
Gabriel Huang
Shahriar Shayesteh
Mohamad Amin Mohamadi
Ali Seyfi
Frank Yu
Yuxin Tian

Services

Course Calendar

Registration

Undergraduate and graduate students from any department are welcome to take the course. Undergraduate students should enroll in CPSC 340, while graduate students should enroll in CPSC 532M (which has an extra small project component; not offered in summers). Below are more details on registration for each course:

Starting in the first week of classes, the TAs will run weekly tutorials. These typically walk through provided assignment code, review background material and big concepts, and/or work through exercises. You can register in a particular tutorial section to reserve a seat at that time, but registering in a tutorial section is not required.

Prerequisites

Undergraduate and graduate students from any department are welcome to take the class, provided that they satisfy the prerequisites. If you do not satisfy the exact prerequisites but would still like to enroll in the class, see here. For graduate students from outside the CS department, see here.

Textbook

There is no required textbook for the class. Introductory books that cover many (but not all) of the topics we will discuss are the Artificial Intelligence book of Russell and Norvig (AI:AMA) and the Artificial Intelligence book of Poole and Mackworth (you may need these for other classes). More advanced books include The Elements of Statistical Learning (ESL) by Hastie et al., Murphy's Machine Learning: A Probabilistic Perspective (ML:APP), which can be accessed through the library here, and Bishop's Pattern Recognition and Machine Learning (PRML). For books with a bigger focus on data mining, see Introduction to Data Mining (IDM) and Mining of Massive Datasets.

Related Courses

Related courses in statistics include: STAT 305, STAT 306, STAT 406, STAT 460, STAT 461 (as well as EOSC 510). A discussion of the difference between CPSC 340 and these various STAT classes written by a former student (Geoff Roeder) is available here.

Grading

List of topics

We will roughly cover the following topics:

Lectures

Date Slides Related Readings and Links Notes Notebooks
Mon May 10 Motivation and Syllabus (UPDATED) What is Machine Learning?  ·  Machine Learning on Wikipedia  ·  Rise of the Machines  ·  Artificial Intelligence-The Revolution Hasn't Happened Yet  ·  Machine Learning: The Great Stagnation  ·  Stop Calling Everything AI, Machine-Learning Pioneer Says
Exploratory Data Analysis (UPDATED) Gotta Catch'em all  ·  Why Not to Trust Statistics  ·  A critique of pure learning and what artificial neural networks can learn from animal brains  ·  Visualization Types  ·  Google Chart Gallery  ·  Other Tools See assignment 1 below. EDA
Wed May 12 Decision Trees (UPDATED) A Visual Introduction to Machine Learning  ·  Decision Trees  ·  Entropy
AI:AMA 18.2-3, ESL 9.2, ML:APP 16.2
Big-O Notes
Fundamentals of Learning (UPDATED) 7 Steps of Machine Learning ·  IID  ·  Cross-validation  ·  Bias-variance  ·  No Free Lunch
AI:AMA 18.4-5, ESL 7.1-7.4, 7.10, ML:APP 1.4, 6.5
Course Notation Guide
Fri May 14 Probabilistic Classifiers (UPDATED) Conditional probability (demo)  ·  Naive Bayes  ·  Probabilities and Battleship
ESL 4.3, ML:APP 2.2, 3.5, 4.1-4.2
Probability Notes Probability Slides
Non-Parametric Models (UPDATED) K-nearest neighbours  ·  Decision Theory for Darts  ·  Norms
AI:AMA 18.8, ESL 13.3, ML:APP 1.4
Mon May 17 Data Augmentation and Ensemble Methods (UPDATED) Ensemble Methods  ·  Random Forests  ·  Empirical Study  ·  Kinect  ·  Data Augmentation
AI:AMA 18.10, ESL 7.11, 8.2, 15, 16.3, ML:APP 6.2.1, 16.2.5, 16.6
Clustering (UPDATED) Clustering  ·  K-means clustering (demo)  ·  K-Means++ (demo)
IDM 8.1-8.2, ESL 14.3
Wed May 19
More Clustering (UPDATED) DBSCAN (video demo)  ·  Hierarchical Clustering  ·  Phylogenetic Trees
IDM 8.4
Outlier Detection (UPDATED) Empirical Study
IDM 8.3, ESL 14.3.12, ML:APP 25.5
Fri May 21 Least Squares (UPDATED) Linear Regression  ·  (demo, 2D data, 2D video)  ·  Least Squares Essence of Calculus  ·  Partial Derivative  ·  Gradient
ESL 3.1-2, ML:APP 7.1-3, AI:AMA 18.6
Calculus Notes Linear Algebra Notes Linear/Quadratic Gradients
Gradient Descent (UPDATED) Gradient Descent
Wed May 26 Convex Functions and Robust Regression (UPDATED) Convex Functions  ·  ML:APP 7.4
Bonus Lecture (UPDATED) MMD Chapter 3
Fri May 28
Feature Selection (UPDATED) Genome-Wide Association Studies ·  AIC ·  BIC
ESL 3.3, 7.5-7
Regularization (UPDATED) ESL 3.4, ML:APP 7.5, AI:AMA 18.4
Mon May 31
Gaussian RBF (UPDATED) RBF video  ·  RBF and Regularization video
ESL 6.7, ML:APP 13.3-4
Linear Classifiers (UPDATED) Perceptron ESL 4.5, ML:APP 8.5 Support Vector Machines
ESL 4.4, 12.1-2, ML:APP 8.1-3, 9.5, 14.5, AI:AMA 18.9
Tue Jun 1
Midterm Exam
Wed Jun 2
Multi-Class Linear Classifiers (UPDATED) ML:APP 8.3.7 and 9.3-5, ESL 4.4
Feature Engineering (UPDATED) Gmail Priority Inbox
Fri Jun 4
Text and Image Data (UPDATED) Convolution GIF
Convolution Interactive Demo
ESL 12.3, ML:APP 14.1-4
Kernel Trick (UPDATED) ESL 12.3, ML:APP 14.1-4
Mon Jun 7
Stochastic Gradient (UPDATED) Stochastic Gradient
ML:APP 8.5
MLE and MAP (UPDATED) Maximum likelihood estimation
ML:APP 9.3-4
Max and Argmax Notes
Wed Jun 9
Principal Component Analysis (UPDATED) Principal Component Analysis
ESL 14.5, IDM B.1, ML:APP 12.2
More PCA (UPDATED) Making Sense of PCA
Singular Value Decomposition
Eigenface
Fri Jun 11
Sparse Matrix Factorization (UPDATED) Non-Negative Matrix Factorization
NMF Paper
ESL 14.6, ML:APP 13.8
Recommender Systems (UPDATED) Recommender Systems
Netflix Prize
Mon Jun 14
Multi-Dimensional Scaling (UPDATED) Nonlinear Dimensionality Reduction
t-SNE Demo
ESL 14.8-9, IDM B.2
Deep Learning (UPDATED) Google Video
What is a Neural Network?
Interactive Guide ML:APP 16.5, ESL 11.1-4, AI:AMA 18.7
The final exam will cover the material above.
Wed Jun 16
More Deep Learning (UPDATED) Backpropagation
Fortune Article
Deep Learning References
Alchemy
ML:APP 28.3, ESL 11.5
Convolutional Neural Networks (UPDATED) Convolutional Neural Network
ML:APP 28.4, ESL 11.7

The acronyms in the table above refer to the following textbooks:

Related courses that have online notes


Homework Assignments

Post Date Due Date Files Notes/Links
Mon May 10 Mon May 17 a1.pdf  ·  a1.zip (contains the LaTeX template + code)
Setting up Python
Sun May 16 Mon May 24 a2.pdf  ·  a2.zip (contains the LaTeX template + code)
Thu May 20 Fri May 28 a3.pdf  ·  a3.zip (contains the LaTeX template + code)
Thu May 27 Mon Jun 7 a4.pdf  ·  a4.zip (contains the LaTeX template + code)
Thu Jun 3 Fri Jun 11 a5.pdf  ·  a5.zip (contains the LaTeX template + code)
Sat Jun 12 Fri Jun 18 a6.pdf  ·  a6.zip (contains the LaTeX template + code)
Mon Jun 21 Wed Jun 30 a7.pdf  ·  a7.zip (contains the LaTeX template + code)

Previous Offerings