Publications by Mark Schmidt
2024
- BlockLLM: Memory-Efficient Adaptation of LLMs by Selecting and Optimizing the Right Coordinate Blocks.
A. Varshini Ramesh, V. Ganapathiraman, I. Laradji, M. Schmidt. arXiv, 2024.
[pdf]
- Why Line Search when you can Plane Search? SO-Friendly Neural Networks allow Per-Iteration Optimization of Learning and Momentum Rates for Every Layer.
B. Shea, M. Schmidt. arXiv, 2024.
[pdf]
- Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation.
A. Mishkin, M. Pilanci, M. Schmidt. arXiv, 2024.
[pdf]
- Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models.
F. Kunstner, R. Yadav, A. Milligan, M. Schmidt, A. Bietti. arXiv, 2024.
[pdf]
2023
- Searching for Optimal Per-Coordinate Step-sizes with Multidimensional Backtracking.
F. Kunstner, V. Portella, M. Schmidt, N. Harvey. NeurIPS, 2023.
[pdf]
[code]
- Don't be so Monotone: Relaxing Stochastic Line Search in Over-Parameterized Models.
L. Galli, H. Rauhut, M. Schmidt. NeurIPS, 2023.
[pdf]
- BiSLS/SPS: Auto-tune Step Sizes for Stable Bi-level Optimization.
C. Fan, G. Choné-Ducasse, M. Schmidt, C. Thrampoulidis. NeurIPS, 2023.
[pdf]
- Greedy Newton: Newton's Method with Exact Line Search.
B. Shea, M. Schmidt. NeurIPS OPT, 2023.
[pdf]
- MSL: An Adaptive Momentum-based Stochastic Line-search Framework.
C. Fan, S. Vaswani, C. Thrampoulidis, M. Schmidt. NeurIPS OPT, 2023.
[pdf]
- Variance Reduced Model Based Methods: New rates and adaptive step sizes.
R. Gower, F. Kunstner, M. Schmidt. NeurIPS OPT, 2023.
[pdf]
- Analyzing and Improving Greedy 2-Coordinate Updates for Equality-Constrained Optimization via Steepest Descent in the 1-Norm.
A. Varshini Ramesh, A. Mishkin, M. Schmidt, Y. Zhou, J. Lavington, J. She. arXiv, 2023.
[pdf]
[slides]
- Target-based Surrogates for Stochastic Optimization.
J. Lavington, S. Vaswani, R. Babanezhad, M. Schmidt, N. Le Roux. ICML, 2023.
[pdf]
- Simplifying Momentum-based Positive-definite Submanifold Optimization with Applications to Deep Learning.
W. Lin, V. Duruisseaux, M. Leok, F. Nielsen, M. Khan, M. Schmidt. ICML, 2023.
[pdf]
- Noise is not the main factor behind the gap between SGD and Adam on transformers, but sign descent might be.
F. Kunstner, J. Chen, J. Lavington, M. Schmidt. ICLR, 2023.
[pdf]
[code]
- Fast Convergence of Random Reshuffling under Over-Parameterization and the Polyak-Lojasiewicz Condition.
C. Fan, C. Thrampoulidis, M. Schmidt. ECML, 2023.
[pdf]
- Optimistic Thompson Sampling-based Algorithms for Episodic Reinforcement Learning.
B. Hu, T. Zhang, N. Hegde, M. Schmidt. UAI, 2023.
[pdf]
[poster]
[code]
- Predicting DNA kinetics with a truncated continuous-time Markov chain method.
S. Zolaktaf, F. Dannenberg, M. Schmidt, A. Condon, E. Winfree. Comp Bio & Chem, 2023.
[pdf]
2022
- Let's Make Block Coordinate Descent Converge Faster: Faster Greedy Rules, Message-Passing, Active-Set Complexity, and Superlinear Convergence.
J. Nutini, I. Laradji, M. Schmidt. JMLR, 2022 (submitted 2017).
[pdf]
[poster]
[slides (short)]
[slides (long)]
[talk]
[code]
- Improved Policy Optimization for Online Imitation Learning.
J.W. Lavington, S. Vaswani, M. Schmidt. CoLLAs, 2022.
[pdf]
- SVRG Meets AdaGrad: Painless Variance Reduction.
B. Dubois-Taine, S. Vaswani, R. Babanezhad, M. Schmidt, S. Lacoste-Julien. MLJ, 2022.
[pdf]
[code]
2021
- Homeomorphic-Invariance of EM: Non-Asymptotic Convergence in KL Divergence for Exponential Families via Mirror Descent.
F. Kunstner, R. Kumar, M. Schmidt. AISTATS, 2021 (Best Paper Award).
[pdf]
[slides]
- Robust Asymmetric Learning in POMDPs.
A. Warrington, J. Lavington, A. Ścibior, M. Schmidt, F. Wood. ICML, 2021.
[pdf]
- Tractable structured natural gradient descent using local parameterizations.
W. Lin, F. Nielsen, M. Khan, M. Schmidt. ICML, 2021.
[pdf]
- AutoRetouch: Automatic Professional Face Retouching.
A. Shafaei, J. Little, M. Schmidt. WACV, 2021.
[pdf]
[supplemental]
[FFHQR Dataset]
[video]
- Faster Quasi-Newton Methods for Linear Composition Problems.
B. Shea, M. Schmidt. NeurIPS OPT, 2021.
[pdf]
- An Empirical Study of Non-Uniform Sampling in Off-Policy Reinforcement Learning for Continuous Control.
N. Ioannidis, J. Lavington, M. Schmidt. NeurIPS Deep RL, 2021.
[pdf]
- A Closer Look at Gradient Estimators with Reinforcement Learning as Inference.
J. Lavington, M. Teng, M. Schmidt, F. Wood. NeurIPS Deep RL, 2021.
[pdf]
2020
- Adaptive Gradient Methods Converge Faster with Over-Parameterization (but you should do a line-search).
S. Vaswani, F. Kunstner, I. Laradji, S.Y. Meng, M. Schmidt, S. Lacoste-Julien. arXiv, 2020.
[pdf]
- Variance-Reduced Methods for Machine Learning.
R. Gower, M. Schmidt, F. Bach, P. Richtarik. Proc IEEE, 2020.
[pdf]
- Regret Bounds without Lipschitz Continuity: Online Learning with Relative-Lipschitz Losses.
Y. Zhou, V. Portella, M. Schmidt, N. Harvey. NeurIPS, 2020.
[pdf]
- Fast and Furious Convergence: Stochastic Second Order Methods under Interpolation.
S.Y. Meng, S. Vaswani, I. Laradji, M. Schmidt, S. Lacoste-Julien. AISTATS, 2020.
[pdf]
[slides]
[video]
[code]
- Combining Bayesian Optimization and Lipschitz Optimization.
M. Ahmed, S. Vaswani, M. Schmidt. MLJ, 2020.
[pdf]
[slides]
- Handling the Positive-Definite Constraint in the Bayesian Learning Rule.
W. Lin, M. Schmidt, M. Khan. ICML, 2020.
[pdf]
[slides]
[video]
[code]
- Instance Segmentation with Point Supervision.
I. Laradji, N. Rostamzadeh, P. Pinheiro, D. Vazquez, M. Schmidt. ICIP, 2020.
[pdf]
- A Multiagent Model of Efficient and Sustainable Financial Markets.
B. Shea, M. Schmidt, M. Kamgarpour. NeurIPS ML for Economic Policy, 2020.
[pdf]
2019
- Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates.
S. Vaswani, A. Mishkin, I. Laradji, M. Schmidt, G. Gidel, S. Lacoste-Julien. NeurIPS, 2019.
[pdf]
[poster]
[slides]
[video]
[code]
- "Active-set complexity" of proximal-gradient: How long does it take to find the sparsity pattern?.
J. Nutini, M. Schmidt, W. Hare. OPTL, 2019.
[pdf]
[poster]
[slides]
- A Less Biased Evaluation of Out-of-distribution Sample Detectors (Formerly "Does Your Model Know the Digit 6 is Not a Cat?").
A. Shafaei, M. Schmidt, J. Little. BMVC, 2019.
[pdf]
[poster]
[code]
[video]
- Where are the Masks: Instance Segmentation with Image-Level Supervision.
I. Laradji, D. Vazquez, M. Schmidt. BMVC, 2019.
[pdf]
[poster]
- Fast and Faster Convergence of SGD for Over-Parameterized Models (and an Accelerated Perceptron).
S. Vaswani, F. Bach, M. Schmidt. AISTATS, 2019.
[pdf]
[poster]
- Are we there yet? Manifold identification of gradient-related proximal methods.
Y. Sun, H. Jeong, J. Nutini, M. Schmidt. AISTATS, 2019.
[pdf]
[poster]
- Distributed Maximization of Submodular plus Diversity Functions for Multi-label Feature Selection on Huge Datasets.
M. Ghadiri, M. Schmidt. AISTATS, 2019.
[pdf]
[poster]
- Fast and Simple Natural-Gradient Variational Inference with Mixture of Exponential-family Approximations.
W. Lin, M. Khan, M. Schmidt. ICML, 2019.
[pdf]
[poster]
[code]
- Efficient Parameter Estimation for DNA Kinetics Modeled as Continuous-Time Markov Chains.
S. Zolaktaf, F. Dannenberg, E. Winfree, A. Bouchard-Cote, M. Schmidt, A. Condon. DNA, 2019.
[pdf]
[slides]
[code]
- Efficient Deep Gaussian Process Models for Variable-Sized Input.
I. Laradji, M. Schmidt, V. Pavlovic, M. Kim. IJCNN, 2019.
[pdf]
[poster]
[code]
- Newton-Laplace Updates for Block Coordinate Descent.
S.Y. Meng, M. Schmidt. NeurIPS WiML, 2019.
[pdf]
[poster]
2018
- Where are the Blobs: Counting by Localization with Point Supervision.
I. Laradji, N. Rostamzadeh, P. Pinheiro, D. Vazquez, M. Schmidt. ECCV, 2018.
[pdf]
[poster]
[video 1]
[video 2]
[code]
- Online Learning Rate Adaptation with Hypergradient Descent.
A. Baydin, R. Cornish, D. Rubio, M. Schmidt, F. Wood. ICLR, 2018.
[pdf]
[poster]
[code]
- SLANG: Fast Structured Covariance Approximations for Bayesian Deep Learning with Natural Gradient.
A. Mishkin, F. Kunstner, D. Nielsen, M. Khan, M. Schmidt. NeurIPS, 2018.
[pdf]
[poster]
[video]
[code]
- MASAGA: A Linearly-Convergent Stochastic First-Order Method for Optimization on Manifolds.
R. Babanezhad, I. Laradji, A. Shafaei, M. Schmidt. ECML, 2018.
[pdf]
[code]
- New Insights into Bootstrapping for Bandits.
S. Vaswani, B. Kveton, Z. Wen, A. Rao, M. Schmidt, Y. Abbasi-Yadkori. arXiv, 2018.
[pdf]
2017
- Minimizing Finite Sums with the Stochastic Average Gradient.
M. Schmidt, N. Le Roux, F. Bach. MAPR, 2017 (submitted 2013) (2018 Lagrange Prize in Continuous Optimization).
[pdf]
[slides]
[proof scripts]
[talk]
[code]
- Model-Independent Online Learning for Influence Maximization.
S. Vaswani, B. Kveton, Z. Wen, M. Ghavamzadeh, L. Lakshmanan, M. Schmidt. ICML, 2017.
[pdf]
[poster]
[slides]
[code]
- Horde of Bandits using Gaussian Markov Random Fields.
S. Vaswani, M. Schmidt, L. Lakshmanan. AISTATS, 2017.
[pdf]
[poster]
[slides]
- Inferring Parameters for an Elementary Step Model of DNA Structure Kinetics with Locally Context-Dependent Arrhenius Rates.
S. Zolaktaf, F. Dannenberg, X. Rudelis, A. Condon, J. Schaeffer, M. Schmidt, C. Thachuk, E. Winfree. DNA, 2017 (Best Student Paper Award).
[pdf]
[slides]
[code]
2016
- Linear Convergence of Gradient and Proximal-Gradient Methods under the Polyak-Lojasiewicz Condition.
H. Karimi, J. Nutini, M. Schmidt. ECML, 2016.
[pdf]
[poster]
[slides]
[addendum]
- Play and Learn: Using Video Games to Train Computer Vision Models.
A. Shafaei, J. Little, M. Schmidt. BMVC, 2016.
[pdf]
[poster]
[slides]
[MIT Technology Review]
- Convergence Rates for Greedy Kaczmarz Algorithms, and Faster Randomized Kaczmarz Rules Using the Orthogonality Graph.
J. Nutini, B. Sepehry, I. Laradji, M. Schmidt, H. Koepke, A. Virani. UAI, 2016.
[pdf]
[poster]
[code]
- Faster Stochastic Variational Inference using Proximal-Gradient Methods with General Divergence Functions.
M. Khan, R. Babanezhad, W. Lin, M. Schmidt, M. Sugiyama. UAI, 2016.
[pdf]
[poster]
[code]
- Do we need "Harmless" Bayesian Optimization and "First-Order" Bayesian Optimization?
M.O. Ahmed, B. Shahriari, M. Schmidt. NeurIPS BayesOPT, 2016.
[pdf]
[poster]
[slides]
- Fast Patch-based Style Transfer of Arbitrary Styles.
T.Q. Chen, M. Schmidt. NeurIPS Constructive ML, 2016.
[pdf]
[poster]
[slides]
[code/videos]
2015
- Coordinate descent converges faster with the Gauss-Southwell rule than random selection.
J. Nutini, M. Schmidt, I. Laradji, M. Friedlander, H. Koepke. ICML, 2015.
[pdf]
[poster]
[slides]
[talk]
[code]
- Stop Wasting My Gradients: Practical SVRG.
R. Babanezhad, M.O. Ahmed, A. Virani, M. Schmidt, J. Konecny, S. Sallinen. NeurIPS, 2015.
[pdf]
[poster]
[slides]
[code]
- Non-Uniform Stochastic Average Gradient Method for Training Conditional Random Fields.
M. Schmidt, R. Babanezhad, M.O. Ahmed, A. Defazio, A. Clifton, A. Sarkar. AISTATS, 2015.
[pdf]
[poster]
[slides]
[code]
- Influence Maximization with Bandits.
S. Vaswani, L. Lakshmanan, M. Schmidt. NeurIPS Networks, 2015.
[pdf]
[poster]
- Hierarchical Maximum-Margin Clustering.
G.-T. Zhou, S.J. Hwang, M. Schmidt, L. Sigal, G. Mori. arXiv, 2015.
[pdf]
2014
- Convex Optimization for Big Data: Scalable, randomized, and parallel algorithms for big data analytics.
V. Cevher, S. Becker, M. Schmidt. IEEE SPM, 2014.
[pdf]
- Convergence Rate of Stochastic Gradient with Constant Step Size.
M. Schmidt. UBC Technical Report, 2014.
[pdf]
2013
- Block-Coordinate Frank-Wolfe Optimization for Structural SVMs.
S. Lacoste-Julien, M. Jaggi, M. Schmidt, P. Pletscher. ICML, 2013.
[pdf]
[poster]
[slides]
[code]
- Fast Convergence of Stochastic Gradient Descent under a Strong Growth Condition.
M. Schmidt, N. Le Roux. arXiv, 2013.
[pdf]
2012
- A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets.
N. Le Roux, M. Schmidt, F. Bach. NeurIPS, 2012 (2018 Lagrange Prize in Continuous Optimization).
[pdf]
[poster]
[slides]
[talk]
[code]
[extended version]
- Hybrid Deterministic-Stochastic Methods for Data Fitting.
M. Friedlander, M. Schmidt. SISC, 2012.
[pdf]
[slides]
[code]
[addendum]
- A simpler approach to obtaining an O(1/t) convergence rate for projected stochastic subgradient descent.
S. Lacoste-Julien, M. Schmidt, F. Bach. arXiv, 2012.
[pdf]
[code]
- On Sparse, Spectral and Other Parameterizations of Binary Probabilistic Models.
D. Buchmann, M. Schmidt, S. Mohamed, D. Poole, N. de Freitas. AISTATS, 2012.
[pdf]
[poster]
[code]
2011
- Convergence Rates of Inexact Proximal-Gradient Methods for Convex Optimization.
M. Schmidt, N. Le Roux, F. Bach. NeurIPS, 2011.
[pdf]
[poster]
[slides]
[talk]
[code]
- Projected Newton-type Methods in Machine Learning.
M. Schmidt, D. Kim, S. Sra. Optimization for Machine Learning (S. Sra, S. Nowozin, S. Wright), MIT Press, 2011.
[pdf]
[slides]
[talk]
[code]
- Generalized Fast Approximate Energy Minimization via Graph Cuts: Alpha-Expansion Beta-Shrink Moves.
M. Schmidt, K. Alahari. UAI, 2011.
[pdf]
[poster]
[code]
- A hybrid stochastic-deterministic optimization method for waveform inversion.
T. van Leeuwen, M. Schmidt, M. Friedlander, F. Herrmann. EAGE, 2011.
[pdf]
[slides]
[code]
2010
- Graphical Model Structure Learning with L1-Regularization.
M. Schmidt. PhD Thesis, 2010.
[pdf]
[slides]
[code]
- Convex Structure Learning in Log-Linear Models: Beyond Pairwise Potentials.
M. Schmidt, K. Murphy. AISTATS, 2010.
[pdf]
[slides]
[talk]
[code]
- Modeling annotator expertise: Learning when everybody knows a bit of something.
Y. Yan, R. Rosales, G. Fung, M. Schmidt, G. Hermosillo, L. Bogoni, L. Moy, J. Dy. AISTATS, 2010.
[pdf]
[talk]
- Causal Learning without DAGs.
D. Duvenaud, D. Eaton, K. Murphy, M. Schmidt. JMLR W&CP, 2010.
[pdf]
[poster]
[talk]
[code]
2009
- Optimizing Costly Functions with Simple Constraints: A Limited-Memory Projected Quasi-Newton Algorithm.
M. Schmidt, E. van den Berg, M. Friedlander, K. Murphy. AISTATS, 2009 (Best Paper Award).
[pdf]
[slides]
[code]
[examples]
- Group Sparse Priors for Covariance Estimation.
B. Marlin, M. Schmidt, K. Murphy. UAI, 2009.
[pdf]
[poster]
- Modeling Discrete Interventional Data using Directed Cyclic Graphical Models.
M. Schmidt, K. Murphy. UAI, 2009.
[pdf]
[slides]
[code]
[addendum]
- Increased Discrimination in Level Set Methods with Embedded Conditional Random Fields.
D. Cobzas, M. Schmidt. CVPR, 2009.
[pdf]
[poster]
- Optimization Methods for L1-Regularization.
M. Schmidt, G. Fung, R. Rosales. UBC Technical Report, 2009.
[pdf]
[code]
[examples]
2008
- Structure Learning in Random Fields for Heart Motion Abnormality Detection.
M. Schmidt, K. Murphy, G. Fung, R. Rosales. CVPR, 2008.
[pdf]
[poster]
[code]
[addendum]
- Group Sparsity via Linear-Time Projection.
E. van den Berg, M. Schmidt, M. Friedlander, K. Murphy. UBC Technical Report, 2008.
[pdf]
[code]
- An interior-point stochastic approximation method and an L1-regularized delta rule.
P. Carbonetto, M. Schmidt, N. de Freitas. NeurIPS, 2008.
[pdf]
2007
- Fast Optimization Methods for L1-Regularization: A Comparative Study and 2 New Approaches.
M. Schmidt, G. Fung, R. Rosales. ECML, 2007.
[pdf]
[slides]
[talk]
[code]
[examples]
[addendum]
[extended version]
- Learning Graphical Model Structure using L1-Regularization Paths.
M. Schmidt, A. Niculescu-Mizil, K. Murphy. AAAI, 2007.
[pdf]
[code]
[addendum]
- 3D Variational Brain Tumor Segmentation using a High Dimensional Feature Set.
D. Cobzas, N. Birkbeck, M. Schmidt, M. Jagersand, A. Murtha. MMBIA, 2007.
[pdf]
[online material]
2006
- Accelerated Training of Conditional Random Fields with Stochastic Gradient Methods.
S. Vishwanathan, N. Schraudolph, M. Schmidt, K. Murphy. ICML, 2006.
[pdf]
[1d code]
[2d code]
[slides]
- A Classification-based Glioma Diffusion Model Using MRI Data.
M. Morris, R. Greiner, J. Sander, A. Murtha, M. Schmidt. CAI, 2006.
[pdf]
2005
- Segmenting Brain Tumors using Conditional Random Fields and Support Vector Machines.
C.-H. Lee, M. Schmidt, A. Murtha, A. Bistritz, J. Sander, R. Greiner. CVBIA, 2005.
[pdf]
[poster]
- Segmenting Brain Tumors using Alignment-Based Features.
M. Schmidt, I. Levner, R. Greiner, A. Murtha, A. Bistritz. ICMLA, 2005.
[pdf]
- Support Vector Random Fields for Spatial Classification.
C.-H. Lee, R. Greiner, M. Schmidt. PKDD, 2005.
[pdf]
[presentation]
- Automatic Brain Tumor Segmentation.
M. Schmidt. MSc Thesis, 2005.
[pdf]
Student Theses and Essays
- Computationally Efficient Geometric Methods for Optimization and Inference in Machine Learning
W. Lin, PhD Thesis, 2023.
[pdf]
- Optimistic Thompson Sampling: Strategic Exploration in Bandits and Reinforcement Learning
T. Zhang, MSc Thesis, 2023.
[pdf]
- A Study of the Edge of Stability in Deep Learning
C. Fox, MSc Thesis, 2023.
[pdf]
- Algorithm Configuration Landscapes: Analysis and Exploitation
Y. Pushak, PhD Thesis, 2022. (CS-Can/Info-Can Distinguished Dissertation Award)
[pdf]
- Subspace optimization for machine learning
B. Shea, MSc Thesis, 2021. [pdf]
- Pragmatic investigations of applied deep learning in computer vision applications
A. Shafaei, PhD Thesis, 2020. [pdf]
- Efficiently estimating kinetics of interacting nucleic acid strands modeled as continuous-time Markov chains
S. Zolaktaf, PhD Thesis, 2020. [pdf]
- Regret Bounds without Lipschitz Continuity: Online Learning with Relative-Lipschitz Losses
Y. Zhou, MSc Thesis, 2020. [pdf]
- Interpolation, Growth Conditions, and Stochastic Gradient Descent
A. Mishkin, MSc Thesis, 2020. [pdf]
- Stochastic Second-Order Optimization for Over-parameterized Machine Learning Models
S.Y. Meng, MSc Thesis, 2020. [pdf]
- Where are the objects?: weakly supervised methods for counting, localization and segmentation
I. Laradji, PhD Thesis, 2020. [pdf]
- Investigating the impact of normalizing flows on latent variable machine translation
M. Przystupa, MSc Thesis, 2020. [pdf]
- Practical optimization methods for machine learning models
R. Babanezhad, PhD Thesis, 2019. [pdf]
- Beyond submodular maximization: one-sided smoothness and meta-submodularity
M. Ghadiri, MSc Thesis, 2019. [pdf]
- Structured Bandits and Applications: Exploiting Problem Structure for Better Decision-making under Uncertainty
S. Vaswani, PhD Thesis, 2019. [pdf]
- Practical Optimization for Structured Machine Learning Problems
M. Ahmed, PhD Thesis, 2018. [pdf]
- Greed is Good: Greedy Optimization Methods for Large-Scale Structured Problems
J. Nutini, PhD Thesis, 2018. (CS-Can/Info-Can Distinguished Dissertation Award) [pdf]
- Deep kernel mean embeddings for generative modeling and feedforward style transfer
T.Q. Chen, MSc Thesis, 2017. [pdf]
- Finding a Maximum Weight Sequence with Dependency Constraints
B. Sepehry, MSc Essay, 2016. [pdf] [slides]