Publications by Mark Schmidt
2024
- Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models.
F. Kunstner, R. Yadav, A. Milligan, M. Schmidt, A. Bietti. NeurIPS, 2024.
[pdf]
- BlockLLM: Memory-Efficient Adaptation of LLMs by Selecting and Optimizing the Right Coordinate Blocks.
A. Varshini Ramesh, V. Ganapathiraman, I. Laradji, M. Schmidt. arXiv, 2024.
[pdf]
- Why Line Search when you can Plane Search? SO-Friendly Neural Networks allow Per-Iteration Optimization of Learning and Momentum Rates for Every Layer.
B. Shea, M. Schmidt. arXiv, 2024.
[pdf]
- Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation.
A. Mishkin, M. Pilanci, M. Schmidt. arXiv, 2024.
[pdf]
2023
- Searching for Optimal Per-Coordinate Step-sizes with Multidimensional Backtracking.
F. Kunstner, V. Portella, M. Schmidt, N. Harvey. NeurIPS, 2023.
[pdf]
[code]
- Don't be so Monotone: Relaxing Stochastic Line Search in Over-Parameterized Models.
L. Galli, H. Rauhut, M. Schmidt. NeurIPS, 2023.
[pdf]
- BiSLS/SPS: Auto-tune Step Sizes for Stable Bi-level Optimization.
C. Fan, G. Choné-Ducasse, M. Schmidt, C. Thrampoulidis. NeurIPS, 2023.
[pdf]
- Greedy Newton: Newton's Method with Exact Line Search.
B. Shea, M. Schmidt. NeurIPS OPT, 2023.
[pdf]
- MSL: An Adaptive Momentum-based Stochastic Line-search Framework.
C. Fan, S. Vaswani, C. Thrampoulidis, M. Schmidt. NeurIPS OPT, 2023.
[pdf]
- Variance Reduced Model Based Methods: New rates and adaptive step sizes.
R. Gower, F. Kunstner, M. Schmidt. NeurIPS OPT, 2023.
[pdf]
- Analyzing and Improving Greedy 2-Coordinate Updates for Equality-Constrained Optimization via Steepest Descent in the 1-Norm.
A. Varshini Ramesh, A. Mishkin, M. Schmidt, Y. Zhou, J. Lavington, J. She. arXiv, 2023.
[pdf]
[slides]
- Target-based Surrogates for Stochastic Optimization.
J. Lavington, S. Vaswani, R. Babanezhad, M. Schmidt, N. Le Roux. ICML, 2023.
[pdf]
- Simplifying Momentum-based Positive-definite Submanifold Optimization with Applications to Deep Learning.
W. Lin, V. Duruisseaux, M. Leok, F. Nielsen, M. Khan, M. Schmidt. ICML, 2023.
[pdf]
- Noise is not the main factor behind the gap between SGD and Adam on transformers, but sign descent might be.
F. Kunstner, J. Chen, J. Lavington, M. Schmidt. ICLR, 2023.
[pdf]
[code]
- Fast Convergence of Random Reshuffling under Over-Parameterization and the Polyak-Lojasiewicz Condition.
C. Fan, C. Thrampoulidis, M. Schmidt. ECML, 2023.
[pdf]
- Optimistic Thompson Sampling-based Algorithms for Episodic Reinforcement Learning.
B. Hu, T. Zhang, N. Hegde, M. Schmidt. UAI, 2023.
[pdf]
[poster]
[code]
- Predicting DNA kinetics with a truncated continuous-time Markov chain method.
S. Zolaktaf, F. Dannenberg, M. Schmidt, A. Condon, E. Winfree. Comp Bio & Chem, 2023.
[pdf]
2022
- Let's Make Block Coordinate Descent Converge Faster: Faster Greedy Rules, Message-Passing, Active-Set Complexity, and Superlinear Convergence.
J. Nutini, I. Laradji, M. Schmidt. JMLR, 2022 (submitted 2017).
[pdf]
[poster]
[slides (short)]
[slides (long)]
[talk]
[code]
- Improved Policy Optimization for Online Imitation Learning.
J.W. Lavington, S. Vaswani, M. Schmidt. CoLLAs, 2022.
[pdf]
- SVRG Meet AdaGrad: Painless Variance Reduction.
B. Dubois-Taine, S. Vaswani, R. Babanezhad, M. Schmidt, S. Lacoste-Julien. MLJ, 2022.
[pdf]
[code]
2021
- Homeomorphic-Invariance of EM: Non-Asymptotic Convergence in KL Divergence for Exponential Families via Mirror Descent.
F. Kunstner, R. Kumar, M. Schmidt. AISTATS, 2021 (Best Paper Award).
[pdf]
[slides]
- Robust Asymmetric Learning in POMDPs.
A. Warrington, J. Lavington, A. Scibor, M. Schmidt, F. Wood. ICML, 2021.
[pdf]
- Tractable structured natural gradient descent using local parameterizations.
W. Lin, F. Nielsen, M. Khan, M. Schmidt. ICML, 2021.
[pdf]
- AutoRetouch: Automatic Professional Face Retouching.
A. Shafaei, J. Little, M. Schmidt. WACV, 2021.
[pdf]
[supplemental]
[FFHQR Dataset]
[video]
- Faster Quasi-Newton Methods for Linear Composition Problems.
B. Shea, M. Schmidt. NeurIPS OPT, 2021.
[pdf]
- An Empirical Study of Non-Uniform Sampling in Off-Policy Reinforcement Learning for Continuous Control.
N. Ioannidis, J. Lavington, M. Schmidt. NeurIPS Deep RL, 2021.
[pdf]
- A Closer Look at Gradient Estimators with Reinforcement Learning as Inference.
J. Lavington, M. Teng, M. Schmidt, F. Wood. NeurIPS Deep RL, 2021.
[pdf]
2020
- Adaptive Gradient Methods Converge Faster with Over-Parameterization (but you should do a line-search).
S. Vaswani, F. Kunstner, I. Laradji, S.Y. Meng, M. Schmidt, S. Lacoste-Julien. arXiv, 2020.
[pdf]
- Variance-Reduced Methods for Machine Learning.
R. Gower, M. Schmidt, F. Bach, P. Richtarik. Proc IEEE, 2020.
[pdf]
- Regret Bounds without Lipschitz Continuity: Online Learning with Relative-Lipschitz Losses.
Y. Zhou, V. Portella, M. Schmidt, N. Harvey. NeurIPS, 2020.
[pdf]
- Fast and Furious Convergence: Stochastic Second Order Methods under Interpolation.
S.Y. Meng, S. Vaswani, I. Laradji, M. Schmidt, S. Lacoste-Julien. AISTATS, 2020.
[pdf]
[slides]
[video]
[code]
- Combining Bayesian Optimization and Lipschitz Optimization.
M. Ahmed, S. Vaswani, M. Schmidt. MLJ, 2020.
[pdf]
[slides]
- Handling the Positive-Definite Constraint in the Bayesian Learning Rule.
W. Lin, M. Schmidt, M. Khan. ICML, 2020.
[pdf]
[slides]
[video]
[code]
- Instance Segmentation with Point Supervision.
I. Laradji, N. Rostamzadeh, P. Pinheiro, D. Vazquez, M. Schmidt. ICIP, 2020.
[pdf]
- A Multiagent Model of Efficient and Sustainable Financial Markets.
B. Shea, M. Schmidt, M. Kamgarpour. NeurIPS ML for Economic Policy, 2020.
[pdf]
2019
- Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates.
S. Vaswani, A. Mishkin, I. Laradji, M. Schmidt, G. Gidel, S. Lacoste-Julien. NeurIPS, 2019.
[pdf]
[poster]
[slides]
[video]
[code]
- "Active-set complexity" of proximal-gradient: How long does it take to find the sparsity pattern?
J. Nutini, M. Schmidt, W. Hare. OPTL, 2019.
[pdf]
[poster]
[slides]
- A Less Biased Evaluation of Out-of-distribution Sample Detectors (Formerly "Does Your Model Know the Digit 6 is Not a Cat?").
A. Shafaei, M. Schmidt, J. Little. BMVC, 2019.
[pdf]
[poster]
[code]
[video]
- Where are the Masks: Instance Segmentation with Image-Level Supervision.
I. Laradji, D. Vazquez, M. Schmidt. BMVC, 2019.
[pdf]
[poster]
- Fast and Faster Convergence of SGD for Over-Parameterized Models (and an Accelerated Perceptron).
S. Vaswani, F. Bach, M. Schmidt. AISTATS, 2019.
[pdf]
[poster]
- Are we there yet? Manifold identification of gradient-related proximal methods.
Y. Sun, H. Jeong, J. Nutini, M. Schmidt. AISTATS, 2019.
[pdf]
[poster]
- Distributed Maximization of Submodular plus Diversity Functions for Multi-label Feature Selection on Huge Datasets.
M. Ghadiri, M. Schmidt. AISTATS, 2019.
[pdf]
[poster]
- Fast and Simple Natural-Gradient Variational Inference with Mixture of Exponential-family Approximations.
W. Lin, M. Khan, M. Schmidt. ICML, 2019.
[pdf]
[poster]
[code]
- Efficient Parameter Estimation for DNA Kinetics Modeled as Continuous-Time Markov Chains.
S. Zolaktaf, F. Dannenberg, E. Winfree, A. Bouchard-Cote, M. Schmidt, A. Condon. DNA, 2019.
[pdf]
[slides]
[code]
- Efficient Deep Gaussian Process Models for Variable-Sized Input.
I. Laradji, M. Schmidt, V. Pavlovic, M. Kim. IJCNN, 2019.
[pdf]
[poster]
[code]
- Newton-Laplace Updates for Block Coordinate Descent.
S.Y. Meng, M. Schmidt. NeurIPS WiML, 2019.
[pdf]
[poster]
2018
- Where are the Blobs: Counting by Localization with Point Supervision.
I. Laradji, N. Rostamzadeh, P. Pinheiro, D. Vazquez, M. Schmidt. ECCV, 2018.
[pdf]
[poster]
[video 1]
[video 2]
[code]
- Online Learning Rate Adaptation with Hypergradient Descent.
A. Baydin, R. Cornish, D. Rubio, M. Schmidt, F. Wood. ICLR, 2018.
[pdf]
[poster]
[code]
- SLANG: Fast Structured Covariance Approximations for Bayesian Deep Learning with Natural Gradient.
A. Mishkin, F. Kunstner, D. Nielsen, M. Khan, M. Schmidt. NeurIPS, 2018.
[pdf]
[poster]
[video]
[code]
- MASAGA: A Linearly-Convergent Stochastic First-Order Method for Optimization on Manifolds.
R. Babanezhad, I. Laradji, A. Shafaei, M. Schmidt. ECML, 2018.
[pdf]
[code]
- New Insights into Bootstrapping for Bandits.
S. Vaswani, B. Kveton, Z. Wen, A. Rao, M. Schmidt, Y. Abbasi-Yadkori. arXiv, 2018.
[pdf]
2017
- Minimizing Finite Sums with the Stochastic Average Gradient.
M. Schmidt, N. Le Roux, F. Bach. MAPR, 2017 (submitted 2013) (2018 Lagrange Prize in Continuous Optimization).
[pdf]
[slides]
[proof scripts]
[talk]
[code]
- Model-Independent Online Learning for Influence Maximization.
S. Vaswani, B. Kveton, Z. Wen, M. Ghavamzadeh, L. Lakshmanan, M. Schmidt. ICML, 2017.
[pdf]
[poster]
[slides]
[code]
- Horde of Bandits using Gaussian Markov Random Fields.
S. Vaswani, M. Schmidt, L. Lakshmanan. AISTATS, 2017.
[pdf]
[poster]
[slides]
- Inferring Parameters for an Elementary Step Model of DNA Structure Kinetics with Locally Context-Dependent Arrhenius Rates.
S. Zolaktaf, F. Dannenberg, X. Rudelis, A. Condon, J. Schaeffer, M. Schmidt, C. Thachuk, E. Winfree. DNA, 2017 (Best Student Paper Award).
[pdf]
[slides]
[code]
2016
- Linear Convergence of Gradient and Proximal-Gradient Methods under the Polyak-Lojasiewicz Condition.
H. Karimi, J. Nutini, M. Schmidt. ECML, 2016.
[pdf]
[poster]
[slides]
[addendum]
- Play and Learn: Using Video Games to Train Computer Vision Models.
A. Shafaei, J. Little, M. Schmidt. BMVC, 2016.
[pdf]
[poster]
[slides]
[MIT Technology Review]
- Convergence Rates for Greedy Kaczmarz Algorithms, and Faster Randomized Kaczmarz Rules Using the Orthogonality Graph.
J. Nutini, B. Sepehry, I. Laradji, M. Schmidt, H. Koepke, A. Virani. UAI, 2016.
[pdf]
[poster]
[code]
- Faster Stochastic Variational Inference using Proximal-Gradient Methods with General Divergence Functions.
M. Khan, R. Babanezhad, W. Lin, M. Schmidt, M. Sugiyama. UAI, 2016.
[pdf]
[poster]
[code]
- Do we need "Harmless" Bayesian Optimization and "First-Order" Bayesian Optimization?
M.O. Ahmed, B. Shahriari, M. Schmidt. NeurIPS BayesOPT, 2016.
[pdf]
[poster]
[slides]
- Fast Patch-based Style Transfer of Arbitrary Styles.
T.Q. Chen, M. Schmidt. NeurIPS Constructive ML, 2016.
[pdf]
[poster]
[slides]
[code/videos]
2015
- Coordinate descent converges faster with the Gauss-Southwell rule than random selection.
J. Nutini, M. Schmidt, I. Laradji, M. Friedlander, H. Koepke. ICML, 2015.
[pdf]
[poster]
[slides]
[talk]
[code]
- Stop Wasting My Gradients: Practical SVRG.
R. Babanezhad, M.O. Ahmed, A. Virani, M. Schmidt, J. Konecny, S. Sallinen. NeurIPS, 2015.
[pdf]
[poster]
[slides]
[code]
- Non-Uniform Stochastic Average Gradient Method for Training Conditional Random Fields.
M. Schmidt, R. Babanezhad, M.O. Ahmed, A. Defazio, A. Clifton, A. Sarkar. AISTATS, 2015.
[pdf]
[poster]
[slides]
[code]
- Influence Maximization with Bandits.
S. Vaswani, L. Lakshmanan, M. Schmidt. NeurIPS Networks, 2015.
[pdf]
[poster]
- Hierarchical Maximum-Margin Clustering.
G.-T. Zhou, S.J. Hwang, M. Schmidt, L. Sigal, G. Mori. arXiv, 2015.
[pdf]
2014
- Convex Optimization for Big Data: Scalable, randomized, and parallel algorithms for big data analytics.
V. Cevher, S. Becker, M. Schmidt. IEEE SPM, 2014.
[pdf]
- Convergence Rate of Stochastic Gradient with Constant Step Size.
M. Schmidt. UBC Technical Report, 2014.
[pdf]
2013
- Block-Coordinate Frank-Wolfe Optimization for Structural SVMs.
S. Lacoste-Julien, M. Jaggi, M. Schmidt, P. Pletscher. ICML, 2013.
[pdf]
[poster]
[slides]
[code]
- Fast Convergence of Stochastic Gradient Descent under a Strong Growth Condition.
M. Schmidt, N. Le Roux. arXiv, 2013.
[pdf]
2012
- A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets.
N. Le Roux, M. Schmidt, F. Bach. NeurIPS, 2012 (2018 Lagrange Prize in Continuous Optimization).
[pdf]
[poster]
[slides]
[talk]
[code]
[extended version]
- Hybrid Deterministic-Stochastic Methods for Data Fitting.
M. Friedlander, M. Schmidt. SISC, 2012.
[pdf]
[slides]
[code]
[addendum]
- A simpler approach to obtaining an O(1/t) convergence rate for projected stochastic subgradient descent.
S. Lacoste-Julien, M. Schmidt, F. Bach. arXiv, 2012.
[pdf]
[code]
- On Sparse, Spectral and Other Parameterizations of Binary Probabilistic Models.
D. Buchman, M. Schmidt, S. Mohamed, D. Poole, N. de Freitas. AISTATS, 2012.
[pdf]
[poster]
[code]
2011
- Convergence Rates of Inexact Proximal-Gradient Methods for Convex Optimization.
M. Schmidt, N. Le Roux, F. Bach. NeurIPS, 2011.
[pdf]
[poster]
[slides]
[talk]
[code]
- Projected Newton-type Methods in Machine Learning.
M. Schmidt, D. Kim, S. Sra. Optimization for Machine Learning (S. Sra, S. Nowozin, S. Wright), MIT Press, 2011.
[pdf]
[slides]
[talk]
[code]
- Generalized Fast Approximate Energy Minimization via Graph Cuts: Alpha-Expansion Beta-Shrink Moves.
M. Schmidt, K. Alahari. UAI, 2011.
[pdf]
[poster]
[code]
- A hybrid stochastic-deterministic optimization method for waveform inversion.
T. van Leeuwen, M. Schmidt, M. Friedlander, F. Herrmann. EAGE, 2011.
[pdf]
[slides]
[code]
2010
- Graphical Model Structure Learning with L1-Regularization.
M. Schmidt. PhD Thesis, 2010.
[pdf]
[slides]
[code]
- Convex Structure Learning in Log-Linear Models: Beyond Pairwise Potentials.
M. Schmidt, K. Murphy. AISTATS, 2010.
[pdf]
[slides]
[talk]
[code]
- Modeling annotator expertise: Learning when everybody knows a bit of something.
Y. Yan, R. Rosales, G. Fung, M. Schmidt, G. Hermosillo, L. Bogoni, L. Moy, J. Dy. AISTATS, 2010.
[pdf]
[talk]
- Causal Learning without DAGs.
D. Duvenaud, D. Eaton, K. Murphy, M. Schmidt. JMLR W&CP, 2010.
[pdf]
[poster]
[talk]
[code]
2009
- Optimizing Costly Functions with Simple Constraints: A Limited-Memory Projected Quasi-Newton Algorithm.
M. Schmidt, E. van den Berg, M. Friedlander, K. Murphy. AISTATS, 2009 (Best Paper Award).
[pdf]
[slides]
[code]
[examples]
- Group Sparse Priors for Covariance Estimation.
B. Marlin, M. Schmidt, K. Murphy. UAI, 2009.
[pdf]
[poster]
- Modeling Discrete Interventional Data using Directed Cyclic Graphical Models.
M. Schmidt, K. Murphy. UAI, 2009.
[pdf]
[slides]
[code]
[addendum]
- Increased Discrimination in Level Set Methods with Embedded Conditional Random Fields.
D. Cobzas, M. Schmidt. CVPR, 2009.
[pdf]
[poster]
- Optimization Methods for L1-Regularization.
M. Schmidt, G. Fung, R. Rosales. UBC Technical Report, 2009.
[pdf]
[code]
[examples]
2008
- Structure Learning in Random Fields for Heart Motion Abnormality Detection.
M. Schmidt, K. Murphy, G. Fung, R. Rosales. CVPR, 2008.
[pdf]
[poster]
[code]
[addendum]
- Group Sparsity via Linear-Time Projection.
E. van den Berg, M. Schmidt, M. Friedlander, K. Murphy. UBC Technical Report, 2008.
[pdf]
[code]
- An interior-point stochastic approximation method and an L1-regularized delta rule.
P. Carbonetto, M. Schmidt, N. de Freitas. NeurIPS, 2008.
[pdf]
2007
- Fast Optimization Methods for L1-Regularization: A Comparative Study and 2 New Approaches.
M. Schmidt, G. Fung, R. Rosales. ECML, 2007.
[pdf]
[slides]
[talk]
[code]
[examples]
[addendum]
[extended version]
- Learning Graphical Model Structure using L1-Regularization Paths.
M. Schmidt, A. Niculescu-Mizil, K. Murphy. AAAI, 2007.
[pdf]
[code]
[addendum]
- 3D Variational Brain Tumor Segmentation using a High Dimensional Feature Set.
D. Cobzas, N. Birkbeck, M. Schmidt, M. Jagersand, A. Murtha. MMBIA, 2007.
[pdf]
[online material]
2006
- Accelerated Training of Conditional Random Fields with Stochastic Gradient Methods.
S. Vishwanathan, N. Schraudolph, M. Schmidt, K. Murphy. ICML, 2006.
[pdf]
[1d code]
[2d code]
[slides]
- A Classification-based Glioma Diffusion Model Using MRI Data.
M. Morris, R. Greiner, J. Sander, A. Murtha, M. Schmidt. CAI, 2006.
[pdf]
2005
- Segmenting Brain Tumors using Conditional Random Fields and Support Vector Machines.
C.-H. Lee, M. Schmidt, A. Murtha, A. Bistritz, J. Sander, R. Greiner. CVBIA, 2005.
[pdf]
[poster]
- Segmenting Brain Tumors using Alignment-Based Features.
M. Schmidt, I. Levner, R. Greiner, A. Murtha, A. Bistritz. ICMLA, 2005.
[pdf]
- Support Vector Random Fields for Spatial Classification.
C.-H. Lee, R. Greiner, M. Schmidt. PKDD, 2005.
[pdf]
[presentation]
- Automatic Brain Tumor Segmentation.
M. Schmidt. MSc Thesis, 2005.
[pdf]
Notes
Student Theses and Essays
- Computationally Efficient Geometric Methods for Optimization and Inference in Machine Learning
W. Lin, PhD Thesis, 2023.
[pdf]
- Optimistic Thompson Sampling: Strategic Exploration in Bandits and Reinforcement Learning
T. Zhang, MSc Thesis, 2023.
[pdf]
- A Study of the Edge of Stability in Deep Learning
C. Fox, MSc Thesis, 2023.
[pdf]
- Algorithm Configuration Landscapes: Analysis and Exploitation
Y. Pushak, PhD Thesis, 2022. (CS-Can/Info-Can Distinguished Dissertation Award)
[pdf]
- Subspace optimization for machine learning
B. Shea, MSc Thesis, 2021. [pdf]
- Pragmatic investigations of applied deep learning in computer vision applications
A. Shafaei, PhD Thesis, 2020. [pdf]
- Efficiently estimating kinetics of interacting nucleic acid strands modeled as continuous-time Markov chains
S. Zolaktaf, PhD Thesis, 2020. [pdf]
- Regret Bounds without Lipschitz Continuity: Online Learning with Relative-Lipschitz Losses
Y. Zhou, MSc Thesis, 2020. [pdf]
- Interpolation, Growth Conditions, and Stochastic Gradient Descent
A. Mishkin, MSc Thesis, 2020. [pdf]
- Stochastic Second-Order Optimization for Over-parameterized Machine Learning Models
S.Y. Meng, MSc Thesis, 2020. [pdf]
- Where are the objects? : weakly supervised methods for counting, localization and segmentation
I. Laradji, PhD Thesis, 2020. [pdf]
- Investigating the impact of normalizing flows on latent variable machine translation
M. Przystupa, MSc Thesis, 2020. [pdf]
- Practical optimization methods for machine learning models
R. Babanezhad, PhD Thesis, 2019. [pdf]
- Beyond submodular maximization : one-sided smoothness and meta-submodularity
M. Ghadiri, MSc Thesis, 2019. [pdf]
- Structured Bandits and Applications: Exploiting Problem Structure for Better Decision-making under Uncertainty
S. Vaswani, PhD Thesis, 2019. [pdf]
- Practical Optimization for Structured Machine Learning Problems
M. Ahmed, PhD Thesis, 2018. [pdf]
- Greed is Good: Greedy Optimization Methods for Large-Scale Structured Problems
J. Nutini, PhD Thesis, 2018. (CS-Can/Info-Can Distinguished Dissertation Award) [pdf]
- Deep kernel mean embeddings for generative modeling and feedforward style transfer
T.Q. Chen, MSc Thesis, 2017. [pdf]
- Finding a Maximum Weight Sequence with Dependency Constraints
B. Sepehry, MSc Essay, 2016. [pdf] [slides]