NeurIPS’22: Newly proposed model beats the state of the art in pool-based active learning
This is part 5 of a series featuring some of the UBC CS department’s accepted papers for NeurIPS 2022 (conference runs Nov. 29 – Dec. 9).
The research of Dr. Danica Sutherland, Assistant Professor at UBC Computer Science, and her students is well on track: three of their papers have been accepted to the NeurIPS 2022 conference in November.
“This level of productivity from the lab is gratifying,” said Dr. Sutherland. “I haven’t been at UBC for too long, and it can typically take a little while for things to get up and running. But I have really great students and it’s nice to see that we seem to be at least on the right track in the lab with our research.”
Sutherland explained that one of the three papers accepted to the conference proposes a new way to approach active learning in the deep learning world.
The paper, Making Look-Ahead Active Learning Strategies Feasible with Neural Tangent Kernels, was written by joint first authors Mohamad Amin Mohamadi, an MSc student, and Wonho Bae, a PhD student, with Dr. Sutherland as supervisor.
Danica explained the concept, “In typical machine learning, if you want to learn to label images, you have a big dataset of images that someone has gone through and labeled by hand. But just labeling a huge dataset takes a lot of time and effort. Ideally, you’d like to be able to have a deep learning model tell you what the labels are, to save time and effort. But you also need a decent amount of confidence that the labels are being assigned correctly.”
Danica said the pool-based active learning method proposed in their paper can drastically cut down on the amount of human involvement required.
“Let’s say for example, you have a bunch of images of dogs. For one particular unlabeled image, the deep learning model thinks it’s a German Shepherd, with 70% confidence. But maybe there are a bunch of dog images that look pretty similar, where the model’s not as sure what breed they are. If we imagined that we re-trained the model and told it for sure that this one is a German Shepherd, would it be more confident about the other images?”
Danica said their new method asks these questions, selecting the data points that would inspire the most confidence in the rest of the data. “It’s a way to approximate what a deep network would do if you did give it more data.”
This is the essence of pool-based active learning: a machine learning approach for reducing data-labeling effort. Given a pool of unlabeled samples, it tries to select the most useful ones to label, so that a model trained on them can achieve the best possible performance.
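For readers who like to see the idea in code, here is a minimal sketch of that look-ahead loop. A scikit-learn logistic regression stands in for the deep network, and the function name and setup are illustrative assumptions rather than details from the paper; the paper’s contribution is precisely that it approximates this expensive retraining step with Neural Tangent Kernels instead of actually performing it for every candidate.

```python
# Minimal sketch of look-ahead active learning (illustrative only).
# A scikit-learn logistic regression stands in for the deep network;
# the paper approximates the retraining below with Neural Tangent
# Kernels rather than actually retraining once per candidate point.
import numpy as np
from sklearn.linear_model import LogisticRegression

def look_ahead_select(X_labeled, y_labeled, X_pool):
    """Pick the pool point whose hypothetical label would most raise the
    model's average confidence on the rest of the pool. Assumes the
    labeled seed set already contains at least two classes."""
    base = LogisticRegression(max_iter=1000).fit(X_labeled, y_labeled)
    pseudo_labels = base.predict(X_pool)  # the model's current best guesses
    best_idx, best_conf = None, -np.inf
    for i in range(len(X_pool)):
        # Pretend candidate i's guessed label is the true one...
        X_aug = np.vstack([X_labeled, X_pool[i:i + 1]])
        y_aug = np.append(y_labeled, pseudo_labels[i])
        model = LogisticRegression(max_iter=1000).fit(X_aug, y_aug)
        # ...and measure how confident the retrained model now is
        # about everything else still waiting in the pool.
        rest = np.delete(X_pool, i, axis=0)
        conf = model.predict_proba(rest).max(axis=1).mean()
        if conf > best_conf:
            best_idx, best_conf = i, conf
    return best_idx  # this point is worth sending to a human labeler
```

Run naively like this, the look-ahead costs one full retraining per candidate, which is exactly why such strategies were considered infeasible for deep networks before the NTK approximation in the paper’s title.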
Let the machines do the work
“Really, what we want in deep learning is to spend the least amount of human effort creating labels for our data. The more a model can do that for us with a reasonable amount of accuracy, the better.”
Danica explained that this method can help machine learning practitioners in a huge variety of applications, from testing candidate drugs in biology to testing the properties of materials before constructing a product. Active learning can be a significant time saver (in this paper, up to twice as fast as labeling random points), and therefore less expensive thanks to fewer person-hours required. The results naturally vary with how many images you’re dealing with, how much data is required, and the desired level of accuracy.
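As a rough picture of how that saving is measured, one could wrap the sketch above in a labeling loop like the following. The oracle, budget, and scoring here are hypothetical choices for illustration, not the paper’s experimental setup; swapping look_ahead_select for a random index gives the random-labeling baseline being compared against.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def run_labeling_loop(X_labeled, y_labeled, X_pool, y_oracle, budget):
    """Label `budget` pool points chosen by look_ahead_select and report
    accuracy on the points that were never labeled. `y_oracle` plays the
    human labeler (a hypothetical stand-in, for illustration only)."""
    for _ in range(budget):
        i = look_ahead_select(X_labeled, y_labeled, X_pool)
        # "Ask the human": reveal the true label of the chosen point.
        X_labeled = np.vstack([X_labeled, X_pool[i:i + 1]])
        y_labeled = np.append(y_labeled, y_oracle[i])
        X_pool = np.delete(X_pool, i, axis=0)
        y_oracle = np.delete(y_oracle, i)
    final = LogisticRegression(max_iter=1000).fit(X_labeled, y_labeled)
    return final.score(X_pool, y_oracle)  # accuracy on unlabeled remainder
```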
Danica is also a co-author of two more papers at the conference, in collaboration with researchers from Chicago, Stanford, Harvard, and Autodesk, as well as her UBC CS PhD student Hamed Shirzad:
A Non-Asymptotic Moreau Envelope Theory for High-Dimensional Generalized Linear Models
Lijia Zhou, Frederic Koehler, Pragya Sur, Danica J. Sutherland, Nati Srebro
Evaluating Graph Generative Models with Contrastively Learned Features
Hamed Shirzad, Kaveh Hassani, Danica J. Sutherland
Danica teaches Machine Learning and Data Mining, and is also part of the Canada CIFAR AI Chair program. Learn more about Danica. In total, the department has 13 accepted papers by 9 professors at the NeurIPS conference. Read more about the accepted papers and their authors.