Events

Name: Seyed Ali Tabatabaee

Date: Tuesday, 22 July 2025

Time: 11:00 am to 2:00 pm

Location: ICCS 202

Zoom Link: To be confirmed

Thesis Title: Optimization with Explorable Uncertainty

Abstract:

Many real-world problems involve elements (e.g., clients or jobs) with uncertain properties. Acquiring more accurate properties of these elements is often costly but sometimes necessary for high-quality solutions. The goal is to find such solutions through cost-effective strategies for obtaining more accurate information. This model is often referred to as explorable uncertainty. This thesis studies optimization problems from the domains of facility location and job scheduling in the explorable uncertainty model. First, we study center problems with moving entities of bounded speed in Euclidean space, where the movement of the entities is unpredictable and processing must be done in real time. Center problems involve determining the location of a facility to serve a set of entities while optimizing a specified objective function. In particular, we investigate computing the 1-center, centroid, center of mass, and 1-median for a set of moving entities. Next, motivated by the connections observed between these problems and perpetual scheduling problems, we shift our focus to study the latter in more depth. Perpetual scheduling problems involve jobs that require recurring processing. The goal is to process these jobs while optimizing a specified objective function. More specifically, we investigate two prominent examples of perpetual scheduling problems, namely the bamboo trimming problem and the windows scheduling problem, in settings that have not been considered before. Finally, we study these two scheduling problems in the model of explorable uncertainty, where jobs’ processing requirements can be reduced by taking some actions. We provide novel algorithms for the problems considered throughout this thesis. Our results contribute to a better understanding of the various forms of explorable uncertainty and the additional intricacy introduced to optimization problems when considered in this model.

-

Name: Tony Mason

Date: July 21, 2025

Time: 13:00-16:00

Location: ICCS 246

Supervisor: Margo Seltzer

Title: Indaleko: The Unified Personal Index

Abstract:
Digital data overload (1.7 MB generated per second, 361 billion emails daily in 2024) forces users to waste up to 25% of their time searching for or recreating files. Scattered across devices, cloud services, and inconsistent interfaces, data can be nearly impossible to find: consider a six-month-old document whose name and location are no longer recalled. To address this, I propose the Unified Personal Index (UPI), a system that unifies storage metadata, semantic metadata from file content, and human activity context from user interactions. Unlike siloed cloud searches, the UPI creates a single, human-centric index that transcends storage boundaries, aligning retrieval with how we remember.

Implemented via the Indaleko prototype, the UPI uses natural language processing and activity tracking to collect and query metadata across platforms, enabling intuitive searches like “find files edited on my phone while traveling.” Ongoing evaluations are validating superior retrieval effectiveness, leveraging activity context to match experiential cues. By mirroring human memory processes, the UPI simplifies finding and lays the foundation for advanced tools that build on its capabilities. The UPI redefines digital retrieval, transforming searching into finding as naturally as we recall a moment.

-

Name: Wonho Bae

Date: July 7

Time: 1:00 pm

Location: ICICS 146

Supervisor: Prof. Danica Sutherland

Thesis Title: Budget-Robust Active Learning

Abstract:
Deep learning has made significant strides in recent years, largely due to the availability of vast amounts of labeled data. However, expensive and time-consuming manual annotation limits the widespread adoption of Artificial Intelligence (AI), particularly for smaller organizations and individuals. This highlights the need for data-efficient AI frameworks that reduce dependence on large labeled datasets, making AI more accessible. Active learning, where a model strategically selects the most informative data points for annotation, offers a promising solution to this challenge. It improves model performance with fewer labeled examples, making it especially valuable in domains where labeling is costly. Recent research has revealed that the effectiveness of active learning methods varies significantly across different budget regimes, where the budget is defined by the size of the labeled set. In particular, uncertainty-based methods, which perform well in high-budget settings, often underperform compared to representation-based methods or even random sampling in low-budget regimes. In this thesis, we investigate how to improve active learning under both high- and low-budget regimes. We begin with the high-budget setting, where we introduce a novel uncertainty-based method that leverages neural tangent kernels (NTKs) to make computation of look-ahead acquisition strategies feasible. This approach allows the model to account for changes in “future” predictions, resulting in strong performance across various datasets, particularly in high-budget regimes. For low-budget regimes, we propose MaxHerding, a representation-based method that generalizes the recently introduced ProbCover and establishes connections to other low-budget active learning techniques.
To further explore active learning under limited annotation budgets, we consider its application to meta-learning (or few-shot learning) and develop a simple yet effective acquisition strategy based on Gaussian Mixture Models (GMMs), motivated by a max-margin classifier. Given the difficulty of determining the appropriate budget regime in advance, we finally propose Uncertainty Herding (UHerding), a budget-robust active learning method that adaptively interpolates between uncertainty-based and representation-based strategies. Our empirical results show that UHerding consistently outperforms existing methods across a wide range of budget regimes, offering a promising direction toward hyperparameter-free and more robust active learning in real-world applications.

-