Talk: Moshe Gabel

Date

Abstract: Recent years have seen an explosion in the number of connected devices. This means not only growth in the velocity and volume of data, but also that data sources are increasingly spatially distributed, which incurs higher communication costs. Data mining algorithms often assume that data is centralized or that communication is inexpensive: the setting is implicitly assumed to be a data center. In settings such as wireless sensor networks, however, communication drains limited battery power. Moreover, most work considers only one-shot computation: computing a result once from a fixed data set. Yet data is increasingly dynamic, and many applications require up-to-date results over a recent time window.

This talk will focus on computing approximations over aggregated distributed data streams with reduced communication, using geometric monitoring. Geometric monitoring is a recent general framework, developed in our group, for monitoring distributed streams. Its key observation is that low-communication monitoring of even highly complex functions is often possible by deriving constraints in the input domain rather than in the output. These global constraints are then decomposed into local constraints on the input data of each stream (or node), which each node can check independently.
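To make the input-domain idea concrete, here is a minimal Python sketch for variance, one of the functions the talk covers. It is our own illustration under stated assumptions, not the speaker's algorithm: every node keeps a sliding window of the same size, so the global input vector (mean of x, mean of x^2) is the plain average of the local vectors; the "safe zone" is a hand-built convex set (the region above a parabola intersected with a tangent half-plane) that lies inside the set of inputs whose variance is within eps of the last synchronized value; and all names (local_vector, in_safe_zone, monitor) are hypothetical. Because an average of points in a convex set stays in that set, if every node's local drift vector passes its own check, the global variance is guaranteed to be within eps without any messages.

```python
import random
from collections import deque

Vec = tuple[float, float]  # (mean of x, mean of x^2) over one window


def local_vector(window) -> Vec:
    """First and second moments of a node's current window."""
    n = len(window)
    return (sum(window) / n, sum(x * x for x in window) / n)


def average(vectors: list[Vec]) -> Vec:
    """Global vector: plain average (assumes all windows have equal size)."""
    k = len(vectors)
    return (sum(v[0] for v in vectors) / k, sum(v[1] for v in vectors) / k)


def variance(v: Vec) -> float:
    """Population variance from moments: E[x^2] - E[x]^2."""
    m, s = v
    return s - m * m


def in_safe_zone(u: Vec, e: Vec, eps: float) -> bool:
    """Convex subset of {x : |variance(x) - variance(e)| <= eps} containing e."""
    m0 = e[0]
    f0 = variance(e)
    m, s = u
    # Lower bound s - m^2 >= f0 - eps: the region above a parabola, already convex.
    lower_ok = s - m * m >= f0 - eps
    # Upper bound s - m^2 <= f0 + eps is not convex. Since m^2 >= 2*m0*m - m0^2
    # (tangent line at m0), this half-plane is a convex subset of it.
    upper_ok = s - (2 * m0 * m - m0 * m0) <= f0 + eps
    return lower_ok and upper_ok


def monitor(streams: list[list[float]], window: int, eps: float) -> tuple[int, float]:
    """Simulate k nodes; return (number of full syncs, worst observed error)."""
    k = len(streams)
    wins = [deque(s[:window], maxlen=window) for s in streams]
    v_sync = [local_vector(w) for w in wins]  # local vectors at the last sync
    e = average(v_sync)                       # reference point shared by all nodes
    syncs, max_err = 1, 0.0
    for t in range(window, len(streams[0])):
        for i in range(k):
            wins[i].append(streams[i][t])     # oldest value falls out of the window
        v_now = [local_vector(w) for w in wins]
        # Drift vectors: their average equals the true global vector average(v_now).
        drifts = [(e[0] + v[0] - vs[0], e[1] + v[1] - vs[1])
                  for v, vs in zip(v_now, v_sync)]
        # Each node checks only its own drift; no messages while all checks pass.
        if not all(in_safe_zone(u, e, eps) for u in drifts):
            v_sync, e = v_now, average(v_now)  # violation: full sync
            syncs += 1
        max_err = max(max_err, abs(variance(average(v_now)) - variance(e)))
    return syncs, max_err


rng = random.Random(0)
streams = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(4)]
syncs, err = monitor(streams, window=200, eps=0.1)
print(f"{syncs} syncs over {2000 - 200} steps, max error {err:.4f} (bound 0.1)")
```

The convexity of the safe zone is what lets one global constraint split into independent local checks. This sketch counts each violation as a single full synchronization and ignores unequal window sizes and per-node message costs, which a real protocol would need to handle.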

We'll review three novel distributed approximations for important non-linear functions: variance, Shannon entropy, and least-squares regression. Our algorithms provide deterministic, user-defined approximation bounds while sending messages only when they are needed to maintain those bounds. Compared with a centralized solution, our algorithms reduce communication by up to two orders of magnitude on several real-world data sets drawn from applications such as machine failure detection, network monitoring, and traffic monitoring.

BIO: Moshe Gabel recently received his PhD from the Computer Science Department at the Technion -- Israel Institute of Technology. He is interested in machine learning in general and in its interesting applications. He has worked on distributed data mining algorithms, machine health monitoring, deep neural network compression, analysis of SSD storage logs, and more.

http://www.cs.technion.ac.il/people/mgabel/