Nonparametric Belief Propagation with Fast Methods. Alexander Ihler
(MIT)
Nonparametric Belief Propagation (NBP) provides a Monte Carlo
(sample-based) approximation to Belief Propagation (BP) for continuous,
non-Gaussian distributions defined on arbitrary graphical models. NBP
approximates each BP message using a kernel density estimate, specifically
a Gaussian mixture distribution (and is thus similar to regularized
particle filtering on Markov chains). We have applied NBP successfully to
several problems, including distributed inference in sensor networks and
video tracking of articulated models.
Because NBP makes heavy use of kernel density estimates, fast methods and
approximations play a key role in its practical use. In particular,
drawing samples from the product of d Gaussian mixtures is an extremely
costly procedure (naively O(N^d)). Using multipole-like methods, one may
reduce this cost to (asymptotically) O(N). Fast density approximation
techniques may also be applied to render the algorithm more
computationally tractable.
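To make the cost concrete, here is a minimal Python sketch (illustrative only, not the NBP implementation from the talk) that forms the exact product of d one-dimensional Gaussian mixtures by enumerating all N^d component combinations; sampling from the product then amounts to picking a component by weight, and avoiding this blow-up is exactly what the fast approximations are for. The toy mixtures and parameter choices are assumptions for the example.

    import itertools
    import numpy as np

    def log_normal(x, mu, var):
        return -0.5 * ((x - mu) ** 2 / var + np.log(2 * np.pi * var))

    def product_of_mixtures(mixtures):
        # Exact product of d 1-D Gaussian mixtures, each a list of
        # (weight, mean, variance) triples.  The result has N^d components,
        # which is what makes the naive approach cost O(N^d).
        components = []
        for combo in itertools.product(*mixtures):        # all N^d label combinations
            w, mu, var = map(np.array, zip(*combo))
            v = 1.0 / np.sum(1.0 / var)                   # product-of-Gaussians variance
            m = v * np.sum(mu / var)                      # product-of-Gaussians mean
            # the product of the d Gaussians equals c * N(x; m, v); evaluate c at x = m
            log_c = np.sum(log_normal(m, mu, var)) - log_normal(m, m, v)
            components.append((np.prod(w) * np.exp(log_c), m, v))
        total = sum(c[0] for c in components)
        return [(w / total, m, v) for (w, m, v) in components]

    # three 4-component messages -> 4^3 = 64 components in the exact product
    rng = np.random.default_rng(0)
    messages = [[(0.25, rng.normal(), 0.5 + rng.random()) for _ in range(4)] for _ in range(3)]
    print(len(product_of_mixtures(messages)))             # 64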
Distance Transforms of Sampled Functions. Daniel Huttenlocher
(Cornell University)
A distance transform of a binary image specifies for each 0-valued pixel the
distance to the nearest 1-valued pixel (or vice versa). This tutorial will
describe a generalization of distance transforms from binary images to
arbitrary sampled functions, and will then provide simple,
efficient algorithms for computing them. These algorithms are applicable to
computing classical distance transforms of binary images as well as to
solving a broad class of minimization problems involving a cost function with
both local and spatial terms. Alternatively, the techniques can be viewed in
terms of the minimum convolution of two functions, which is an important
operation in grayscale morphology. The methods are also applicable to Viterbi
decoding, belief propagation and optimal control.
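As a concrete instance of this generalization, the sketch below (a Python illustration under the standard squared-Euclidean setting, not the tutorial's own code) computes the one-dimensional distance transform of a sampled function f, d(q) = min_p ((q - p)^2 + f(p)), via the lower envelope of parabolas in O(n) time. Applying it along rows and then columns yields the two-dimensional transform, and setting f to 0 on 1-valued pixels and a large constant elsewhere recovers the classical binary-image case.

    import numpy as np

    def dt1d(f):
        # d[q] = min_p ((q - p)^2 + f[p]), computed from the lower envelope of parabolas
        n = len(f)
        d = np.empty(n)
        v = np.zeros(n, dtype=int)       # grid positions of parabolas in the lower envelope
        z = np.empty(n + 1)              # boundaries between adjacent envelope parabolas
        k = 0
        z[0], z[1] = -np.inf, np.inf
        for q in range(1, n):
            # intersection of the parabola rooted at q with the rightmost envelope parabola
            s = ((f[q] + q * q) - (f[v[k]] + v[k] * v[k])) / (2 * q - 2 * v[k])
            while s <= z[k]:
                k -= 1
                s = ((f[q] + q * q) - (f[v[k]] + v[k] * v[k])) / (2 * q - 2 * v[k])
            k += 1
            v[k] = q
            z[k], z[k + 1] = s, np.inf
        k = 0
        for q in range(n):
            while z[k + 1] < q:
                k += 1
            d[q] = (q - v[k]) ** 2 + f[v[k]]
        return d

    # classical binary-image case: f = 0 on '1' pixels, a large constant elsewhere
    f = np.full(12, 1e20)
    f[[2, 9]] = 0.0
    print(dt1d(f))                       # squared distance to the nearest '1' pixel along the row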
Fast High-Dimensional Feature Indexing for Object Recognition. David Lowe (UBC)
The best current methods for object recognition extract many
high-dimensional local features from each image and then compare each
feature against a large database of features from many training images.
This talk will discuss practical methods for solving this
high-dimensional indexing problem. The most successful methods found
so far have been those using approximate variants of k-d tree lookup.
These can find almost all useful matches while providing at least four
orders of magnitude improvement in efficiency over the best-known methods
for exact matching. One reason for the success is that matches are
useful only when they are significantly closer than would be predicted
by the local density of non-matching neighbors. The result is that
the high-dimensional feature indexing problem can be solved in
practice even for real-time systems and for very large databases.
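The sketch below is an illustrative version of this pipeline, not Lowe's best-bin-first implementation: it indexes a database of descriptors with SciPy's k-d tree (whose eps parameter permits approximate queries) and keeps only matches whose nearest neighbor is much closer than the second nearest, i.e. much closer than the local density of non-matching neighbors would predict. The random descriptors and the 0.8 threshold are stand-ins.

    import numpy as np
    from scipy.spatial import cKDTree

    rng = np.random.default_rng(0)
    database = rng.random((50_000, 128)).astype(np.float32)   # stand-in for training-image features
    queries = rng.random((500, 128)).astype(np.float32)       # stand-in for features from a new image

    tree = cKDTree(database)
    # two nearest database features per query; eps > 0 allows an approximate answer for speed
    dist, idx = tree.query(queries, k=2, eps=0.1)

    # distance-ratio test: keep a match only if it is clearly closer than the second-nearest feature
    ratio = dist[:, 0] / dist[:, 1]
    matches = [(int(q), int(idx[q, 0])) for q in np.flatnonzero(ratio < 0.8)]
    print(len(matches), "accepted matches")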
Tutorial on Statistical N-Body Problems and Proximity Data Structures.
Alex Gray (CMU)
First, I will give an overview of adaptive tree data structures that can
be used to speed up statistical computations. I will describe the
results of a recently completed, thorough experimental comparison
of the best-known data structures for nearest-neighbor search,
covering over 30 different methods.
I'll survey some important generalized N-body problems which arise in
statistical learning, including kernel density estimation,
all-nearest-neighbors, Gaussian process regression, local polynomial
regression, n-point correlations, nonparametric Bayes classification,
and several others.
Then I will describe the dual-tree (or multi-tree) framework for
developing fast linear-time algorithms for statistical N-body
problems, including how it generalizes and extends work in
computational geometry, spatial databases, and computational
physics. I'll also explain why this new approach is better suited to
statistical problems than the classical approaches used for N-body
problems in physics. I'll then give several examples of how to use the
framework to create fast methods for the various problems above, each
requiring a different twist, and highlight the new idea needed for each. These
examples will also illustrate several exciting open questions which I
will pose for the workshop.
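As a flavor of the dual-tree idea, here is a minimal Python sketch (illustrative, not the tutorial's code) for the all-nearest-neighbors problem: query and reference points are indexed by the same simple kd-tree, and a pair of nodes is pruned whenever the minimum distance between their bounding boxes already exceeds every query's current best distance. All names and parameters are assumptions for the example.

    import numpy as np

    class Node:
        def __init__(self, pts, idx):
            self.idx = idx
            self.lo, self.hi = pts[idx].min(axis=0), pts[idx].max(axis=0)   # bounding box
            self.left = self.right = None

    def build(pts, idx, leaf_size=32):
        node = Node(pts, idx)
        if len(idx) > leaf_size:
            dim = int(np.argmax(node.hi - node.lo))          # split the widest dimension
            order = idx[np.argsort(pts[idx, dim])]
            mid = len(order) // 2
            node.left, node.right = build(pts, order[:mid]), build(pts, order[mid:])
        return node

    def min_box_dist(a, b):
        # lower bound on the distance between any point in box a and any point in box b
        gap = np.maximum(0.0, np.maximum(a.lo - b.hi, b.lo - a.hi))
        return float(np.sqrt((gap ** 2).sum()))

    def all_nn(q, r, pts, best):
        # prune whenever no reference point in r can improve any query in q
        if min_box_dist(q, r) > best[q.idx].max():
            return
        if q.left is None and r.left is None:                # base case: brute force the leaves
            d = np.linalg.norm(pts[q.idx][:, None, :] - pts[r.idx][None, :, :], axis=2)
            d[q.idx[:, None] == r.idx[None, :]] = np.inf     # a point is not its own neighbor
            best[q.idx] = np.minimum(best[q.idx], d.min(axis=1))
            return
        for qc in ([q] if q.left is None else [q.left, q.right]):
            for rc in ([r] if r.left is None else [r.left, r.right]):
                all_nn(qc, rc, pts, best)

    rng = np.random.default_rng(0)
    pts = rng.random((5000, 3))
    root = build(pts, np.arange(len(pts)))
    best = np.full(len(pts), np.inf)
    all_nn(root, root, pts, best)        # best[i] = distance from point i to its nearest neighbor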
Fast High-dimensional Function Integration, without Markov Chains.
Alex Gray (CMU)
I will present a new method for integration of high-dimensional
functions, which is not based on Markov chain theory. Instead, I
develop from 'first principles' (of Monte Carlo theory), a new
framework in which the integration problem is cast explicitly as a
problem of statistical estimation, or 'reconstruction' of the function
from samples. The method uses a nonparametric regression estimator
that directly minimizes integration error, rather than a loss function
disconnected from that goal, as others have proposed.
The main computational sub-problem is approached using
custom-designed N-body algorithms based on a recently-developed
generalized framework for such problems. The resulting method
addresses the two main issues of the Metropolis-Hastings method: its
parameter selection is automatic and optimal, and it produces iid
samples, which makes its behavior easier to understand. It is also more
general - for example, it can directly estimate the normalizing
constant arising in Bayesian inference and statistical mechanics. A
head-to-head experimental comparison with MCMC on canonical challenge
functions will illustrate the new method's practical viability. More
broadly, characterizing integration as statistical learning opens the
door to a host of powerful mathematical and computational tools beyond
those of Markov chain theory.
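The talk's method itself is not reproduced here; as a point of reference for the Monte Carlo 'first principles' it starts from, the sketch below is the standard iid importance-sampling estimate of a normalizing constant (the quantity the abstract notes the method can handle directly), with an illustrative unnormalized target and Gaussian proposal that are not from the talk.

    import numpy as np

    rng = np.random.default_rng(0)

    def p_tilde(x):
        # illustrative unnormalized target density (an assumption for this example)
        return np.exp(-0.5 * ((x - 1.0) / 0.3) ** 2) + 0.5 * np.exp(-0.5 * ((x + 1.0) / 0.5) ** 2)

    # proposal density we can both sample from and evaluate
    mu, sigma = 0.0, 2.0
    x = rng.normal(mu, sigma, 100_000)                      # iid samples, no Markov chain
    q = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

    w = p_tilde(x) / q                                      # importance weights
    Z_hat = w.mean()                                        # estimate of Z = integral of p_tilde
    stderr = w.std(ddof=1) / np.sqrt(len(w))                # Monte Carlo standard error
    print(Z_hat, "+/-", stderr)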
Tutorial on Fast Multipole Methods and the FGT. Ramani Duraiswami (University of Maryland)
The fast multipole method has been called one of the ten most significant
numerical algorithms discovered in the 20th century, and won its inventors,
Vladimir Rokhlin and Leslie Greengard, the Steele prize. The algorithm
allows the product of particular dense matrices with a vector to be
evaluated in O(N log N) operations, where direct multiplication requires
O(N^2) operations. Coupled with advances in iterative methods for the
solution of linear systems, these algorithms allow problems that hitherto
required O(N^3) time and O(N^2) memory to be solved in O(N log N) time
with O(N) memory. I will discuss the general ideas of
the method, and provide some specific examples of its application.
One particular application, and one that is important to density
estimation techniques, is the fast Gauss transform. Unfortunately, the cost
of a direct extension of the FGT to higher-dimensional problems grows
exponentially with dimension (the curse of dimensionality). We introduce an
improved version of the fast Gauss transform to efficiently estimate sums of
Gaussians in higher dimensions. Our proposed algorithm incorporates three
innovations that dramatically improve the performance: a multivariate Taylor
expansion scheme, a pyramid-like data structure to store and manipulate the
expansions, and an adaptive space subdivision technique based on the
k-center algorithm. These innovations result in an algorithm that is much
faster and has been demonstrated in dimensions up to ten.
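For intuition about the original one-dimensional fast Gauss transform, here is a minimal sketch, assuming the sources fall in a single cluster: their contribution is collapsed into p Hermite moments about the cluster centre, after which every target is evaluated in O(p) work instead of O(N). The full FGT, and the improved version described above, add space subdivision, translation of expansions, and (for the improved transform) multivariate Taylor expansions; the parameter values below are illustrative.

    import math
    import numpy as np

    def hermite_functions(t, p):
        # h_n(t) = (-1)^n d^n/dt^n exp(-t^2), n = 0 .. p-1, via the standard recurrence
        h = np.empty((p, len(t)))
        h[0] = np.exp(-t ** 2)
        if p > 1:
            h[1] = 2.0 * t * h[0]
        for n in range(1, p - 1):
            h[n + 1] = 2.0 * t * h[n] - 2.0 * n * h[n - 1]
        return h

    def gauss_transform_single_cluster(x, q, y, h, p=12):
        # Approximates G(y_j) = sum_i q_i exp(-(y_j - x_i)^2 / h^2) when the sources x_i lie
        # near their centroid c, using exp(-(y-x)^2/h^2) = sum_n (1/n!) ((x-c)/h)^n h_n((y-c)/h)
        c = x.mean()
        t = (x - c) / h
        A = np.array([(q * t ** n).sum() / math.factorial(n) for n in range(p)])  # source moments
        return A @ hermite_functions((y - c) / h, p)

    rng = np.random.default_rng(0)
    x = rng.normal(0.0, 0.05, 2000)      # sources clustered near one centre
    q = rng.random(2000)
    y = np.linspace(-1.0, 1.0, 500)
    h = 0.5

    direct = (q[None, :] * np.exp(-((y[:, None] - x[None, :]) ** 2) / h ** 2)).sum(axis=1)  # O(N*M)
    approx = gauss_transform_single_cluster(x, q, y, h)                                     # O((N+M)*p)
    print(np.abs(direct - approx).max())  # truncation error of the p-term expansion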
The fast Gauss transform has been applied to kernel density estimation
(KDE), to the mean shift algorithm for clustering, where it reduces the
quadratic complexity of each iteration, and to kernel machines. We present
results from these applications.