
## Titles and Abstracts
## Created by Dinoj Surendran |

**
Nina Amenta, University of California, Davis
Manifold Reconstruction With The MLS Surface
**

Abstract:

We consider the definition of a k-dimensional surface associated with a cloud of
noisy points in d-dimensional Euclidean space. For two-dimensional surface
reconstruction in computer graphics, David Levin's definition of the "MLS surface"
has proved to be quite useful. We extend this definition to arbitrary k and d. It
involves a projection procedure, which maps arbitrary points in space onto
the surface. There are connections to an older line of research into "principal
surfaces".

**
Sanjeev Arora, Princeton University
Semidefinite Programming and Graph Partitioning Problems
**

Abstract:

Graph partitioning and related problems arise often in data analysis.
These problems are NP-hard, and in the past few years new approximation
algorithms have been designed using semidefinite programming. The talk
will survey these new algorithms, as well as recent improvements that
replace semidefinite programming with faster combinatorial algorithms.

**
Dominique Attali, Grenoble
Size of Delaunay Triangulation For Points Distributed Over Lower-Dimensional Polyhedra
**

Abstract:

The Delaunay triangulation of a set of points is a data structure which, in low
dimensions, has applications in surface reconstruction, mesh generation, molecular
modeling, geographic information systems, and many other areas of science and
engineering. Like many spatial partitioning techniques, however, it suffers from
the "curse of dimensionality": the worst-case complexity of the Delaunay
triangulation increases exponentially with the dimension. Indeed, the size of the
Delaunay triangulation of $n$ points in dimension $d$ is at most
$O(n^{\lceil \frac{d}{2} \rceil})$, and this bound is achieved exactly for points
lying on the *moment curve*. The starting point of the work reported in this
talk is the following observation: all the examples that we have of point sets
which have Delaunay triangulation of size $O(n^{\lceil \frac{d}{2} \rceil})$ are
distributed on one-dimensional curves. At the opposite extreme, points distributed
uniformly inside a $d$-dimensional volume have Delaunay triangulations of complexity
$O(n)$. Our motivation for this work is to fill in the picture for distributions
between the two extremes, in which the points lie on manifolds of dimension
$2 \leq p \leq d-1$. We prove that the Delaunay triangulation of a set of points
distributed nearly uniformly on a polyhedron of dimension $p$ in $d$-dimensional
space is $O(n^{{(d-k+1)}/{p}})$ with $k = \lceil (d+1)/(p+1) \rceil$. This bound
is tight and improves on the worst-case bound of $O(n^{\lceil d/2 \rceil})$ for
all $2 \leq p \leq d-1$ and coincides with it for $p=1$. This is joint work
with Nina Amenta and Olivier Devillers.
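For concreteness, the exponent in the new bound is easy to tabulate; a small Python sketch (the function name is ours, for illustration only) also confirms that the value matches the classical $\lceil d/2 \rceil$ exponent at $p=1$:

```python
import math

def delaunay_exponent(d, p):
    """Exponent e in the O(n^e) bound for the Delaunay triangulation of n
    points distributed nearly uniformly on a p-dimensional polyhedron in R^d,
    with k = ceil((d+1)/(p+1))."""
    k = math.ceil((d + 1) / (p + 1))
    return (d - k + 1) / p
```

For example, points on a surface ($p=2$) in $R^3$ give exponent 1, i.e. a linear-size Delaunay triangulation, while $p=1$ recovers $\lceil d/2 \rceil$.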

**
Misha Belkin, Ohio State University
Laplacian and Spectral Methods for Data Analysis
**

Abstract:

The graph Laplacian associated with a point cloud or a mesh is useful in many problems of machine learning, as well as in
computer graphics and surface analysis. I will discuss the graph and manifold Laplacian, its applications, and how it can
be reconstructed from a point cloud or from a mesh.
I will also talk about some recent work on algorithms for analyzing Gaussian mixture distributions based on spectral properties of
certain kernel matrices.
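As an illustration of the point-cloud construction, here is a minimal sketch of the unnormalized graph Laplacian with Gaussian (heat-kernel) weights, assuming NumPy; the function name and the bandwidth parameter t are our own choices, not the speaker's code:

```python
import numpy as np

def graph_laplacian(points, t=1.0):
    """Unnormalized graph Laplacian L = D - W of a point cloud, with
    Gaussian (heat-kernel) edge weights W_ij = exp(-|x_i - x_j|^2 / (4t))."""
    sq_dists = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (4.0 * t))
    np.fill_diagonal(W, 0.0)       # no self-loops
    D = np.diag(W.sum(axis=1))     # degree matrix
    return D - W
```

The result is symmetric and positive semidefinite, with the constant vector in its kernel; its low eigenvectors are the ones used in spectral methods.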

**
Gunnar Carlsson, Stanford University
Topological Methods In Data Analysis
**

Abstract:

I will survey various topological methods for the study of data, including persistent
homology, multidimensional persistent homology, and imaging techniques for high
dimensional data.

**
Frederic Chazal, INRIA, France
Sampling and Topological Inference For General Shapes
**

Abstract:

In many practical situations, the object of study is only known through a finite
set of possibly noisy sample points. It is then desirable to try to recover the
geometry and the topology of the object from this information. The most obvious
example is probably surface reconstruction, where the points are measured on the
surface of a real world object. In other applications (e.g. manifold learning or
time series analysis), the shape of interest may live in a higher dimensional
space. In this context, an important question is to find sampling conditions
guaranteeing that the shape can be reconstructed correctly. Besides providing
theoretical guarantees, such a condition may be used to drive the reconstruction
process itself. Indeed, a possible reconstruction strategy is to look for the shapes
that are best sampled by the data points.

In this talk, we investigate these questions in a fairly general setting, assuming a very simple reconstruction process. We show how the framework of distance functions can be used to study some geometric and topological approximation problems on general shapes (represented by compact subsets of R^n). We introduce a parameterized notion of feature size that interpolates between the minimum of the local feature size and the recently introduced weak feature size. Based on this notion of feature size, we propose sampling conditions that apply to noisy samplings of general compact sets in Euclidean space. These conditions are sufficient to ensure the topological correctness of a reconstruction given by an offset of the sampling. Our approach also yields new stability results for medial axes, critical points and critical values of distance functions.

**
David Cohen-Steiner, INRIA, France
Stability of Boundary Measures
**

Abstract:

An approach for studying the geometry of a shape is to look at the volume of its
parallel bodies. This leads for example to Federer's curvature measures, which
generalize usual curvatures to a certain class of possibly singular subsets of
Euclidean space. In this talk, I will discuss what happens if one applies the
same idea to point cloud data. The main result is an error bound establishing
convergence of the obtained measures when the sampling density increases.

Joint work with F. Chazal and Q. Merigot.
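For point cloud data, the volume of a parallel body (the r-offset of the sample) can be estimated by simple Monte Carlo sampling over a bounding box; a sketch assuming NumPy, with names of our own choosing:

```python
import numpy as np

def offset_volume(points, r, n_samples=200_000, rng=None):
    """Monte Carlo estimate of the volume of the r-offset of a point set,
    i.e. the union of balls of radius r centered at the points."""
    rng = np.random.default_rng(0) if rng is None else rng
    lo, hi = points.min(axis=0) - r, points.max(axis=0) + r
    box_volume = np.prod(hi - lo)   # bounding box containing the offset
    samples = rng.uniform(lo, hi, size=(n_samples, points.shape[1]))
    dist = np.min(np.linalg.norm(samples[:, None] - points[None], axis=-1), axis=1)
    return box_volume * np.mean(dist <= r)   # fraction inside, rescaled
```

For a single point in the plane the r-offset is a disk, so the estimate should be close to pi * r^2.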

**
Sanjoy Dasgupta, University of California - San Diego
Random Projection Trees and Low-Dimensional Manifolds
**

Abstract:

The curse of dimensionality has traditionally been the bane of nonparametric
statistics (histograms, kernel density estimation, nearest neighbor search,
and so on), as reflected in running times and convergence rates that are
exponentially bad in the dimension. This problem is all the more pressing as
data sets get increasingly high dimensional.

Recently the field has been rejuvenated in several ways, of which perhaps the most promising is the realization that a lot of real-world data which appears high-dimensional in fact has low "intrinsic" dimension in the sense of lying close to a low-dimensional manifold. In the past few years, there has been a huge interest in learning such manifolds from data, and then using the learned structure to transform the data into a lower-dimensional space where standard statistical methods generically work better.

I'll exhibit a way to benefit from intrinsic low dimensionality without having to go to the trouble of explicitly learning its fine structure. Specifically, I'll present a simple variant of the ubiquitous k-d tree (a spatial data structure widely used in machine learning and statistics) that is provably adaptive to low dimensional structure. We call this a "random projection tree" (RP tree).

Along the way, I'll discuss different notions of intrinsic dimension -- motivated by manifolds, by local statistics, and by analysis on metric spaces -- and relate them. I'll then prove that RP trees require resources that depend only on these dimensions rather than the dimension of the space in which the data happens to be situated.

This is work with Yoav Freund (UC San Diego).
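The split rule is simple enough to sketch in a few lines; the following is our illustrative code, not the authors' implementation, assuming NumPy. Unlike a k-d tree, which splits on a coordinate axis, each internal node splits the data at the median of its projection onto a random unit direction:

```python
import numpy as np

def rp_tree(data, min_size=10, rng=None):
    """Random projection tree: like a k-d tree, but each internal node
    splits at the median of the projection onto a random unit direction."""
    rng = np.random.default_rng() if rng is None else rng
    if len(data) <= min_size:
        return {"leaf": data}
    u = rng.normal(size=data.shape[1])
    u /= np.linalg.norm(u)            # random unit direction
    proj = data @ u
    split = np.median(proj)
    mask = proj <= split
    if mask.all() or not mask.any():  # degenerate split: stop here
        return {"leaf": data}
    return {"dir": u, "split": split,
            "left": rp_tree(data[mask], min_size, rng),
            "right": rp_tree(data[~mask], min_size, rng)}
```

The leaves partition the data; the adaptivity result says their diameter shrinks at a rate governed by the intrinsic, not ambient, dimension.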

**
Vin de Silva, Pomona College
Witness Complexes and Bicomplexes in
Topological Approximation
**

Abstract:
Witness complexes were devised as a method for approximating the
topology of a point-cloud data set. The data set is represented either
as a single simplicial complex, or (more usefully) as a nested family of
simplicial complexes, to which the theory of persistence can be applied.
Several groups of researchers have established correctness theorems in
this static (0-parameter) setting.
Gunnar Carlsson and I have recently been studying 1-parameter families
of witness complexes. These can be generated from 1-parameter families
of data sets, or by varying an internal parameter in the standard
witness complex construction (the number of landmarks, say). We propose
a strategy for studying such families. I will present this strategy from
the point of view of two new algebraic technical tools: (i) witness
bicomplexes; and (ii) a generalised persistent homology.

**
Tamal Dey, Ohio State University
Voronoi Diagrams in Estimating Geometry and Topology From Data
**

Abstract:
Given a set of points P presumably sampled from a surface S in R^3,
many geometric and topological properties can be estimated from the
Voronoi diagram of the point set if P samples S well.
For example, normals to the
surface can be estimated using poles (as defined by Amenta and Bern);
a piecewise linear surface isotopic to S with a small Hausdorff
distance (relative to local feature sizes) can be computed
using algorithms such as Crust and Cocone. These results can
be extended to the case when P is noisy but still samples
S well. When S is a smooth manifold embedded in a Euclidean space,
Voronoi diagrams still capture the essential geometry and the topology
of S though some new observations are required. This talk will survey
these results.

**
Tom Duchamp, University of Washington
Manifold Estimation
**

Abstract:

I will discuss some work of the 3D photography group at the University of
Washington on the "manifold estimation problem": given a point cloud in a
neighborhood of an (unknown) submanifold of Euclidean space, estimate the
submanifold.
Replacing the point cloud by a probability distribution leads to a natural
variational problem: find a submanifold M that is a local minimum for the
expected square of the distance to M. Critical submanifolds are called
"principal submanifolds". Unfortunately, principal submanifolds are never
local minima, so simple least-squares fitting will not work. However,
critical manifolds are local minima if one restricts to low frequency
perturbations with respect to a certain second order operator related to
the Laplace-Beltrami operator. The cutoff frequency can be estimated in
terms of certain transverse statistics. Alternately, one can regularize
the problem by adding a smoothing term such as a multiple of harmonic
energy. The size of the smoothing parameter k can again be estimated in
terms of transverse statistics.

**
Leonidas Guibas, Stanford University
Reconstruction Using Witness Complexes
**

Abstract:

The problem of reconstructing manifolds from scattered data finds numerous
applications across a number of areas, including image and signal processing,
computer graphics, robotics, machine learning, and scientific visualization.
It has been extensively studied by the computational geometry community in
low dimensions, where reconstruction methods based on the use of the Delaunay
triangulation D(S) of the input point set S have been proposed. In these
methods, an output simplicial complex is extracted from D(S) that is guaranteed
to be topologically equivalent to the underlying manifold M, and geometrically
close to M, provided that S satisfies mild sampling conditions. Such methods do
not immediately carry over to manifolds of arbitrary dimensions, however, since
in this more general setting the input point set may be a good sample of several
manifolds with different topological types. Furthermore, the cost of computing
high-dimensional Delaunay triangulations is rather high.

We show how the witness complex, a complex closely related to the Delaunay triangulation and recently studied by de Silva and Carlsson, together with topological persistence ideas, as introduced by Zomorodian, Edelsbrunner, and Letscher, can overcome some of these difficulties. The witness complex construction is largely based on distance comparisons and can be applied in general metric space settings. In the Euclidean case, its efficiency can be guaranteed theoretically, both in terms of complexity and in terms of approximation quality. Some experimental results will also be shown.

This is joint work with Jean-Daniel Boissonnat and Steve Oudot.
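The distance-comparison flavor of the construction is visible already in its 1-skeleton: an edge joins two landmark points when some witness has them as its two nearest landmarks. A sketch of this rule (one of several witness complex variants; names are ours), assuming NumPy:

```python
import numpy as np

def witness_edges(landmarks, witnesses):
    """1-skeleton of a witness complex: the edge {a, b} is included when
    some witness point has landmarks a and b as its two nearest landmarks."""
    # Pairwise distances: one row per witness, one column per landmark.
    d = np.linalg.norm(witnesses[:, None] - landmarks[None], axis=-1)
    nearest2 = np.sort(np.argsort(d, axis=1)[:, :2], axis=1)
    return {tuple(row) for row in nearest2}
```

Only distance comparisons are used, which is why the construction extends beyond the Euclidean setting.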

**
Andre Lieutier, Aix-en-Provence, France
Stability and Computation of Topological Invariants of Solids in R^n From Approximations
**

Abstract:

In this work, we prove that under quite general assumptions one can deduce the
topology of a bounded open set in R^n from a Hausdorff approximation of its
boundary. For this, we introduce the weak feature size (wfs), which extends the
notion of reach to nonsmooth objects. Roughly speaking, for a bounded open
set O, wfs(O) measures the minimal size of the topological features of O, and we
prove that a wfs(O)/4 Hausdorff approximation of the boundary of O determines its
homotopy type.

These results apply to open sets with positive wfs, a class that includes subanalytic open sets and covers many cases encountered in practical applications. The proofs are based upon the study of distance functions to closed sets and their critical points. The notion of critical point is the same as the one used in Riemannian geometry and nonsmooth analysis. As an application, we give a way to compute the homology groups of open sets from noisy samples of points on their boundary. This work raises several open problems. In particular, even if the homotopy type is determined by the approximation, no algorithm is known in general that could produce a realisation (as a simplicial complex, for example) of this homotopy type.

**
Mauro Maggioni, Duke University
Analysis of and on Data Sets Through Diffusion Operators
**

Abstract:

I will discuss how data sets modeled as weighted graphs can be studied through diffusions or random walks on them. These diffusions, and
in particular the associated heat kernel, can be used to study certain geometric properties of the graph, as well as finding
bi-Lipschitz parametrizations (mapping geodesic distances on the set to Euclidean distances in parameter space) on large chunks of the
data, at least when the data is "manifold-like". Moreover, one can associate dictionaries of functions to the diffusion operators, for
example Fourier-like basis functions (the eigenfunctions of the Laplacian) or wavelet-like functions (called diffusion wavelets). These
dictionaries can be used to analyze and approximate functions on graphs, and to perform learning tasks, or for tasks such as compression
and denoising. I will present several examples, from semi-supervised learning to image denoising and document organization.
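A minimal sketch of a diffusion embedding from a symmetric weight matrix, assuming NumPy (the function name is ours): the coordinates are eigenvectors of the random-walk operator P = D^{-1} W, computed via a conjugate symmetric matrix and scaled by powers of the eigenvalues:

```python
import numpy as np

def diffusion_map(W, t=1, k=2):
    """Diffusion embedding from a symmetric weight matrix W: eigenvectors of
    the random-walk operator P = D^{-1} W, scaled by the t-th eigenvalue power."""
    D = W.sum(axis=1)
    # S = D^{1/2} P D^{-1/2} = D^{-1/2} W D^{-1/2} is symmetric and has the
    # same eigenvalues as P, so a symmetric eigensolver can be used.
    S = W / np.sqrt(np.outer(D, D))
    vals, vecs = np.linalg.eigh(S)
    idx = np.argsort(vals)[::-1][1:k + 1]     # skip the trivial top eigenvector
    psi = vecs[:, idx] / np.sqrt(D)[:, None]  # right eigenvectors of P
    return psi * vals[idx] ** t
```

Larger t damps the higher-frequency coordinates, mimicking longer diffusion times on the graph.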

**
Yuriy Mileyko, Duke University
L_p-Stability of Persistence of Lipschitz Functions
**

Abstract:

We will discuss two stability results for Lipschitz functions on a triangulable,
compact metric space. The first result is formulated in terms of the Wasserstein
distance between persistence diagrams of functions; the second is formulated in
terms of the total persistences of functions. We will also describe applications
of the two results to measuring the similarity between gene expression patterns
in the development of arabidopsis and to assessing the periodicity of gene
expression time-series in the development of mouse embryos, respectively.

**
Sayan Mukherjee, Duke University
Predictive Models That Infer Structure and
Dependencies
**

Abstract:

We consider the problem of learning gradients in the supervised setting where there are multiple, related tasks. Gradients provide a
natural interpretation to the geometric structure of data, and can assist in problems requiring variable selection and dimension
reduction. By extending this idea to the multi-task learning (MTL) environment, we present methods for simultaneously learning structure
within each task, and the shared structure across all tasks. This allows for dimension reduction as well as inference of graphical
models. An application of this framework to modeling tumor progression will be highlighted. This application will discover which a
priori defined pathways are relevant either throughout or in particular steps of progression. Pathway interaction networks are inferred
for these relevant pathways over the steps in progression. This is followed by the refinement of genes in relevant pathways to those
genes most differentially expressed in particular disease stages. The final analysis infers a gene interaction network for these refined
gene sets. We apply this approach to model progression in prostate cancer and melanoma resulting in a deeper understanding of the
mechanism of tumorigenesis.
I will close the talk with five minutes of very preliminary work on probabilistic aspects of persistent homology.

**
Partha Niyogi, University of Chicago
A Geometric Perspective on Learning Algorithms
**

Abstract:

Increasingly, we face machine learning problems in very high dimensional
spaces. We proceed with the intuition that although natural data lives
in very high dimensions, they have relatively few degrees of freedom.
One way to formalize this intuition is to model the data as lying on or
near a low dimensional manifold embedded in the high dimensional space.
This point of view leads to a new class of algorithms that are "manifold
motivated" and a new set of theoretical questions that surround their
analysis. A central construction in these algorithms is a graph or
simplicial complex that is data-derived and we will relate the geometry
of these to the geometry of the underlying manifold. Applications to
embedding, clustering, classification, and semi-supervised learning will
be considered.

**
Stephen Smale, Toyota Technological Institute at Chicago
Thoughts and Perspectives on the Topology of Data
**

Abstract:

Some implications of the study of images for foundational
questions of data analysis will be discussed.

**
Werner Stuetzle, University of Washington
Estimating the Cluster Tree of a Density
**

Abstract:

The general goal of clustering is to identify distinct groups in a
collection of objects. To cast clustering as a statistical problem we
regard the feature vectors characterizing the objects as a sample from
some unknown probability density. The premise of nonparametric
clustering is that groups correspond to modes (local maxima) of this
density. The cluster tree summarizes the connectivity structure of the
level sets of a density; leaves of the tree correspond to modes of the
density. I will define the cluster tree, present methods for estimating
it, show examples, and discuss some open problems.
This is joint work with Rebecca Nugent (CMU).
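On a one-dimensional grid, the connectivity structure the cluster tree records is easy to compute: at each level, the components of the super-level set are the maximal runs of grid points where the density estimate exceeds the level, and each such run is a node of the tree. A sketch of the per-level component count, assuming NumPy (names ours):

```python
import numpy as np

def level_set_components(density, level):
    """Count connected components of the super-level set {x : f(x) > level}
    of a density sampled on a 1-D grid. Each component is a cluster-tree node."""
    above = np.asarray(density) > level
    # A component starts at index 0 (if above) or wherever `above`
    # switches from False to True.
    return int(np.sum(above[1:] & ~above[:-1]) + above[0])
```

Sweeping the level from the maximum of the density down to zero and tracking when runs merge yields the full cluster tree; the leaves are the modes.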

**
Santosh Vempala, Georgia Tech
The Best Manifold Problem: Formulation, Hardness
and a Conjecture
**

Abstract:

Motivated by wireless communication, we consider the problem of
assigning lengths to the edges of an unweighted graph so as to
capture a given metric as accurately as possible. We show that the
problem is hard for rather special cases. On the other hand, a local
search algorithm seems to perform well in practice. We present a
conjecture on the distortion of such embeddings.
Joint work with Varun Kanade.

**
Shmuel Weinberger, University of Chicago
Computing Homological Invariants From Noisy Data
**

Abstract:

I will discuss, mainly through examples, some probabilistic aspects of the problem
of computing the homology of a smooth manifold either from exact data or from
noisy data. This will be based in part on joint work with P. Niyogi and S. Smale
and with Y. Baryshnikov.

**
Keith Worsley, McGill University
The Geometry of Random Fields in Astrophysics and Brain Mapping
**

Abstract:

The key tool used in the analysis is the Euler characteristic of the excursion set of a random field. An exact expression for its
expectation is found using new results in random field theory involving Lipschitz-Killing curvatures and Jonathan Taylor's Gaussian
Kinematic Formula. The heuristic for this comes from an idea in geostatistics and the Nash Embedding Theorem. I will show applications
to the density of galaxies, the density of multiple sclerosis lesions and its relation to cortical thickness, and the density of
"bubbles" in a face perception experiment and its relation to fMRI data.