
Titles and Abstracts

Created by Dinoj Surendran

Nina Amenta, University of California, Davis
Manifold Reconstruction With The MLS Surface

We consider the definition of a k-dimensional surface associated with a cloud of noisy points in d-dimensional Euclidean space. For two-dimensional surface reconstruction in computer graphics, David Levin's definition of the "MLS surface" has proved to be quite useful. We extend this definition to arbitrary k and d. It involves a projection procedure, which maps arbitrary points in space onto the surface. There are connections to an older line of research into "principal surfaces".
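
As a rough illustration of such a projection procedure, the sketch below fits a weighted least-squares plane near a query point and projects the point onto it. This is only a simplified stand-in for Levin's MLS projection (which fits a local polynomial over such a plane); the function name, the Gaussian weights, and the bandwidth h are choices made for this example.

```python
import numpy as np

def mls_project(q, points, h=0.3):
    """One projection step: fit a weighted least-squares plane around q
    (Gaussian weights with bandwidth h) and project q onto it."""
    w = np.exp(-np.sum((points - q) ** 2, axis=1) / h**2)
    c = (w[:, None] * points).sum(axis=0) / w.sum()   # weighted centroid
    X = points - c
    cov = (w[:, None, None] * np.einsum('ij,ik->ijk', X, X)).sum(axis=0)
    n = np.linalg.eigh(cov)[1][:, 0]   # least-variance direction = plane normal
    return q - np.dot(q - c, n) * n
```

Iterating this step moves arbitrary points in space toward the point cloud's surface.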

Sanjeev Arora, Princeton University
Semidefinite Programming and Graph Partitioning Problems

Abstract: Graph partitioning and related problems arise often in data analysis. These problems are NP-hard, and in the past few years new approximation algorithms have been designed using semidefinite programming. The talk will survey these new algorithms, as well as recent improvements that replace semidefinite programming with faster combinatorial algorithms.

Dominique Attali, Grenoble
Size of Delaunay Triangulation For Points Distributed Over Lower-Dimensional Polyhedra

The Delaunay triangulation of a set of points is a data structure that, in low dimensions, has applications in surface reconstruction, mesh generation, molecular modeling, geographic information systems, and many other areas of science and engineering. Like many spatial partitioning techniques, however, it suffers from the ``curse of dimensionality'': the worst-case complexity of the Delaunay triangulation increases exponentially with the dimension. Indeed, the size of the Delaunay triangulation of $n$ points in dimension $d$ is at most $O(n^{\lceil \frac{d}{2} \rceil})$ and this bound is achieved exactly for points lying on the {\em moment curve}. The starting point of the work reported in this talk is the following observation: all the examples that we have of point sets whose Delaunay triangulations have size $O(n^{\lceil \frac{d}{2} \rceil})$ are distributed on one-dimensional curves. At the opposite extreme, points distributed uniformly inside a $d$-dimensional volume have Delaunay triangulations of complexity $O(n)$. Our motivation for this work is to fill in the picture for distributions between the two extremes, in which the points lie on manifolds of dimension $2 \leq p \leq d-1$. We prove that the Delaunay triangulation of a set of points distributed nearly uniformly on a polyhedron of dimension $p$ in $d$-dimensional space has size $O(n^{{(d-k+1)}/{p}})$ with $k = \lceil (d+1)/(p+1) \rceil$. This bound is tight and improves on the worst-case bound of $O(n^{\lceil d/2 \rceil})$ for all $2 \leq p \leq d-1$. It coincides with it for $p=1$. This is joint work with Nina Amenta and Olivier Devillers.
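
The size gap between the two extremes can be observed empirically. The sketch below (an illustration, not part of the authors' work) uses scipy's Qhull-based Delaunay triangulation to compare points near the moment curve with points uniform in the unit cube in R^3; the point count and jitter are arbitrary choices for the example.

```python
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(0)
n = 120

# Points near the moment curve t -> (t, t^2, t^3): quadratic-size Delaunay in R^3.
t = np.linspace(0.0, 1.0, n)
curve = np.column_stack([t, t**2, t**3])
curve += rng.normal(scale=1e-6, size=curve.shape)   # break exact degeneracies

# Points uniform in the unit cube: linear expected size.
cube = rng.uniform(0.0, 1.0, (n, 3))

tets_curve = len(Delaunay(curve).simplices)
tets_cube = len(Delaunay(cube).simplices)
```

The curve sample produces far more tetrahedra than the volume sample, matching the $O(n^{\lceil d/2 \rceil})$ versus $O(n)$ picture above.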

Misha Belkin, Ohio State University
Laplacian and Spectral Methods for Data Analysis

The graph Laplacian associated with a point cloud or a mesh is useful in many problems of machine learning, as well as in computer graphics and surface analysis. I will discuss the graph and manifold Laplacians, their applications, and how they can be reconstructed from a point cloud or from a mesh. I will also talk about some recent work on algorithms for analyzing Gaussian mixture distributions based on spectral properties of certain kernel matrices.
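
A minimal sketch of the graph Laplacian built from a point cloud, assuming a fully connected graph with Gaussian edge weights (the kernel and bandwidth are choices made for this example):

```python
import numpy as np

def graph_laplacian(X, sigma=0.5):
    """Unnormalized graph Laplacian L = D - W of a point cloud, with
    Gaussian edge weights w_ij = exp(-|x_i - x_j|^2 / (2 sigma^2))."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2.0 * sigma**2))
    np.fill_diagonal(W, 0.0)        # no self-loops
    return np.diag(W.sum(axis=1)) - W
```

By construction L is symmetric positive semidefinite and annihilates constant vectors, which is what its use in spectral methods relies on.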

Gunnar Carlsson, Stanford University
Topological Methods In Data Analysis

I will survey various topological methods for the study of data, including persistent homology, multidimensional persistent homology, and imaging techniques for high dimensional data.

Frederic Chazal, INRIA, France
Sampling and Topological Inference For General Shapes

In many practical situations, the object of study is only known through a finite set of possibly noisy sample points. It is then desirable to try to recover the geometry and the topology of the object from this information. The most obvious example is probably surface reconstruction, where the points are measured on the surface of a real world object. In other applications (e.g. manifold learning or time series analysis), the shape of interest may live in a higher dimensional space. In this context, an important question is to find sampling conditions guaranteeing that the shape can be reconstructed correctly. Besides providing theoretical guarantees, such a condition may be used to drive the reconstruction process itself. Indeed, a possible reconstruction strategy is to look for the shapes that are best sampled by the data points.

In this talk, we investigate these questions in a fairly general setting, assuming a very simple reconstruction process. We show how the framework of distance functions can be used to study some geometric and topological approximation problems on general shapes (represented by compact subsets of R^n). We introduce a parameterized notion of feature size that interpolates between the minimum of the local feature size, and the recently introduced weak feature size. Based on this notion of feature size, we propose sampling conditions that apply to noisy samplings of general compact sets in euclidean space. These conditions are sufficient to ensure the topological correctness of a reconstruction given by an offset of the sampling. Our approach also yields new stability results for medial axes, critical points and critical values of distance functions.

David Cohen-Steiner, INRIA, France
Stability of Boundary Measures

An approach for studying the geometry of a shape is to look at the volume of its parallel bodies. This leads for example to Federer's curvature measures, which generalize usual curvatures to a certain class of possibly singular subsets of Euclidean space. In this talk, I will discuss what happens if one applies the same idea to point cloud data. The main result is an error bound establishing convergence of the obtained measures when the sampling density increases.

Joint work with F. Chazal and Q. Merigot.

Sanjoy Dasgupta, University of California - San Diego
Random Projection Trees and Low-Dimensional Manifolds

The curse of dimensionality has traditionally been the bane of nonparametric statistics (histograms, kernel density estimation, nearest neighbor search, and so on), as reflected in running times and convergence rates that are exponentially bad in the dimension. This problem is all the more pressing as data sets get increasingly high dimensional.

Recently the field has been rejuvenated in several ways, of which perhaps the most promising is the realization that a lot of real-world data which appears high-dimensional in fact has low "intrinsic" dimension in the sense of lying close to a low-dimensional manifold. In the past few years, there has been a huge interest in learning such manifolds from data, and then using the learned structure to transform the data into a lower-dimensional space where standard statistical methods generically work better.

I'll exhibit a way to benefit from intrinsic low dimensionality without having to go to the trouble of explicitly learning its fine structure. Specifically, I'll present a simple variant of the ubiquitous k-d tree (a spatial data structure widely used in machine learning and statistics) that is provably adaptive to low dimensional structure. We call this a "random projection tree" (RP tree).
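
The splitting rule can be sketched as follows. This is a simplified illustration of the idea, not the authors' exact construction: each node projects its points onto a random unit direction and splits near the median of the projections (the jitter size here is an arbitrary choice).

```python
import numpy as np

def build_rp_tree(X, min_size=10, rng=None):
    """RP-tree sketch: split points at a jittered median of their
    projections onto a random unit direction, then recurse."""
    rng = rng or np.random.default_rng()
    if len(X) <= min_size:
        return {"leaf": True, "points": X}
    u = rng.normal(size=X.shape[1])
    u /= np.linalg.norm(u)
    proj = X @ u
    # Perturb the median; the paper's rule differs in detail.
    s = np.median(proj) + (proj.max() - proj.min()) * rng.uniform(-0.05, 0.05)
    left, right = X[proj <= s], X[proj > s]
    if len(left) == 0 or len(right) == 0:   # degenerate split: stop here
        return {"leaf": True, "points": X}
    return {"leaf": False, "dir": u, "split": s,
            "left": build_rp_tree(left, min_size, rng),
            "right": build_rp_tree(right, min_size, rng)}
```

Unlike a k-d tree, no coordinate axis is privileged, which is what lets the cell diameters adapt to intrinsic rather than ambient dimension.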

Along the way, I'll discuss different notions of intrinsic dimension -- motivated by manifolds, by local statistics, and by analysis on metric spaces -- and relate them. I'll then prove that RP trees require resources that depend only on these dimensions rather than the dimension of the space in which the data happens to be situated.

This is work with Yoav Freund (UC San Diego).

Vin de Silva, Pomona College
Witness Complexes and Bicomplexes in Topological Approximation

Abstract: Witness complexes were devised as a method for approximating the topology of a point-cloud data set. The data set is represented either as a single simplicial complex, or (more usefully) as a nested family of simplicial complexes, to which the theory of persistence can be applied. Several groups of researchers have established correctness theorems in this static (0-parameter) setting. Gunnar Carlsson and I have recently been studying 1-parameter families of witness complexes. These can be generated from 1-parameter families of data sets, or by varying an internal parameter in the standard witness complex construction (the number of landmarks, say). We propose a strategy for studying such families. I will present this strategy from the point of view of two new algebraic technical tools: (i) witness bicomplexes; and (ii) a generalised persistent homology.

Tamal Dey, Ohio State University
Voronoi Diagrams in Estimating Geometry and Topology From Data

Abstract: Given a set of points P presumably sampled from a surface S in R^3, many geometric and topological properties can be estimated from the Voronoi diagram of the point set if P samples S well. For example, normals to the surface can be estimated using poles (as defined by Amenta and Bern); a piecewise linear surface isotopic to S with a small Hausdorff distance (relative to local feature sizes) can be computed using algorithms such as Crust and Cocone. These results can be extended to the case when P is noisy but still samples S well. When S is a smooth manifold embedded in a Euclidean space, Voronoi diagrams still capture the essential geometry and topology of S, though some new observations are required. This talk will survey these results.

Tom Duchamp, University of Washington
Manifold Estimation

I will discuss some work of the 3D photography group at the University of Washington on the "manifold estimation problem": given a point cloud in a neighborhood of an (unknown) submanifold of Euclidean space, estimate the submanifold. Replacing the point cloud by a probability distribution leads to a natural variational problem: find a submanifold M that is a local minimum for the expected square of the distance to M. Critical submanifolds are called "principal submanifolds". Unfortunately, principal submanifolds are never local minima. Thus, simple least-squares fitting will not work. However, critical manifolds are local minima if one restricts to low-frequency perturbations with respect to a certain second-order operator related to the Laplace-Beltrami operator. The cutoff frequency can be estimated in terms of certain transverse statistics. Alternatively, one can regularize the problem by adding a smoothing term such as a multiple of harmonic energy. The size of the smoothing parameter k can again be estimated in terms of transverse statistics.

Leonidas Guibas, Stanford University
Reconstruction Using Witness Complexes

The problem of reconstructing manifolds from scattered data finds numerous applications across a number of areas, including image and signal processing, computer graphics, robotics, machine learning, and scientific visualization. It has been extensively studied by the computational geometry community in low dimensions, where reconstruction methods based on the use of the Delaunay triangulation D(S) of the input point set S have been proposed. In these methods, an output simplicial complex is extracted from D(S) that is guaranteed to be topologically equivalent to the underlying manifold M, and geometrically close to M, provided that S satisfies mild sampling conditions. Such methods do not immediately carry over to manifolds of arbitrary dimension, however, since in this more general setting the input point set may be a good sample of several manifolds with different topological types. Furthermore, the cost of computing high-dimensional Delaunay triangulations is rather high.

We show how the witness complex, a complex closely related to the Delaunay triangulation and recently studied by de Silva and Carlsson, as well as topological persistence ideas, as introduced by Zomorodian, Edelsbrunner, and Letscher, can overcome some of these difficulties. The witness complex construction is largely based on distance comparisons and can be applied in general metric space settings. In the Euclidean case, its efficiency can be guaranteed theoretically, both in terms of complexity and in terms of approximation quality. Some experimental results will also be shown.

This is joint work with Jean-Daniel Boissonnat and Steve Oudot.
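
The distance-comparison flavor of the witness complex construction can be sketched for the 1-skeleton: an edge between two landmarks is included when some data point has them as its two nearest landmarks. This is a simplified variant for illustration, not the exact complex used in the work above; the circle example and landmark count are invented.

```python
import numpy as np

def witness_edges(data, landmarks):
    """1-skeleton of a witness complex: edge {a, b} is included when
    some data point's two nearest landmarks are a and b. Only distance
    comparisons are needed."""
    D = np.linalg.norm(data[:, None, :] - landmarks[None, :, :], axis=-1)
    nearest2 = np.argsort(D, axis=1)[:, :2]
    return {tuple(sorted(pair)) for pair in map(tuple, nearest2)}

# Witnesses densely sampled on a circle, 8 evenly spaced landmarks:
theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)
data = np.column_stack([np.cos(theta), np.sin(theta)])
phi = np.linspace(0, 2 * np.pi, 8, endpoint=False)
landmarks = np.column_stack([np.cos(phi), np.sin(phi)])
edges = witness_edges(data, landmarks)   # edges of an 8-cycle
```

On this example the recovered 1-skeleton is the 8-cycle, matching the topology of the circle the witnesses were sampled from.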

Andre Lieutier, Aix-en-Provence, France
Stability and Computation of Topological Invariants of Solids in R^n From Approximations

In this work, we prove that under quite general assumptions one can deduce the topology of a bounded open set in R^n from a Hausdorff approximation of its boundary. To this end, we introduce the weak feature size (wfs), which extends the notion of reach to nonsmooth objects. Roughly speaking, for a bounded open set O, wfs(O) measures the minimal size of the topological features of O, and we prove that a Hausdorff approximation of the boundary of O at scale wfs(O)/4 determines its homotopy type.

These results apply to open sets with positive wfs, which include subanalytic open sets and cover many cases encountered in practical applications. The proofs are based on the study of distance functions to closed sets and their critical points. The notion of critical point is the same as the one used in Riemannian geometry and nonsmooth analysis. As an application, we give a way to compute the homology groups of open sets from noisy samples of points on their boundary. This work raises several open problems. In particular, even if the homotopy type is determined by the approximation, no algorithm is known in general that could produce a realisation (as a simplicial complex, for example) of this homotopy type.

Mauro Maggioni, Duke University
Analysis of and on Data Sets Through Diffusion Operators

I will discuss how data sets modeled as weighted graphs can be studied through diffusions or random walks on them. These diffusions, and in particular the associated heat kernel, can be used to study certain geometric properties of the graph, as well as to find bi-Lipschitz parametrizations (mapping geodesic distances on the set to Euclidean distances in parameter space) on large chunks of the data, at least when the data is "manifold-like". Moreover, one can associate dictionaries of functions to the diffusion operators, for example Fourier-like basis functions (the eigenfunctions of the Laplacian) or wavelet-like functions (called diffusion wavelets). These dictionaries can be used to analyze and approximate functions on graphs, to perform learning tasks, and for tasks such as compression and denoising. I will present several examples, from semi-supervised learning to image denoising and document organization.
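
A minimal diffusion-map sketch along these lines, computing embedding coordinates from eigenvectors of the random-walk matrix P = D^{-1} W through its symmetric conjugate (the kernel, bandwidth, and function name are choices made for this example):

```python
import numpy as np

def diffusion_map(X, sigma=1.0, dim=2, t=1):
    """Embed X using the top non-trivial eigenvectors of the random-walk
    matrix P = D^{-1} W, computed through its symmetric conjugate
    S = D^{-1/2} W D^{-1/2} (same eigenvalues as P)."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2.0 * sigma**2))
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(S)
    vals, vecs = vals[::-1], vecs[:, ::-1]        # sort descending
    psi = vecs / np.sqrt(d)[:, None]              # right eigenvectors of P
    return (vals[1:dim + 1] ** t) * psi[:, 1:dim + 1]
```

The diffusion time t rescales the coordinates by powers of the eigenvalues, so longer diffusion times suppress the finer-scale directions.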

Yuriy Mileyko, Duke University
L_p-Stability of Persistence of Lipschitz Functions

We will discuss two stability results for Lipschitz functions on a triangulable, compact metric space. The first result is formulated in terms of the Wasserstein distance between persistence diagrams of functions; the second is formulated in terms of the total persistences of functions. We will also describe applications of the two results to measuring the similarity between gene expression patterns in the development of Arabidopsis and to assessing the periodicity of gene expression time-series in the development of mouse embryos, respectively.
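
For intuition, the p-Wasserstein distance between two small persistence diagrams can be computed by augmenting each diagram with the diagonal projections of the other's points and solving an assignment problem. The sketch below uses scipy's assignment solver; it illustrates the metric appearing in the stability statements, not the paper's own algorithms.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def wasserstein_pd(A, B, p=2):
    """p-Wasserstein distance between persistence diagrams A and B,
    given as (k, 2) arrays of (birth, death) points. Each diagram is
    augmented with the diagonal projections of the other's points, so
    unmatched points can be sent to the diagonal."""
    def diag(P):
        m = (P[:, 0] + P[:, 1]) / 2.0
        return np.column_stack([m, m])
    A_aug = np.vstack([A, diag(B)])
    B_aug = np.vstack([B, diag(A)])
    C = np.linalg.norm(A_aug[:, None] - B_aug[None, :], axis=-1) ** p
    C[len(A):, len(B):] = 0.0    # diagonal-to-diagonal matches are free
    rows, cols = linear_sum_assignment(C)
    return C[rows, cols].sum() ** (1.0 / p)
```

For a single point (0, 1) matched against the empty diagram, the cost is its Euclidean distance to the diagonal, 1/sqrt(2).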

Sayan Mukherjee, Duke University
Predictive Models That Infer Structure and Dependencies

We consider the problem of learning gradients in the supervised setting where there are multiple, related tasks. Gradients provide a natural interpretation to the geometric structure of data, and can assist in problems requiring variable selection and dimension reduction. By extending this idea to the multi-task learning (MTL) environment, we present methods for simultaneously learning structure within each task, and the shared structure across all tasks. This allows for dimension reduction as well as inference of graphical models. An application of this framework to modeling tumor progression will be highlighted. This application will discover which a priori defined pathways are relevant either throughout or in particular steps of progression. Pathway interaction networks are inferred for these relevant pathways over the steps in progression. This is followed by the refinement of genes in relevant pathways to those genes most differentially expressed in particular disease stages. The final analysis infers a gene interaction network for these refined gene sets. We apply this approach to model progression in prostate cancer and melanoma, resulting in a deeper understanding of the mechanism of tumorigenesis. I will close the talk with 5 minutes of very preliminary work on probabilistic aspects of persistent homology.

Partha Niyogi, University of Chicago
A Geometric Perspective on Learning Algorithms

Increasingly, we face machine learning problems in very high dimensional spaces. We proceed with the intuition that although natural data lives in very high dimensions, they have relatively few degrees of freedom. One way to formalize this intuition is to model the data as lying on or near a low dimensional manifold embedded in the high dimensional space. This point of view leads to a new class of algorithms that are "manifold motivated" and a new set of theoretical questions that surround their analysis. A central construction in these algorithms is a graph or simplicial complex that is data-derived and we will relate the geometry of these to the geometry of the underlying manifold. Applications to embedding, clustering, classification, and semi-supervised learning will be considered.

Stephen Smale, Toyota Technological Institute at Chicago
Thoughts and Perspectives on the Topology of Data.

Some implications of the study of images for foundational questions of data analysis will be discussed.

Werner Stuetzle, University of Washington
Estimating the Cluster Tree of a Density

The general goal of clustering is to identify distinct groups in a collection of objects. To cast clustering as a statistical problem, we regard the feature vectors characterizing the objects as a sample from some unknown probability density. The premise of nonparametric clustering is that groups correspond to modes (local maxima) of this density. The cluster tree summarizes the connectivity structure of the level sets of a density; leaves of the tree correspond to modes of the density. I will define the cluster tree, present methods for estimating it, show examples, and discuss some open problems. This is joint work with Rebecca Nugent (CMU).
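
On a 1-D grid, the level-set connectivity that the cluster tree summarizes can be sketched directly: as the level decreases, components of {f >= level} merge, and those merge events are the internal nodes of the tree. The density and grid below are invented for the example.

```python
import numpy as np

def level_set_clusters(x, f, level):
    """Connected components of the level set {f >= level} for a density
    f sampled on a sorted 1-D grid x, returned as (xmin, xmax) intervals."""
    mask = f >= level
    comps, start = [], None
    for i, m in enumerate(mask):
        if m and start is None:
            start = i
        elif not m and start is not None:
            comps.append((x[start], x[i - 1]))
            start = None
    if start is not None:
        comps.append((x[start], x[-1]))
    return comps

# A bimodal density: two components at high levels, one after they merge.
x = np.linspace(-4.0, 4.0, 801)
f = np.exp(-(x + 1.5) ** 2) + np.exp(-(x - 1.5) ** 2)
```

Here the level sets have two components above the valley between the modes and one component below it, so the cluster tree has two leaves joined at the valley height.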

Santosh Vempala, Georgia Tech
The Best Manifold Problem: Formulation, Hardness, and a Conjecture

Motivated by wireless communication, we consider the problem of assigning lengths to the edges of an unweighted graph so as to capture a given metric as accurately as possible. We show that the problem is hard for rather special cases. On the other hand, a local search algorithm seems to perform well in practice. We present a conjecture on the distortion of such embeddings. Joint work with Varun Kanade.

Shmuel Weinberger, University of Chicago
Computing Homological Invariants From Noisy Data

I will discuss, mainly through examples, some probabilistic aspects of the problem of computing the homology of a smooth manifold either from exact data or from noisy data. This will be based in part on joint work with P. Niyogi and S. Smale and with Y. Baryshnikov.

Keith Worsley, McGill University
The Geometry of Random Fields in Astrophysics and Brain Mapping

The key tool used in the analysis is the Euler characteristic of the excursion set of a random field. An exact expression for its expectation is found using new results in random field theory involving Lipschitz-Killing curvatures and Jonathan Taylor's Gaussian Kinematic Formula. The heuristic for this comes from an idea in geostatistics and the Nash Embedding Theorem. I will show applications to the density of galaxies, the density of multiple sclerosis lesions and its relation to cortical thickness, and the density of "bubbles" in a face perception experiment and its relation to fMRI data.