Multiresolution Methods for Large Scale Learning
NIPS 2015 workshop, December 12, 2015, Room 511c, Montréal, Canada
Organizers: Inderjit Dhillon, Risi Kondor, Robert Nowak, Michael O'Neil and Nedelina Teneva


9:00-9:40 Pierre Vandergheynst:  Random sampling of bandlimited signals on graphs  [slides]
9:40-10:10 Risi Kondor:  Multiresolution Matrix Factorization  [slides]
 
10:30-11:00 Austin Benson:  Beyond Nodes and Edges:  Multiresolution Models of Complex Networks
11:00-11:30 Michael W. Mahoney:  Challenges in Multiresolution Methods for Graph-based Learning  [slides]
11:30-11:50 Ankit B. Patel, Tan Nguyen and Richard G. Baraniuk:  Deep Convolutional Nets as a Multi-Scale Matrix Mixture Factorization
11:50-12:10 William B. March and George Biros:  Hierarchical Decomposition of Kernel Matrices [slides]
 
2:00-2:30 Ilya Safro:  Multigrid-inspired Methods for Networks [slides]
2:30-3:00 Michael O'Neil:  Fast Direct Methods for Gaussian Processes  [slides]
3:00-3:20 Steffen Börm and Jochen Garcke:  Approximating Gaussian Processes with H^2 Matrices  [slides]
3:20-3:40 Kunal Srivastava and Tuhin Sahai:  A Multiresolution Approach for Tensor Factorization
3:40-4:00 Eric Sibony, Anna Korba and Stéphane Clémençon:  Multiresolution analysis for the statistical analysis of incomplete rankings  [slides]
 
4:30-5:00 Francis Bach:  Structured Sparsity and Convex Optimization  [slides]
5:00-5:30 Sayan Mukherjee:  Scaling Phenomena in Stochastic Topology



Session 1:  9:00-10:10

Pierre Vandergheynst (EPFL):  Random sampling of bandlimited signals on graphs  [slides]
We study the problem of sampling k-bandlimited signals on graphs, as models for information over networks. We propose two sampling strategies that consist of selecting a small subset of nodes at random. The first strategy is non-adaptive, i.e., independent of the graph structure, and its performance depends on a parameter called the graph coherence. In contrast, the second strategy is adaptive and yields optimal results: no more than O(k log(k)) measurements are sufficient to ensure an accurate and stable recovery of all k-bandlimited signals. This second strategy is based on a careful choice of the sampling distribution, which can be estimated quickly. We then propose a computationally efficient decoder to reconstruct k-bandlimited signals from their samples, prove that it yields accurate reconstructions, and show that it is stable to noise. Finally, we illustrate this technique by showing that one can use it to sketch networked data and efficiently approximate spectral clustering. (Joint work with Gilles Puy, Nicolas Tremblay and Rémi Gribonval)
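As an illustrative sketch (not the authors' code), the non-adaptive strategy and a least-squares decoder can be mimicked in a few lines of NumPy on a toy ring graph; the graph, bandwidth k, and number of samples m below are all hypothetical choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy graph: ring of n nodes (combinatorial Laplacian L = D - A).
n, k = 60, 5
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1
L = np.diag(A.sum(1)) - A

# A k-bandlimited signal lives in the span of the k lowest-frequency
# Laplacian eigenvectors (the graph Fourier basis).
w, U = np.linalg.eigh(L)
Uk = U[:, :k]
x = Uk @ rng.standard_normal(k)     # ground-truth bandlimited signal

# Non-adaptive strategy: sample m nodes uniformly at random.
m = 25                              # on the order of k log k
S = rng.choice(n, size=m, replace=False)
y = x[S]                            # noiseless samples

# Decoder: least squares restricted to the bandlimited subspace.
alpha, *_ = np.linalg.lstsq(Uk[S, :], y, rcond=None)
x_hat = Uk @ alpha
print(np.linalg.norm(x - x_hat) / np.linalg.norm(x))  # tiny
```

On the ring, a k=5 bandlimited signal is a sampled trigonometric polynomial of degree 2, so it cannot vanish on 25 distinct nodes; the restricted matrix is therefore full rank and recovery is exact up to roundoff.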

Risi Kondor (UChicago):  Multiresolution Matrix Factorization  [slides]
The sheer size of today's datasets dictates that learning algorithms compress or reduce their input data and/or make use of parallelism. Multiresolution Matrix Factorization (MMF) makes a connection between such computational strategies and some classical themes in Applied Mathematics, namely Multiresolution Analysis and Multigrid Methods. In particular, the similarity matrices appearing in data often have multiresolution structure, which can be exploited both for learning and to facilitate computation.
In addition to the general MMF framework, I will present our new, high performance parallel MMF software library and show some results on matrix compression/sketching tasks (joint work with Nedelina Teneva, Pramod Mudrakarta and Vikas Garg).
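MMF's rotations are carefully chosen sparse orthogonal transforms; as a simplified illustration of the underlying idea — approximately diagonalizing a symmetric matrix with a sequence of Givens rotations — here is a classical greedy Jacobi iteration (not the MMF algorithm itself):

```python
import numpy as np

def givens(n, i, j, theta):
    """n x n Givens rotation acting on coordinates (i, j)."""
    G = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    G[i, i] = G[j, j] = c
    G[i, j], G[j, i] = s, -s
    return G

def off_diag_norm(A):
    """Frobenius norm of the off-diagonal part."""
    return np.linalg.norm(A - np.diag(np.diag(A)))

rng = np.random.default_rng(1)
n = 8
M = rng.standard_normal((n, n))
A = (M + M.T) / 2                    # symmetric test matrix

H = A.copy()
for _ in range(100):
    # Greedily pick the largest off-diagonal entry...
    O = np.abs(H - np.diag(np.diag(H)))
    i, j = np.unravel_index(np.argmax(O), O.shape)
    if O[i, j] == 0:
        break
    # ...and choose the classical Jacobi angle that zeroes it out.
    theta = 0.5 * np.arctan2(2 * H[i, j], H[i, i] - H[j, j])
    G = givens(n, i, j, theta)
    H = G @ H @ G.T                  # orthogonal similarity transform

print(off_diag_norm(A), "->", off_diag_norm(H))
```

Each rotation removes the targeted entry while preserving the spectrum; MMF replaces this dense greedy search with structured rotations and peels off "wavelet" rows at each level.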



Session 2:  10:30-12:10


Austin Benson (Stanford):  Beyond Nodes and Edges: Multiresolution Models of Complex Networks 
Networks are a fundamental tool for understanding and modeling complex systems in the physical, biological, neural, and social sciences. Present network algorithms focus almost exclusively on first-order, or edge-based, structures in networks. What is missing from the picture are methods for analyzing the higher-order organization of complex networks.
Here we present a generalized framework for a network clustering methodology based on higher-order network connectivity patterns. This framework allows for identifying rich higher-order clusters in networks. Our framework scales to networks with billions of edges and provides mathematical guarantees on the optimality of obtained clusters. We apply our framework to networks from a variety of scientific domains with scales ranging from a few hundred to over one billion links. In neuronal networks, our framework identifies inter-connection regions; in food webs, it provides an ecological-layer decomposition; and in transportation networks, it detects well-known airport hubs.
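One concrete higher-order connectivity pattern is the triangle motif. The generic sketch below (not the speaker's code, which handles general motifs and much larger graphs) weights each edge by the number of triangles it participates in, so that motif-based clustering naturally severs bridges:

```python
import numpy as np

# Toy undirected graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1

# Triangle-motif weight: W[i, j] = number of triangles containing edge (i, j).
W = (A @ A) * A
print(W[0, 1], W[2, 3])
```

Edges inside either triangle get weight 1, while the bridge edge (2, 3) gets weight 0, so clustering the motif-weighted graph cleanly separates the two triangle-dense groups.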

Michael W. Mahoney (Berkeley):  Challenges in Multiresolution Methods for Graph-based Learning  [slides]
Graphs are popular ways to model data, and many machine learning methods have been developed to identify structure in graph-based data. For example, it is popular to assume that there is some sort of small-scale local structure and that this structure combines to form some sort of large-scale global structure. In many cases, these multiresolution methods are motivated by or boil down to variants of methods developed in scientific computing for very structured partial differential equations or low-dimensional geometric problems. On the other hand, realistic informatics graphs are typically much less well structured, and this creates serious challenges for the development of multiresolution methods for machine learning and data analysis problems. Several challenges in this area will be discussed.

Ankit B. Patel, Tan Nguyen and Richard G. Baraniuk:  Deep Convolutional Nets as a Multi-Scale Matrix Mixture Factorization 

William March and George Biros (UTexas):  Hierarchical Decomposition of Kernel Matrices  [slides]
Kernel-based learning methods typically involve an NxN matrix of all pairwise kernel interactions between data points. We describe ASKIT (Approximate Skeletonization Kernel Independent Treecode), an efficient, scalable algorithm for constructing an approximate, hierarchical factorization of this matrix. Our method scales linearly with the number of features, requires only very weak conditions on the kernel function, and can scale to large distributed memory architectures.
We sketch the basic ASKIT algorithm, highlighting our hierarchical decomposition for high-dimensional data and the randomized factorizations used to create the approximation. We then briefly mention our scalable parallel implementation of ASKIT and highlight some empirical results of our algorithm, including results on over 1 billion points in 64 dimensions with a Gaussian kernel.
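The structural fact behind such hierarchical factorizations is that kernel interactions between well-separated groups of points are numerically low rank. A small NumPy experiment (with arbitrary toy parameters, not taken from the talk) makes this visible:

```python
import numpy as np

rng = np.random.default_rng(2)

# Two well-separated clusters of points in 3-D (toy geometry).
X = 0.2 * rng.standard_normal((300, 3))                # around the origin
Y = 0.2 * rng.standard_normal((300, 3)) + np.array([5.0, 0.0, 0.0])

# Gaussian kernel block between the two clusters.
h = 2.0
D2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
K = np.exp(-D2 / (2 * h * h))

# Far-field interactions are numerically low rank: only a few singular
# values exceed a small relative tolerance.
s = np.linalg.svd(K, compute_uv=False)
r = int((s > 1e-8 * s[0]).sum())
print("numerical rank:", r, "of", min(K.shape))
```

Treecodes like ASKIT exploit exactly this: far-field blocks are compressed (via skeletonization in ASKIT's case, a truncated SVD in this toy), and only near-field blocks are stored densely.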



Session 3:  2:00-4:00


Ilya Safro (Clemson):  Multigrid-inspired Methods for Networks  [slides]
In many real-world problems, a big scale gap can be observed between micro- and macroscopic scales of the problem because of the difference in mathematical (engineering, social, biological, physical, etc.) models and/or laws at different scales. The main objective of multigrid-inspired algorithms is to create a hierarchy of problems, each representing the original problem at different coarse scales with fewer degrees of freedom.
We will discuss different strategies for creating these hierarchies for large-scale discrete optimization problems related to network analysis. These strategies are inspired by classical multigrid frameworks such as geometric multigrid, algebraic multigrid and the full approximation scheme. We will present in detail algorithmic frameworks for linear arrangement, network compression, k-partitioning, network generation, sparsification, and epidemics response problems. Time permitting, a multigrid-inspired algorithm for support vector machines will be presented.
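As a hypothetical miniature of the coarsening step shared by such multilevel frameworks, heavy-edge matching contracts strongly coupled node pairs to produce the next, smaller level of the hierarchy (all names and the toy graph below are illustrative):

```python
import numpy as np

def coarsen(A):
    """One heavy-edge-matching coarsening step: greedily match each
    unmatched node to its heaviest unmatched neighbor, then contract
    matched pairs into coarse nodes, summing inter-cluster weights."""
    n = A.shape[0]
    match = -np.ones(n, dtype=int)
    for i in range(n):
        if match[i] >= 0:
            continue
        nbrs = [j for j in range(n) if A[i, j] > 0 and match[j] < 0 and j != i]
        if nbrs:
            j = max(nbrs, key=lambda j: A[i, j])   # heaviest edge wins
            match[i], match[j] = j, i
    # Assign a coarse index to every matched pair (or singleton).
    coarse, cid = {}, np.zeros(n, dtype=int)
    for i in range(n):
        key = min(i, match[i]) if match[i] >= 0 else i
        if key not in coarse:
            coarse[key] = len(coarse)
        cid[i] = coarse[key]
    m = len(coarse)
    Ac = np.zeros((m, m))
    for i in range(n):
        for j in range(n):
            if cid[i] != cid[j]:
                Ac[cid[i], cid[j]] += A[i, j]
    return Ac, cid

# Two strongly coupled pairs (weight 3) joined by weak edges (weight 1).
A = np.array([[0, 3, 1, 0],
              [3, 0, 0, 1],
              [1, 0, 0, 3],
              [0, 1, 3, 0]], float)
Ac, cid = coarsen(A)
print(Ac.shape[0])  # coarse graph has fewer nodes
```

Repeating this step yields the hierarchy of progressively coarser problems on which multilevel algorithms operate.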

Michael O'Neil (NYU):  Fast Direct Methods for Gaussian Processes  [slides]
The main computational bottleneck when computing with Gaussian processes is the O(n^3) cost for inverting the covariance matrix and calculating its determinant. Using recent developments in fast direct solver technology based on hierarchical matrix decomposition methods, it is often possible to reduce this cost to O(n log n) without sacrificing accuracy or thresholding small covariances. This talk will give an overview of these methods, and describe related on-going research.
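For reference, the dense baseline being accelerated — the GP log marginal likelihood via an O(n^3) Cholesky factorization — can be sketched as follows (the toy data and kernel parameters are illustrative, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(3)

# Dense O(n^3) baseline: GP log marginal likelihood via Cholesky.
n = 200
t = np.sort(rng.uniform(0.0, 10.0, n))
K = np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2)  # squared-exponential kernel
K += 0.1 * np.eye(n)                               # observation-noise variance
y = rng.standard_normal(n)

Lc = np.linalg.cholesky(K)                             # the O(n^3) step
alpha = np.linalg.solve(Lc.T, np.linalg.solve(Lc, y))  # K^{-1} y
logdet = 2.0 * np.sum(np.log(np.diag(Lc)))             # log|K| from Cholesky
loglik = -0.5 * (y @ alpha + logdet + n * np.log(2 * np.pi))
print(loglik)
```

Hierarchical (e.g., HODLR-type) factorizations replace this dense Cholesky with a structured factorization, giving both the solve and the determinant in near-linear time.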

Steffen Börm and Jochen Garcke (Uni Bonn):  Approximating Gaussian Processes with H^2-matrices  [slides]
To compute the exact solution of Gaussian process regression, one needs O(n^3) computations for direct and O(n^2) for iterative methods, since it involves a densely populated kernel matrix of size n x n, where n denotes the number of data points. This makes large scale learning problems intractable by standard techniques.
We propose to use an alternative approach: the kernel matrix is replaced by a data-sparse approximation, called an H^2-matrix. This matrix can be represented by only O(nm) units of storage, where m is a parameter controlling the accuracy of the approximation, while the computation of the H^2-matrix scales with O(nm log(n)). Practical experiments demonstrate that our scheme leads to significant reductions in storage requirements and computing times for large data sets in lower dimensional spaces.

Kunal Srivastava and Tuhin Sahai:  A Multiresolution Approach for Tensor Factorization 
We develop a multiresolution decomposition framework for symmetric tensors based on the recent work of Kondor et al. (2014). The goal is to construct a sequence of orthogonal transformations that approximately diagonalizes a given symmetric tensor. The iterative construction also yields a sparse adaptive basis which has a multiresolution wavelet interpretation. We demonstrate the applicability of the developed framework for approximate diagonalization of the fourth order cumulant tensor used in Independent Component Analysis (ICA). Furthermore, we present initial results on computing adaptive multi-scale bases from brain EEG data by utilizing the developed connection to ICA.

Eric Sibony, Anna Korba and Stéphane Clémençon (LTCI):  Multiresolution Analysis for the statistical analysis of incomplete rankings  [slides]
In many modern applications, users express preferences over subsets of items of a general catalog of items (tickets, restaurants, videos, songs, products). Such preferences are naturally represented by incomplete rankings and need to be analyzed in a large-scale setting as the catalog typically contains thousands of items. In this talk I will present a novel representation for functions over rankings, established in recent previous works, tailor-made for this task. It takes the form of a Multiresolution Analysis and naturally complements the representation based on harmonic analysis over the symmetric group.



Session 4:  4:30-5:30


Francis Bach (INRIA):  Structured Sparsity and Convex Optimization  [slides]
The concept of parsimony is central in many scientific domains. In the context of statistics, signal processing or machine learning, it takes the form of variable or feature selection problems, and is commonly used in two situations: First, to make the model or the prediction more interpretable or cheaper to use, i.e., even if the underlying problem does not admit sparse solutions, one looks for the best sparse approximation. Second, sparsity can also be used given prior knowledge that the model should be sparse. In these two situations, reducing parsimony to finding models with low cardinality turns out to be limiting, and structured parsimony has emerged as a fruitful practical extension, with applications to image processing, text processing or bioinformatics. In this talk, I will review recent results on structured sparsity, as it applies to machine learning and signal processing, in particular with hierarchical structures (joint work with R. Jenatton, J. Mairal and G. Obozinski).

Sayan Mukherjee (Duke):  Scaling Phenomena in Stochastic Topology 
The graph Laplacian and random walks on graphs have impacted statistics, computer science, and mathematics. I will motivate why it is of interest to extend these graph-based algorithms to simplicial complexes, which capture higher-order relations. I will describe recent efforts to define random walks on simplicial complexes with stationary distributions related to the combinatorial (Hodge) Laplacian. This amounts to scaling limits of diffusion processes. This work will touch on higher-order Cheeger inequalities, an extension of label propagation to edges or higher-order complexes, and a generalization of results for near linear time solutions for linear systems.
The second part of the talk focuses on scaling limits of point processes on manifolds. Given n points drawn from a point process on a manifold, consider the random set which consists of the union of balls of radius r around the points. As n goes to infinity, r is sent to zero at varying rates. For this stochastic process, I will provide scaling limits and phase transitions on the counts of Betti numbers and critical points.
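The combinatorial Hodge 1-Laplacian mentioned above can be written down explicitly for a tiny complex; this sketch (a generic illustration, not from the talk) checks that its kernel counts 1-dimensional holes:

```python
import numpy as np

# Hollow triangle: vertices {0, 1, 2}, oriented edges e01, e02, e12, no 2-cell.
# B1: vertex-by-edge incidence matrix (boundary operator on edges).
B1 = np.array([[-1, -1,  0],
               [ 1,  0, -1],
               [ 0,  1,  1]], dtype=float)

# Hodge 1-Laplacian: L1 = B1^T B1 + B2 B2^T. With no 2-cells, B2 = 0.
L1_hollow = B1.T @ B1

# Filling in the triangle adds one 2-cell with boundary e01 - e02 + e12.
B2 = np.array([[1.0], [-1.0], [1.0]])
L1_filled = B1.T @ B1 + B2 @ B2.T

# dim ker L1 = first Betti number (number of 1-dimensional holes).
holes_hollow = 3 - np.linalg.matrix_rank(L1_hollow)
holes_filled = 3 - np.linalg.matrix_rank(L1_filled)
print(holes_hollow, holes_filled)  # 1 hole vs 0 holes
```

The hollow triangle has a one-dimensional harmonic space (its single cycle), which filling in the 2-cell destroys; the random walks in the talk have stationary behavior governed by exactly this Laplacian.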