Hand-authored systems are being increasingly replaced with deep learning systems. While deep neural networks have excellent performance, there is still a large lack of understanding of how they work. Verification and debugging of deep learning models is necessary when deploying these networks in critical applications.
The goal of this research is to develop techniques for analyzing deep neural networks. In particular, we are developing methods to (a) characterize the input-output behavior of deep learning models (b) craft adversarial examples by exploiting weaknesses in a neural network, (c) strategize improvements and enrich training data, (d) extract rules which may be useful augmentations to existing hand-authored systems, (e) develop debugging and analyzing tools which aid the developer in neural network design, and (f) compress neural networks
(Finer-level details omitted as a couple of papers are currently under review)
The massive size of the datasets occurring in modern machine learning problems necessitates the use of data compression and parallel computation. Data such as those arising from social networks often contain complex hierarchical structure. pMMF is a high performance parallel C++ library that computes a matrix factorization which captures structure at various scales. This can in turn be used for matrix compression and fast determinant and inverse computation, as a preprocessing step for machine learning algorithms and linear solvers. Independently, this method can also be used for constructing a wavelet basis for sparse approximation.Collaborators: Nedelina Teneva and Risi Kondor, University of Chicago, USA
Spectral Clustering is a graph partitioning algorithm, based on a continuous relaxation of the NP-hard combinatorial ratio/normalized cut problem. Existing techniques for partitioning a graph into k clusters are either heuristics or recursive schemes based on spectral clustering. These methods fail when applied on large datasets in the real world. We propose an algorithm which directly minimizes the ratio/normalized cut and leads to a k-way partitioning of the data. We also proposed an algorithm for the difficult sum-of-ratios problem which is of independent interest in the optimization community. We performed an extensive experimental evaluation, including on large real world datasets, illustrating that our method indeed outperforms all existing k-clustering algorithms.Collaborators: Syama Sundar Rangapuram and Matthias Hein, Saarland University, Germany
Multiresolution Matrix Factorization (MMF) was recently introduced as an alternative to the dominant low-rank paradigm in order to capture structure in matrices at multiple different scales. Using ideas from multiresolution analysis (MRA), MMF teased out hierarchical structure in symmetric matrices by constructing a sequence of wavelet bases. While effective for such matrices, there is plenty of data that is more naturally represented as nonsymmetric matrices (e.g. directed graphs), but nevertheless has similar hierarchical structure. In this paper, we explore techniques for extending MMF to any square matrix. We validate our approach on numerous matrix compression tasks, demonstrating its efficacy compared to low-rank methods. Moreover, we also show that a combined low-rank and MMF approach, which amounts to removing a small global-scale component of the matrix and then extracting hierarchical structure from the residual, is even more effective than each of the two complementary methods for matrix compression.
We introduce a novel method that enables parameter-efficient transfer and multitask learning. The basic approach is to allow a model patch - a small set of parameters - to specialize to each task, instead of fine-tuning the last layer or the entire network. For instance, we show that learning a set of scales and biases allows a network to learn a completely different embedding that could be used for different tasks (such as converting an SSD detection model into a 1000-class classification model while reusing 98% of parameters of the feature extractor). Similarly, we show that re-learning the existing low-parameter layers (such as depth-wise convolutions) also improves accuracy significantly. Our approach allows both simultaneous (multi-task) learning as well as sequential transfer learning wherein we adapt pretrained networks to solve new problems. For multi-task learning, despite using much fewer parameters than traditional logits-only fine-tuning, we match single-task-based performance.
We analyze state-of-the-art deep learning models for three tasks: question answering on (1) images, (2) tables, and (3) passages of text. Using the notion of attribution (word importance), we find that these deep networks often ignore important question terms. Leveraging such behavior, we perturb questions to craft a variety of adversarial examples. Our strongest attacks drop the accuracy of a visual question answering model from 61.1% to 19%, and that of a tabular question answering model from 33.5% to 3.3%. Additionally, we show how attributions can strengthen attacks proposed by Jia and Liang (2017) on paragraph comprehension models. Our results demonstrate that attributions can augment standard measures of accuracy and empower investigation of model performance. When a model is accurate but for the wrong reasons, attributions can surface erroneous logic in the model that indicates inadequacies in the test data.
We study the current best model (KDG) for question answering on tabular data evaluated over the WikiTableQuestions dataset. Previous ablation studies performed against this model attributed the model's performance to certain aspects of its architecture. In this paper, we find that the model's performance also crucially depends on a certain pruning of the data used to train the model. Disabling the pruning step drops the accuracy of the model from 43.3% to 36.3%. The large impact on the performance of the KDG model suggests that the pruning may be a useful pre-processing step in training other semantic parsers as well.
We introduce a new general purpose multiresolution preconditioner for symmetric linear systems. Most existing multiresolution preconditioners use some standard wavelet basis that relies on knowledge of the geometry of the underlying domain. In constrast, based on the recently proposed Multiresolution Matrix Factorization (MMF) algorithm, we construct a preconditioner that discovers a custom wavelet basis adapted to the given linear system without making any geometric assumptions. Some advantages of the new approach are fast preconditioner-vector products, invariance to the ordering of the rows/columns, and the ability to handle systems of any size. Numerical experiments on finite difference discretizations of model PDEs and off-the-shelf matrices illustrate the effectiveness of the MMF preconditioner.
Multiresolution Matrix Factorization (MMF) is a recently introduced method for finding multiscale structure and defining wavelets on graphs and matrices. MMF can also be used for matrix compression (sketching). However, the original MMF algorithm of (Kondor et al., 2014) scales with n3 or worse (where n is the number of rows/columns in the matrix to be factorized) making it infeasible for large scale problems. In this paper we describe pMMF, a fast parallel MMF algorithm, which can scale to n in the range of millions. Our experimental results show that when used for matrix compression, pMMF often achieves much lower error than competing algorithms (especially on network data), yet for sparse matrices its running time scales close to linearly with n.
Multiresolution Matrix Factorization (MMF) was recently introduced as a method for finding multiscale structure and defining wavelets on graphs/matrices. In this paper we derive pMMF, a parallel algorithm for computing the MMF factorization. Empirically, the running time of pMMF scales linearly in the dimension for sparse matrices. We argue that this makes pMMF a valuable new computational primitive in its own right, and present experiments on using pMMF for two distinct purposes: compressing matrices and preconditioning large sparse linear systems.
Spectral Clustering as a relaxation of the normalized/ratio cut has become one of the standard graph-based clustering methods. Existing methods for the computation of multiple clusters, corresponding to a balanced k-cut of the graph, are either based on greedy techniques or heuristics which have weak connection to the original motivation of minimizing the normalized cut. In this paper we propose a new tight continuous relaxation for any balanced k-cut problem and show that a related recently proposed relaxation is in most cases loose leading to poor performance in practice. For the optimization of our tight continuous relaxation we propose a new algorithm for the hard sum-of-ratios minimization problem which achieves monotonic descent. Extensive comparisons show that our method beats all existing approaches for ratio cut and other balanced k-cut criteria.
We present Oqtans, an open-source workbench for quantitative transcriptome analysis, that is integrated in Galaxy. Its distinguishing features include customizable computational workflows and a modular pipeline architecture that facilitates comparative assessment of tool and data quality. Oqtans integrates an assortment of machine learning-powered tools into Galaxy, which show superior or equal performance to state-of-the-art tools. Implemented tools comprise a complete transcriptome analysis workflow: short-read alignment, transcript identification/quantification and differential expression analysis. Oqtans and Galaxy facilitate persistent storage, data exchange and documentation of intermediate results and analysis workflows. We illustrate how Oqtans aids the interpretation of data from different experiments in easy to understand use cases. Users can easily create their own workflows and extend Oqtans by integrating specific tools. Oqtans is available as (i) a cloud machine image with a demo instance at cloud.oqtans.org, (ii) a public Galaxy instance at galaxy.cbio.mskcc.org, (iii) a git repository containing all installed software (oqtans.org/git); most of which is also available from (iv) the Galaxy Toolshed and (v) a share string to use along with Galaxy CloudMan.
Recent advances in high-throughput cDNA sequencing (RNA-Seq) technology have revolutionized transcriptome studies. A major motivation for RNA-Seq is to map the structure of expressed transcripts at nucleotide resolution. With accurate computational tools for transcript reconstruction, this technology may also become useful for genome (re-)annotation, which has mostly relied on de novo gene finding where gene structures are primarily inferred from the genome sequence. We developed a machine-learning method, called mTim (margin-based transcript inference method) for transcript reconstruction from RNA-Seq read alignments that is based on discriminatively trained hidden Markov support vector machines. In addition to features derived from read alignments, it utilizes characteristic genomic sequences, e.g. around splice sites, to improve transcript predictions. mTim inferred transcripts that were highly accurate and relatively robust to alignment errors in comparison to those from Cufflinks, a widely used transcript assembly method.