Machine Learning Ph.D. Student

The University of Chicago

`pramodkm@uchicago.edu`

CV Publications

- analyzing what deep neural networks actually learn, and leveraging those insights to discover their weaknesses or to improve them
- multiresolution/multiscale matrix factorizations that express hierarchical structure

I earned my M.Sc. in Computer Science from Saarland University, Germany where my thesis was on designing an efficient algorithm for

I am an alumnus of IIT Bombay, India where I got my undergraduate degree in Computer Science.

Hand-authored systems are being increasingly replaced with deep learning systems. While deep neural networks have excellent performance, there is still a large lack of understanding of how they work. Verification and debugging of deep learning models is necessary when deploying these networks in critical applications.

The goal of this research is to develop techniques for analyzing deep neural networks. In particular, we are developing methods to (a) characterize the input-output behavior of deep learning models (b) craft adversarial examples by exploiting weaknesses in a neural network, (c) strategize improvements and enrich training data, (d) extract rules which may be useful augmentations to existing hand-authored systems, (e) develop debugging and analyzing tools which aid the developer in neural network design, and (f) compress neural networks

(Finer-level details omitted as a couple of papers are currently under review)

Collaborators: Ankur Taly, Mukund Sundararajan, Kedar Dhamdhere and Kevin McCurley

Relevant publications:

- Did the Model Understand the Question?, ACL 2018 (main conference)
- It Was the Training Data Pruning Too!, arXiv preprint 2018

The massive size of the datasets occurring in modern machine learning problems necessitates the use of data compression and parallel computation. Data such as those arising from social networks often contain complex hierarchical structure. pMMF is a high performance parallel C++ library that computes a matrix factorization which captures structure at various scales. This can in turn be used for matrix compression and fast determinant and inverse computation, as a preprocessing step for machine learning algorithms and linear solvers. Independently, this method can also be used for constructing a wavelet basis for sparse approximation.

Collaborators: Nedelina Teneva and Risi Kondor, University of Chicago, USARelevant publications:

- A Generic Multiresolution Preconditioner for Sparse Symmetric Systems, arXiv preprint 2017
- Multiresolution Matrix Compression, AISTATS 2016 notable student paper award
- Parallel MMF: a Multiresolution Approach to Matrix Computation, NIPS 2015 demonstration

Spectral Clustering is a graph partitioning algorithm, based on a continuous relaxation of the NP-hard combinatorial ratio/normalized cut problem. Existing techniques for partitioning a graph into k clusters are either heuristics or recursive schemes based on spectral clustering. These methods fail when applied on large datasets in the real world. We propose an algorithm which directly minimizes the ratio/normalized cut and leads to a k-way partitioning of the data. We also proposed an algorithm for the difficult sum-of-ratios problem which is of independent interest in the optimization community. We performed an extensive experimental evaluation, including on large real world datasets, illustrating that our method indeed outperforms all existing k-clustering algorithms.

Collaborators: Syama Sundar Rangapuram and Matthias Hein, Saarland University, GermanyRelevant publications:

K for the price of 1: parameter-efficient multitask and transfer learning

ICLR 2019NeurIPS 2018 Workshop on Continual Learning

Pramod Kaushik Mudrakarta, Mark Sandler, Andrey Zhmoginov, Andrew Howard

PDF
Abstract
We introduce a novel method that enables parameter-efficient transfer and multitask learning. The basic approach is to allow a model patch - a small set of parameters - to specialize to each task, instead of fine-tuning the last layer or the entire network. For instance, we show that learning a set of scales and biases allows a network to learn a completely different embedding that could be used for different tasks (such as converting an SSD detection model into a 1000-class classification model while reusing 98% of parameters of the feature extractor). Similarly, we show that re-learning the existing low-parameter layers (such as depth-wise convolutions) also improves accuracy significantly. Our approach allows both simultaneous (multi-task) learning as well as sequential transfer learning wherein we adapt pretrained networks to solve new problems. For multi-task learning, despite using much fewer parameters than traditional logits-only fine-tuning, we match single-task-based performance.

Did the model understand the question?

ACL 2018Pramod Kaushik Mudrakarta, Ankur Taly, Mukund Sundararajan, Kedar Dhamdhere

PDF
Abstract
We analyze state-of-the-art deep learning models for three tasks: question answering on (1) images, (2) tables, and (3) passages of text. Using the notion of *attribution* (word importance), we find that these deep networks often ignore important question terms. Leveraging such behavior, we perturb questions to craft a variety of adversarial examples. Our strongest attacks drop the accuracy of a visual question answering model from 61.1% to 19%, and that of a tabular question answering model from 33.5% to 3.3%. Additionally, we show how attributions can strengthen attacks proposed by Jia and Liang (2017) on paragraph comprehension models. Our results demonstrate that attributions can augment standard measures of accuracy and empower investigation of model performance. When a model is accurate but for the wrong reasons, attributions can surface erroneous logic in the model that indicates inadequacies in the test data.

It was the training data pruning too!

arXiv preprint 2018 Pramod Kaushik Mudrakarta, Ankur Taly, Mukund Sundararajan, and Kedar Dhamdhere

PDF
Abstract
We study the current best model (KDG) for question answering on tabular data evaluated over the WikiTableQuestions dataset. Previous ablation studies performed against this model attributed the model's performance to certain aspects of its architecture. In this paper, we find that the model's performance also crucially depends on a certain pruning of the data used to train the model. Disabling the pruning step drops the accuracy of the model from 43.3% to 36.3%. The large impact on the performance of the KDG model suggests that the pruning may be a useful pre-processing step in training other semantic parsers as well.

A generic multiresolution preconditioner for sparse symmetric systems

arXiv preprint 2017 Pramod Kaushik Mudrakarta and Risi Kondor

PDF
Abstract
We introduce a new general purpose multiresolution preconditioner for symmetric linear systems. Most existing multiresolution preconditioners use some standard wavelet basis that relies on knowledge of the geometry of the underlying domain. In constrast, based on the recently proposed Multiresolution Matrix Factorization (MMF) algorithm, we construct a preconditioner that discovers a custom wavelet basis adapted to the given linear system without making any geometric assumptions. Some advantages of the new approach are fast preconditioner-vector products, invariance to the ordering of the rows/columns, and the ability to handle systems of any size. Numerical experiments on finite difference discretizations of model PDEs and off-the-shelf matrices illustrate the effectiveness of the MMF preconditioner.

Multiresolution matrix compression

AISTATS 2016
Notable student paper award (given to top 3 papers)
Nedelina Teneva, Pramod Kaushik Mudrakarta and Risi Kondor

PDF
Abstract
Multiresolution Matrix Factorization (MMF) is a recently introduced method for finding multiscale structure and defining wavelets on graphs and matrices. MMF can also be used for matrix compression (sketching). However, the original MMF algorithm of (Kondor et al., 2014) scales with `n ^{3}` or worse (where

Parallel MMF: a multiresolution approach to matrix computation

arXiv preprint 2015Risi Kondor, Nedelina Teneva and Pramod Kaushik Mudrakarta

PDF
Abstract
Multiresolution Matrix Factorization (MMF) was recently introduced as a method for finding multiscale structure and defining wavelets on graphs/matrices. In this paper we derive pMMF, a parallel algorithm for computing the MMF factorization. Empirically, the running time of pMMF scales linearly in the dimension for sparse matrices. We argue that this makes pMMF a valuable new computational primitive in its own right, and present experiments on using pMMF for two distinct purposes: compressing matrices and preconditioning large sparse linear systems.

Tight continuous relaxation of the balanced `k`-cut problem

NIPS 2014
Syama Sundar Rangapuram, Pramod Kaushik Mudrakarta and Matthias Hein

PDF
Abstract
Spectral Clustering as a relaxation of the normalized/ratio cut has become one of the standard graph-based clustering methods. Existing methods for the computation of multiple clusters, corresponding to a balanced `k`-cut of the graph, are either based on greedy techniques or heuristics which have weak connection to the original motivation of minimizing the normalized cut. In this paper we propose a new tight continuous relaxation for any balanced `k`-cut problem and show that a related recently proposed relaxation is in most cases loose leading to poor performance in practice. For the optimization of our tight continuous relaxation we propose a new algorithm for the hard sum-of-ratios minimization problem which achieves monotonic descent. Extensive comparisons show that our method beats all existing approaches for ratio cut and other balanced `k`-cut criteria.

Oqtans: the RNA-seq workbench in the cloud for complete and reproducible quantitative transcriptome analysis

Bioinformatics 2014
Vipin T. Sreedharan, Sebastian J. Schultheiss, Géraldine Jean, André Kahles, Regina Bohnert, Philipp Drewe, Pramod Kaushik Mudrakarta, Nico Görnitz, Georg Zeller, Gunnar Rätsch

Link to journal page
Abstract
We present Oqtans, an open-source workbench for quantitative transcriptome analysis, that is integrated in Galaxy. Its distinguishing features include customizable computational workflows and a modular pipeline architecture that facilitates comparative assessment of tool and data quality. Oqtans integrates an assortment of machine learning-powered tools into Galaxy, which show superior or equal performance to state-of-the-art tools. Implemented tools comprise a complete transcriptome analysis workflow: short-read alignment, transcript identification/quantification and differential expression analysis. Oqtans and Galaxy facilitate persistent storage, data exchange and documentation of intermediate results and analysis workflows. We illustrate how Oqtans aids the interpretation of data from different experiments in easy to understand use cases. Users can easily create their own workflows and extend Oqtans by integrating specific tools. Oqtans is available as (i) a cloud machine image with a demo instance at cloud.oqtans.org, (ii) a public Galaxy instance at galaxy.cbio.mskcc.org, (iii) a git repository containing all installed software (oqtans.org/git); most of which is also available from (iv) the Galaxy Toolshed and (v) a share string to use along with Galaxy CloudMan.

mTim: rapid and accurate transcript reconstruction from RNA-seq data

arXiv preprint 2013
Georg Zeller, Nico Görnitz, André Kahles, Jonas Behr, Pramod Kaushik Mudrakarta, Sören Sonnenburg, Gunnar Rätsch

PDF
Abstract
Recent advances in high-throughput cDNA sequencing (RNA-Seq) technology have revolutionized transcriptome studies. A major motivation for RNA-Seq is to map the structure of expressed transcripts at nucleotide resolution. With accurate computational tools for transcript reconstruction, this technology may also become useful for genome (re-)annotation, which has mostly relied on de novo gene finding where gene structures are primarily inferred from the genome sequence. We developed a machine-learning method, called mTim (margin-based transcript inference method) for transcript reconstruction from RNA-Seq read alignments that is based on discriminatively trained hidden Markov support vector machines. In addition to features derived from read alignments, it utilizes characteristic genomic sequences, e.g. around splice sites, to improve transcript predictions. mTim inferred transcripts that were highly accurate and relatively robust to alignment errors in comparison to those from Cufflinks, a widely used transcript assembly method.