Pramod Kaushik Mudrakarta

Machine Learning @ Google Research

Email me at

CV Publications


Hello there! I work at Google on improving the ad experience for users, advertisers, and creators on YouTube using machine learning. I finished my Ph.D. in machine learning at the University of Chicago, where I was advised by the excellent Prof. Risi Kondor. My thesis explored three challenges in modern machine learning: exploiting multiscale/multiresolution structure in data, interpretability of deep neural networks, and parameter-efficient deployment of deep neural networks on embedded devices.

I earned my M.Sc. in Computer Science from Saarland University, Germany, where my thesis was on designing an efficient algorithm for k-way balanced graph cuts (in other words, multi-class clustering). Before that, I worked as a Research Assistant and Scientist in Dr. Gunnar Rätsch's bioinformatics lab at the Max Planck Institute in Tübingen, Germany.

I am an alumnus of IIT Bombay, India where I got my undergraduate degree in Computer Science.


Analyzing deep neural networks

Hand-authored systems are increasingly being replaced by deep learning systems. While deep neural networks achieve excellent performance, our understanding of how they work remains limited. Verifying and debugging deep learning models is necessary before deploying them in critical applications.
The goal of this research is to develop techniques for analyzing deep neural networks. In particular, we are developing methods to (a) characterize the input-output behavior of deep learning models, (b) craft adversarial examples by exploiting weaknesses in a neural network, (c) strategize improvements and enrich training data, (d) extract rules that may usefully augment existing hand-authored systems, (e) build debugging and analysis tools that aid the developer in neural network design, and (f) compress neural networks.
(Finer-level details omitted as a couple of papers are currently under review)
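To give a concrete flavor of the attribution analyses this line of work builds on, here is a minimal NumPy sketch of Integrated Gradients computed with a midpoint Riemann sum; the toy model and function names are illustrative, not taken from any of the papers below.

```python
import numpy as np

def integrated_gradients(grad_f, x, baseline, steps=100):
    """Attribute f(x) to input coordinates by accumulating gradients
    of f along the straight-line path from a baseline input to x."""
    alphas = (np.arange(steps) + 0.5) / steps          # midpoint rule
    total = np.zeros_like(x, dtype=float)
    for a in alphas:
        total += grad_f(baseline + a * (x - baseline))
    return (x - baseline) * total / steps

# toy differentiable model: f(x) = sum(x^2), so grad f is 2x
f = lambda x: np.sum(x ** 2)
grad_f = lambda x: 2 * x
x = np.array([1.0, 2.0])
attr = integrated_gradients(grad_f, x, np.zeros_like(x))
# completeness property: attributions sum to f(x) - f(baseline)
```

For this toy model the attributions work out to x squared per coordinate, and their sum recovers f(x) exactly, illustrating the completeness axiom that makes attributions useful for debugging.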

This work began during my summer internship at Google Brain.

Collaborators: Ankur Taly, Mukund Sundararajan, Kedar Dhamdhere and Kevin McCurley

Relevant publications:

Parallel multiresolution matrix factorization (pMMF) Project page

The massive size of datasets in modern machine learning necessitates data compression and parallel computation. Data such as those arising from social networks often contain complex hierarchical structure. pMMF is a high-performance parallel C++ library that computes a matrix factorization capturing structure at multiple scales. The factorization can in turn be used for matrix compression and for fast determinant and inverse computation, as a preprocessing step for machine learning algorithms and linear solvers. Independently, this method can also be used to construct a wavelet basis for sparse approximation.
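To give a flavor of what an MMF-style factorization does, here is a toy single-threaded NumPy sketch (not the pMMF library or its API): at each level a Jacobi rotation decouples one coordinate of a symmetric matrix, which is then retired as a "wavelet" index, leaving a smaller active core.

```python
import numpy as np

def toy_mmf(A, n_levels):
    """Greedy Jacobi-rotation sketch of multiresolution matrix
    factorization on a symmetric matrix A: at each level, rotate the
    active pair (i, j) with the largest off-diagonal entry so that
    entry becomes zero, then retire index j from the active set."""
    A = A.astype(float).copy()
    n = A.shape[0]
    active = list(range(n))
    rotations = []
    for _ in range(n_levels):
        if len(active) < 2:
            break
        # pick the active pair with the largest off-diagonal magnitude
        best, bi, bj = -1.0, None, None
        for a in range(len(active)):
            for b in range(a + 1, len(active)):
                i, j = active[a], active[b]
                if abs(A[i, j]) > best:
                    best, bi, bj = abs(A[i, j]), i, j
        i, j = bi, bj
        # Jacobi angle that zeroes A[i, j] under A -> G A G^T
        theta = 0.5 * np.arctan2(2 * A[i, j], A[j, j] - A[i, i])
        c, s = np.cos(theta), np.sin(theta)
        G = np.eye(n)
        G[i, i], G[j, j] = c, c
        G[i, j], G[j, i] = -s, s
        A = G @ A @ G.T
        rotations.append((i, j, theta))
        active.remove(j)  # j becomes a wavelet coordinate
    return A, rotations

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
core, rots = toy_mmf(M + M.T, 3)   # 3 wavelet levels
```

Because each step is an orthogonal similarity transform, the spectrum of the matrix is preserved exactly; the real pMMF library gains its speed from blocking, sparsity, and parallelism rather than the brute-force pair search used here.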

Collaborators: Nedelina Teneva and Risi Kondor, University of Chicago, USA

Relevant publications:

Balanced k-way graph clustering

Spectral clustering is a graph partitioning algorithm based on a continuous relaxation of the NP-hard combinatorial ratio/normalized cut problem. Existing techniques for partitioning a graph into k clusters are either heuristics or recursive schemes based on spectral clustering, and they often perform poorly on large real-world datasets. We propose an algorithm that directly minimizes the ratio/normalized cut and yields a k-way partitioning of the data. We also propose an algorithm for the difficult sum-of-ratios problem, which is of independent interest to the optimization community. An extensive experimental evaluation, including on large real-world datasets, shows that our method outperforms existing k-clustering algorithms.
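For context, the classical spectral relaxation that the recursive schemes build on can be sketched in a few lines of NumPy for k = 2: threshold the Fiedler vector (the eigenvector of the second-smallest eigenvalue of the normalized Laplacian) at zero. This is a minimal illustration of the baseline, not our proposed algorithm.

```python
import numpy as np

def spectral_bipartition(W):
    """Two-way spectral relaxation of the normalized cut.
    W is a symmetric nonnegative affinity matrix of a connected graph."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_sym = np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt
    vals, vecs = np.linalg.eigh(L_sym)     # eigenvalues ascending
    fiedler = vecs[:, 1]                   # second-smallest eigenvector
    return (fiedler > 0).astype(int)       # sign gives the 2-way cut

# toy graph: two 3-cliques joined by one weak edge
W = np.zeros((6, 6))
W[:3, :3] = 1.0
W[3:, 3:] = 1.0
np.fill_diagonal(W, 0.0)
W[2, 3] = W[3, 2] = 0.1
labels = spectral_bipartition(W)
```

On this toy graph the sign of the Fiedler vector cleanly separates the two cliques; the degeneracies and balance issues that arise when such two-way cuts are applied recursively for k > 2 are exactly what our direct k-way formulation avoids.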

Collaborators: Syama Sundar Rangapuram and Matthias Hein, Saarland University, Germany

Relevant publications:


Google Scholar profile

Asymmetric Multiresolution Matrix Factorization
arXiv preprint 2019
Pramod Kaushik Mudrakarta, Shubhendu Trivedi, Risi Kondor
PDF Abstract

Multiresolution Matrix Factorization (MMF) was recently introduced as an alternative to the dominant low-rank paradigm in order to capture structure in matrices at multiple different scales. Using ideas from multiresolution analysis (MRA), MMF teased out hierarchical structure in symmetric matrices by constructing a sequence of wavelet bases. While effective for such matrices, there is plenty of data that is more naturally represented as nonsymmetric matrices (e.g. directed graphs), but nevertheless has similar hierarchical structure. In this paper, we explore techniques for extending MMF to any square matrix. We validate our approach on numerous matrix compression tasks, demonstrating its efficacy compared to low-rank methods. Moreover, we also show that a combined low-rank and MMF approach, which amounts to removing a small global-scale component of the matrix and then extracting hierarchical structure from the residual, is even more effective than each of the two complementary methods for matrix compression.

K for the price of 1: parameter-efficient multitask and transfer learning
ICLR 2019
NeurIPS 2018 Workshop on Continual Learning
Pramod Kaushik Mudrakarta, Mark Sandler, Andrey Zhmoginov, Andrew Howard
PDF Abstract

We introduce a novel method that enables parameter-efficient transfer and multitask learning. The basic approach is to allow a model patch - a small set of parameters - to specialize to each task, instead of fine-tuning the last layer or the entire network. For instance, we show that learning a set of scales and biases allows a network to learn a completely different embedding that could be used for different tasks (such as converting an SSD detection model into a 1000-class classification model while reusing 98% of parameters of the feature extractor). Similarly, we show that re-learning the existing low-parameter layers (such as depth-wise convolutions) also improves accuracy significantly. Our approach allows both simultaneous (multi-task) learning as well as sequential transfer learning wherein we adapt pretrained networks to solve new problems. For multi-task learning, despite using much fewer parameters than traditional logits-only fine-tuning, we match single-task-based performance.
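The model-patch idea can be sketched in a few lines of NumPy, with illustrative names that are not the paper's actual code: a shared frozen backbone plus a tiny per-task scale/bias patch.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))    # shared backbone weight, frozen

def forward(x, patch):
    """Frozen feature extractor followed by a per-task model patch:
    only the patch's scale and bias are task-specific parameters."""
    h = x @ W                                    # frozen computation
    return patch["scale"] * h + patch["bias"]    # tiny trainable patch

patch_a = {"scale": np.ones(4), "bias": np.zeros(4)}      # task A
patch_b = {"scale": 2 * np.ones(4), "bias": np.ones(4)}   # task B

x = rng.standard_normal(4)
ya, yb = forward(x, patch_a), forward(x, patch_b)
# same backbone, different per-task behavior; only 8 numbers differ
```

In a real network the patch parameters would be the scales and biases of the normalization layers (and optionally the depth-wise convolutions), trained per task while everything else stays shared.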

Did the model understand the question?
ACL 2018
Pramod Kaushik Mudrakarta, Ankur Taly, Mukund Sundararajan, Kedar Dhamdhere
PDF Abstract

We analyze state-of-the-art deep learning models for three tasks: question answering on (1) images, (2) tables, and (3) passages of text. Using the notion of attribution (word importance), we find that these deep networks often ignore important question terms. Leveraging such behavior, we perturb questions to craft a variety of adversarial examples. Our strongest attacks drop the accuracy of a visual question answering model from 61.1% to 19%, and that of a tabular question answering model from 33.5% to 3.3%. Additionally, we show how attributions can strengthen attacks proposed by Jia and Liang (2017) on paragraph comprehension models. Our results demonstrate that attributions can augment standard measures of accuracy and empower investigation of model performance. When a model is accurate but for the wrong reasons, attributions can surface erroneous logic in the model that indicates inadequacies in the test data.

It was the training data pruning too!
arXiv preprint 2018
Pramod Kaushik Mudrakarta, Ankur Taly, Mukund Sundararajan, and Kedar Dhamdhere
PDF Abstract

We study the current best model (KDG) for question answering on tabular data evaluated over the WikiTableQuestions dataset. Previous ablation studies performed against this model attributed the model's performance to certain aspects of its architecture. In this paper, we find that the model's performance also crucially depends on a certain pruning of the data used to train the model. Disabling the pruning step drops the accuracy of the model from 43.3% to 36.3%. The large impact on the performance of the KDG model suggests that the pruning may be a useful pre-processing step in training other semantic parsers as well.

A generic multiresolution preconditioner for sparse symmetric systems
arXiv preprint 2017
Pramod Kaushik Mudrakarta and Risi Kondor
PDF Abstract

We introduce a new general purpose multiresolution preconditioner for symmetric linear systems. Most existing multiresolution preconditioners use some standard wavelet basis that relies on knowledge of the geometry of the underlying domain. In contrast, based on the recently proposed Multiresolution Matrix Factorization (MMF) algorithm, we construct a preconditioner that discovers a custom wavelet basis adapted to the given linear system without making any geometric assumptions. Some advantages of the new approach are fast preconditioner-vector products, invariance to the ordering of the rows/columns, and the ability to handle systems of any size. Numerical experiments on finite difference discretizations of model PDEs and off-the-shelf matrices illustrate the effectiveness of the MMF preconditioner.
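For context, a preconditioner enters the conjugate gradient iteration only through preconditioner-vector products. Here is a minimal NumPy sketch of preconditioned CG using a simple Jacobi (diagonal) preconditioner as a stand-in for the MMF wavelet-basis solve; the names are illustrative.

```python
import numpy as np

def pcg(A, b, M_inv_mv, tol=1e-8, max_iter=500):
    """Preconditioned conjugate gradients for SPD A. M_inv_mv(r)
    applies the inverse preconditioner to a vector; an MMF
    preconditioner would plug its fast wavelet-basis solve in here."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv_mv(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_inv_mv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# small SPD test system; Jacobi preconditioner as the stand-in
n = 50
A = np.diag(np.arange(1.0, n + 1)) + 0.01 * np.ones((n, n))
b = np.ones(n)
d_inv = 1.0 / np.diag(A)
x = pcg(A, b, lambda r: d_inv * r)
```

The point of the sketch is the interface: the solver never forms the preconditioner as a matrix, so any factorization with a fast apply, including an MMF one, drops in via `M_inv_mv`.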

Multiresolution matrix compression
AISTATS 2016 Notable student paper award (given to top 3 papers)
Nedelina Teneva, Pramod Kaushik Mudrakarta and Risi Kondor
PDF Abstract

Multiresolution Matrix Factorization (MMF) is a recently introduced method for finding multiscale structure and defining wavelets on graphs and matrices. MMF can also be used for matrix compression (sketching). However, the original MMF algorithm of (Kondor et al., 2014) scales as n^3 or worse (where n is the number of rows/columns in the matrix to be factorized), making it infeasible for large-scale problems. In this paper we describe pMMF, a fast parallel MMF algorithm, which can scale to n in the range of millions. Our experimental results show that when used for matrix compression, pMMF often achieves much lower error than competing algorithms (especially on network data), yet for sparse matrices its running time scales close to linearly with n.

Parallel MMF: a multiresolution approach to matrix computation
arXiv preprint 2015
Risi Kondor, Nedelina Teneva and Pramod Kaushik Mudrakarta
PDF Abstract

Multiresolution Matrix Factorization (MMF) was recently introduced as a method for finding multiscale structure and defining wavelets on graphs/matrices. In this paper we derive pMMF, a parallel algorithm for computing the MMF factorization. Empirically, the running time of pMMF scales linearly in the dimension for sparse matrices. We argue that this makes pMMF a valuable new computational primitive in its own right, and present experiments on using pMMF for two distinct purposes: compressing matrices and preconditioning large sparse linear systems.

Tight continuous relaxation of the balanced k-cut problem
NIPS 2014
Syama Sundar Rangapuram, Pramod Kaushik Mudrakarta and Matthias Hein
PDF Abstract

Spectral clustering, as a relaxation of the normalized/ratio cut, has become one of the standard graph-based clustering methods. Existing methods for the computation of multiple clusters, corresponding to a balanced k-cut of the graph, are either based on greedy techniques or heuristics which have a weak connection to the original motivation of minimizing the normalized cut. In this paper we propose a new tight continuous relaxation for any balanced k-cut problem and show that a related, recently proposed relaxation is in most cases loose, leading to poor performance in practice. For the optimization of our tight continuous relaxation we propose a new algorithm for the hard sum-of-ratios minimization problem which achieves monotonic descent. Extensive comparisons show that our method beats all existing approaches for ratio cut and other balanced k-cut criteria.

Oqtans: the RNA-seq workbench in the cloud for complete and reproducible quantitative transcriptome analysis
Bioinformatics 2014
Vipin T. Sreedharan, Sebastian J. Schultheiss, Géraldine Jean, André Kahles, Regina Bohnert, Philipp Drewe, Pramod Kaushik Mudrakarta, Nico Görnitz, Georg Zeller, Gunnar Rätsch
Link to journal page Abstract

We present Oqtans, an open-source workbench for quantitative transcriptome analysis that is integrated in Galaxy. Its distinguishing features include customizable computational workflows and a modular pipeline architecture that facilitates comparative assessment of tool and data quality. Oqtans integrates an assortment of machine-learning-powered tools into Galaxy, which show superior or equal performance to state-of-the-art tools. Implemented tools comprise a complete transcriptome analysis workflow: short-read alignment, transcript identification/quantification and differential expression analysis. Oqtans and Galaxy facilitate persistent storage, data exchange and documentation of intermediate results and analysis workflows. We illustrate how Oqtans aids the interpretation of data from different experiments in easy-to-understand use cases. Users can easily create their own workflows and extend Oqtans by integrating specific tools. Oqtans is available as (i) a cloud machine image with a demo instance, (ii) a public Galaxy instance, (iii) a git repository containing all installed software (most of which is also available from (iv) the Galaxy Toolshed), and (v) a share string to use along with Galaxy CloudMan.

mTim: rapid and accurate transcript reconstruction from RNA-seq data
arXiv preprint 2013
Georg Zeller, Nico Görnitz, André Kahles, Jonas Behr, Pramod Kaushik Mudrakarta, Sören Sonnenburg, Gunnar Rätsch
PDF Abstract

Recent advances in high-throughput cDNA sequencing (RNA-Seq) technology have revolutionized transcriptome studies. A major motivation for RNA-Seq is to map the structure of expressed transcripts at nucleotide resolution. With accurate computational tools for transcript reconstruction, this technology may also become useful for genome (re-)annotation, which has mostly relied on de novo gene finding where gene structures are primarily inferred from the genome sequence. We developed a machine-learning method, called mTim (margin-based transcript inference method) for transcript reconstruction from RNA-Seq read alignments that is based on discriminatively trained hidden Markov support vector machines. In addition to features derived from read alignments, it utilizes characteristic genomic sequences, e.g. around splice sites, to improve transcript predictions. mTim inferred transcripts that were highly accurate and relatively robust to alignment errors in comparison to those from Cufflinks, a widely used transcript assembly method.



GitHub LinkedIn Twitter Facebook