With Great Training Comes Great Vulnerability: Practical Attacks against Transfer Learning

Bolun Wang
Yuanshun Yao
Bimal Viswanath
Haitao Zheng
Ben Y. Zhao

Proceedings of the 27th USENIX Security Symposium (USENIX Security 2018)

[Full Text in PDF Format, 3MB]


Paper Abstract

Transfer learning is a powerful approach that allows users to quickly build accurate deep-learning (Student) models by "learning" from centralized (Teacher) models pretrained with large datasets, e.g., Google's InceptionV3. We hypothesize that this centralization of model training increases the vulnerability of Student models to misclassification attacks that leverage knowledge of the publicly accessible Teacher models. In this paper, we describe our efforts to understand and experimentally validate such attacks in the context of image recognition. We identify techniques that allow attackers to associate Student models with their Teacher counterparts and launch highly effective misclassification attacks on black-box Student models. We validate these attacks against widely used Teacher models in the wild. Finally, we propose and evaluate multiple approaches for defense, including a neuron-distance technique that successfully defends against these attacks while also obfuscating the link between Teacher and Student models.