Post-breach Recovery: Protection against White-box Adversarial Examples for Leaked DNN Models
Shawn Shan
Wenxin Ding
Emily Wenger
Haitao Zheng
Ben Y. Zhao
Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS 2022)
[Full Text in PDF Format, 524KB]
Paper Abstract
Server breaches are an unfortunate reality on today's Internet. In the context of deep neural network (DNN) models, they are particularly harmful, because a leaked model gives an attacker "white-box" access to generate adversarial examples, a threat model for which there are no practical robust defenses. For practitioners who have invested years and millions into proprietary DNNs, e.g., for medical imaging, this seems like an inevitable disaster looming on the horizon.
In this paper, we consider the problem of post-breach recovery for DNN models. We propose Neo, a new system that creates new versions of leaked models, alongside an inference-time filter that detects and removes adversarial examples generated on previously leaked models. The classification surfaces of different model versions are slightly offset (by introducing hidden distributions), and Neo detects the overfitting of attacks to the leaked model used in their generation. We show that across a variety of tasks and attack methods, Neo filters out attacks from leaked models with very high accuracy, and provides strong protection (7–10 recoveries) against attackers who repeatedly breach the server. Neo performs well against a variety of strong adaptive attacks, with only a slight drop in the number of recoverable breaches, and demonstrates potential as a complement to DNN defenses in the wild.