Blacklight: Scalable Defense for Neural Networks against Query-Based Black-Box Attacks
Huiying Li
Shawn Shan
Emily Wenger
Haitao Zheng
Ben Y. Zhao
Proceedings of the 31st USENIX Security Symposium (USENIX Security 2022)
[Full Text in PDF Format, 1.5MB]
Paper Abstract
Deep learning systems are known to be vulnerable to adversarial examples. In particular, query-based black-box attacks do not require knowledge of the deep learning model, but can compute adversarial examples over the network by submitting queries and inspecting the returned results. Recent work has greatly improved the efficiency of these attacks, demonstrating their practicality on today's ML-as-a-service platforms.
We propose Blacklight, a new defense against query-based black-box adversarial attacks. Blacklight is
driven by a fundamental insight: to compute adversarial examples, these attacks perform iterative
optimization over the network, producing queries highly similar in the input space. Thus Blacklight detects
query-based black-box attacks by detecting highly similar queries, using an efficient similarity engine
operating on probabilistic content fingerprints. We evaluate Blacklight against eight state-of-the-art
attacks, across a variety of models and image classification tasks. Blacklight identifies them all, often
after only a handful of queries. By rejecting all detected queries, Blacklight prevents any attack from
completing, even when persistent attackers continue to submit queries after account bans or query
rejections. Blacklight is also robust against several powerful countermeasures, including an optimal
black-box attack that approximates white-box attacks in efficiency. Finally, we illustrate how Blacklight
generalizes to other domains like text classification.
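The core detection idea described above, fingerprinting each query and flagging queries whose fingerprints substantially overlap with earlier ones, can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the window size, salt, top-k selection, and match threshold below are hypothetical placeholders.

```python
import hashlib

def fingerprint(data: bytes, window: int = 20, salt: bytes = b"salt",
                top_k: int = 50) -> set:
    """Probabilistic content fingerprint: hash overlapping byte windows
    with a secret salt, then keep the top_k smallest hash values as a
    compact signature of the input."""
    hashes = {hashlib.sha256(salt + data[i:i + window]).hexdigest()
              for i in range(len(data) - window + 1)}
    return set(sorted(hashes)[:top_k])

def is_attack(fp: set, history: list, threshold: int = 25) -> bool:
    """Flag a query if its fingerprint shares at least `threshold`
    hashes with any previously seen query's fingerprint."""
    return any(len(fp & prev) >= threshold for prev in history)
```

Because iterative black-box attacks perturb an image only slightly between queries, most byte windows (and hence most fingerprint hashes) survive each step, so consecutive attack queries overlap heavily, while benign queries from different inputs share essentially no hashes.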