Latent Backdoor Attacks on Deep Neural Networks
Ben Y. Zhao
Proceedings of the 26th ACM Conference on Computer and Communications Security
Recent work proposed the concept of backdoor attacks on deep neural networks (DNNs), where misclassification rules are hidden inside normal models, only to be triggered by very specific inputs. However, these "traditional" backdoors assume a context where users train their own models from scratch, which rarely occurs in practice. Instead, users typically customize "Teacher" models already pretrained by providers like Google, through a process called transfer learning. This customization process introduces significant changes to models and disrupts hidden backdoors, greatly reducing the actual impact of backdoors in practice.
In this paper, we describe latent backdoors, a more powerful and
stealthy variant of backdoor attacks that functions under transfer
learning. Latent backdoors are incomplete backdoors embedded into a
"Teacher" model, and automatically inherited by multiple "Student"
models through transfer learning. If any Student model includes
the label targeted by the backdoor, then its customization process
completes the backdoor and makes it active. We show that latent
backdoors can be quite effective in a variety of application contexts, and
validate their practicality through real-world attacks against traffic sign
recognition, iris identification of volunteers, and facial recognition of
public figures (politicians). Finally, we evaluate four potential defenses
and find that only one is effective in disrupting latent backdoors, though
it may incur a cost in classification accuracy as a tradeoff.
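
The mechanism summarized above can be illustrated with a deliberately tiny sketch. It is a toy under stated assumptions, not the paper's models, data, or training procedure: the names teacher_features, train_student, predict, and TRIGGER are hypothetical, the Teacher is a fixed two-dimensional feature map, the latent backdoor is modelled as a hard-coded rule that sends any triggered input to the features of the target class, and transfer learning is approximated by fitting only a nearest-centroid final layer on the Student's own data.

```python
# Toy illustration only: every name and value here is hypothetical and
# chosen for readability, not taken from the paper.

TRIGGER = 9.0  # assumed trigger value stamped into the last input slot


def teacher_features(x):
    """Frozen 'Teacher' feature extractor (toy stand-in).

    Clean inputs go through a fixed linear map. The latent backdoor is
    modelled as a rule that maps any triggered input to the feature
    vector that clean samples of the target class produce.
    """
    if x[-1] == TRIGGER:
        return (5.0, 5.0)  # same features as the clean input (5, 0, 0)
    return (x[0] + x[1], x[0] - x[1])


def train_student(labelled_inputs):
    """Stand-in for transfer learning: the Teacher stays frozen and only
    a final nearest-centroid layer is fitted on the Student's data."""
    sums = {}
    for x, label in labelled_inputs:
        f = teacher_features(x)
        acc = sums.setdefault(label, [0.0, 0.0, 0])
        acc[0] += f[0]
        acc[1] += f[1]
        acc[2] += 1
    return {label: (a[0] / a[2], a[1] / a[2]) for label, a in sums.items()}


def predict(centroids, x):
    """Classify x by the nearest class centroid in Teacher feature space."""
    f = teacher_features(x)
    return min(
        centroids,
        key=lambda lab: (f[0] - centroids[lab][0]) ** 2
        + (f[1] - centroids[lab][1]) ** 2,
    )


# The Student's task happens to include the label the attacker targeted.
student_data = [
    ((5.0, 0.0, 0.0), "target"),
    ((4.0, 0.0, 0.0), "target"),
    ((0.0, 5.0, 0.0), "other_a"),
    ((0.0, 4.0, 0.0), "other_a"),
    ((-5.0, 0.0, 0.0), "other_b"),
    ((-4.0, 0.0, 0.0), "other_b"),
]
centroids = train_student(student_data)

clean_pred = predict(centroids, (0.0, 5.0, 0.0))          # clean input
triggered_pred = predict(centroids, (0.0, 5.0, TRIGGER))  # same input + trigger
print(clean_pred, triggered_pred)
```

In this toy, the Teacher never contains a "target" output label. The association only becomes a working misclassification rule once the Student's final layer learns a class whose features coincide with those the trigger produces, which mirrors the sense in which the abstract calls the backdoor incomplete until customization.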