The adversarial machine learning literature is largely partitioned into evasion attacks on testing data and poisoning attacks on training data. In this work, we show that adversarial examples, originally intended for attacking pre-trained models, are even more effective for data poisoning than recent methods designed specifically for poisoning. Our findings indicate that adversarial examples, when assigned the original label of their natural base image, cannot be used to train a classifier for natural images. Furthermore, when adversarial examples are assigned their adversarial class label, they are useful for training. This suggests that adversarial examples contain useful semantic content, just with the ``wrong'' labels (according to a network, but not a human). Our method, adversarial poisoning, is substantially more effective than existing poisoning methods for secure dataset release, and we release a poisoned version of ImageNet, ImageNet-P, to encourage research into the strength of this form of data obfuscation.
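To make the labeling scheme concrete, the following is a minimal sketch of how a single poison could be crafted. It assumes a targeted, signed-gradient attack under an L-infinity budget against a toy linear softmax classifier. The model `W`, `b`, the budget `eps`, the step size `alpha`, and the helper names `craft_poison` and `predict` are all hypothetical choices for illustration and are not taken from our experimental setup.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def logits(W, b, x):
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]

def predict(W, b, x):
    z = logits(W, b, x)
    return max(range(len(z)), key=z.__getitem__)

def craft_poison(W, b, x, target, eps=0.5, alpha=0.1, steps=20):
    """Targeted signed-gradient attack on a linear softmax model (illustrative)."""
    x_adv = list(x)
    for _ in range(steps):
        p = softmax(logits(W, b, x_adv))
        # Gradient of cross-entropy toward `target` w.r.t. the input: W^T (p - onehot)
        err = [pk - (1.0 if k == target else 0.0) for k, pk in enumerate(p)]
        grad = [sum(W[k][d] * err[k] for k in range(len(W))) for d in range(len(x))]
        for d, g in enumerate(grad):
            step = alpha * ((g > 0) - (g < 0))  # alpha * sign(g)
            # Descend the targeted loss, then project back into the eps-ball around x
            x_adv[d] = min(max(x_adv[d] - step, x[d] - eps), x[d] + eps)
    return x_adv

# Toy two-class model (assumed): class k scores feature k.
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]

x, y = [0.6, 0.4], 0                      # natural example with its original label
x_poison = craft_poison(W, b, x, target=1)
poisoned_example = (x_poison, y)          # perturbed input, ORIGINAL label kept
```

In this sketch the crafting model assigns the perturbed input to the target class, while the released pair keeps the original label `y`. That mismatch between the network-visible content and the label is the setting in which training on the data fails.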

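This paper studies data poisoning and finds that adversarial examples, originally designed to attack pre-trained models, are more effective poisons than methods built specifically for poisoning. When adversarial examples keep the original label of their natural base image, they cannot be used to train a classifier for natural images; when they are assigned their adversarial class label, they carry useful semantic content and can be used for training. The method is substantially more effective than existing poisoning methods for secure dataset release, and we release a poisoned version of ImageNet (ImageNet-P) to encourage research into this form of data obfuscation.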

Adversarial Examples Make Strong Poisons