BriefGPT.xyz
Mar, 2019
Bridging Adversarial Robustness and Gradient Interpretability
Beomsu Kim, Junghoon Seo, Taegyun Jeon
TL;DR
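This paper examines how adversarial training affects the gradients of DNNs and their interpretability. It finds that adversarial training makes loss gradients align more closely with human perception, identifies a trade-off between test accuracy and loss-gradient interpretability, and proposes ways to address it.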
Abstract
Adversarial training is a training scheme designed to counter adversarial attacks by augmenting the training dataset with adversarial examples. Surprisingly, several studies have observed that loss gradients from
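To make "augmenting the training dataset with adversarial examples" concrete, here is a minimal sketch in plain Python. It rests on assumptions that are not in the source: a toy logistic-regression model stands in for a DNN, and the adversarial examples are produced with a single fast gradient sign (FGSM-style) step, which is one common choice rather than necessarily the attack used by the authors. All function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    # Probability of class 1 under a toy logistic model (stand-in for a DNN).
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def loss(w, b, x, y):
    # Binary cross-entropy for a single example with label y in {0, 1}.
    p = predict(w, b, x)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def input_gradient(w, b, x, y):
    # Loss gradient with respect to the INPUT x (not the weights).
    # For logistic regression this is (p - y) * w.
    p = predict(w, b, x)
    return [(p - y) * wi for wi in w]


def sign(v):
    return (v > 0) - (v < 0)


def fgsm_example(w, b, x, y, eps):
    # Assumed attack: move each input coordinate by eps in the direction
    # that increases the loss (sign of the input gradient).
    grad = input_gradient(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]


def augment_with_adversarial(w, b, data, eps):
    # Return the original examples plus one adversarial copy of each,
    # keeping the original labels.
    return data + [(fgsm_example(w, b, x, y, eps), y) for x, y in data]
```

In a full adversarial training loop, the augmented set would be regenerated against the current weights at each step and then used for the usual weight update. The `input_gradient` function is the quantity whose visual interpretability the paper discusses: for an image model it has the same shape as the input and can be displayed as an image.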