Jul, 2018

Model Reconstruction from Model Explanations

Smitha Milli, Ludwig Schmidt, Anca D. Dragan, Moritz Hardt

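TL;DR: This work shows, through theory and experiment, that gradient-based explanations of a model quickly reveal the model itself. The result highlights gradients, rather than labels, as a learning primitive. The work also proposes effective heuristics for reconstructing a model from its gradient explanations.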
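To make the intuition concrete, here is a minimal sketch for the simplest possible case: a hypothetical linear model f(x) = w·x + b. This is an illustrative assumption, not the paper's method or experimental setting. The helper names (`make_linear_model`, `reconstruct_from_gradient`) are invented for this example. For such a model the gradient with respect to the input equals w at every point, so one gradient query plus one prediction query recovers all parameters, whereas prediction-only queries would generally need at least dim + 1 points to solve for the same unknowns.

```python
import random


def make_linear_model(dim, seed=0):
    """Build a hidden linear model f(x) = w.x + b (illustrative assumption).

    Returns the two query interfaces an attacker could see, plus the true
    parameters so the reconstruction can be checked afterwards.
    """
    rng = random.Random(seed)
    w = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    b = rng.uniform(-1.0, 1.0)

    def predict(x):
        # Prediction query: returns the model output at x.
        return sum(wi * xi for wi, xi in zip(w, x)) + b

    def input_gradient(x):
        # Gradient "explanation": for a linear model, df/dx = w for every x.
        return list(w)

    return predict, input_gradient, (w, b)


def reconstruct_from_gradient(predict, input_gradient, dim):
    """Recover (w, b) using one gradient query and one prediction query."""
    origin = [0.0] * dim
    w_hat = input_gradient(origin)  # the gradient is the weight vector
    b_hat = predict(origin)         # f(0) = b
    return w_hat, b_hat


dim = 5
predict, input_gradient, (w, b) = make_linear_model(dim)
w_hat, b_hat = reconstruct_from_gradient(predict, input_gradient, dim)

# Largest absolute difference between true and recovered parameters.
max_error = max(abs(t - r) for t, r in zip(w + [b], w_hat + [b_hat]))
print("max reconstruction error:", max_error)
```

The linear case is deliberately trivial: it only shows why a single gradient can carry as much information as many labels. Nonlinear models need more queries and a more careful reconstruction procedure, which this sketch does not attempt.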

Abstract

We show through theory and experiment that gradient-based explanations of a model quickly reveal the model itself. Our results speak to a tension between the desire to keep a proprietary model secret and the abil