Jun, 2018
Towards Robust Interpretability with Self-Explaining Neural Networks
David Alvarez-Melis, Tommi S. Jaakkola
TL;DR
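The paper proposes three desiderata for self-explaining models: explicitness, faithfulness, and stability. These are meant to make interpretability concrete and to extend it to complex models. Faithfulness and stability are enforced through regularization tailored to the model class. Experimental results suggest that the framework offers a promising way out of the trade-off between model complexity and interpretability.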
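The summary only says that stability is enforced by regularization. As an illustration, here is a minimal NumPy sketch of a gradient-based stability penalty, assuming the simplest self-explaining form f(x) = θ(x)ᵀx, where θ(x) acts as input-dependent relevance scores. The penalty measures how far the input gradient ∇ₓf(x) is from the explanation θ(x); it vanishes when f behaves locally like a linear model with coefficients θ(x). The coefficient function `theta` (one tanh layer with hypothetical parameters `W`, `b`) and the finite-difference gradient are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def theta(x, W, b):
    # Hypothetical coefficient network: relevance scores that depend on the input.
    return np.tanh(W @ x + b)


def f(x, W, b):
    # Assumed self-explaining form: prediction = sum_i theta_i(x) * x_i.
    return float(theta(x, W, b) @ x)


def grad_f(x, W, b, eps=1e-6):
    # Central finite differences for the input gradient of f.
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e, W, b) - f(x - e, W, b)) / (2 * eps)
    return g


def stability_penalty(x, W, b):
    # Squared gap between the true input gradient and the explanation theta(x).
    # It is ~0 when theta is locally constant, i.e. f is locally linear.
    return float(np.sum((grad_f(x, W, b) - theta(x, W, b)) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=4)
    b = rng.normal(size=4)
    # W = 0 makes theta constant, so the penalty is (numerically) zero.
    print(stability_penalty(x, np.zeros((4, 4)), b))
    # A random W makes theta vary with x, so the penalty is positive.
    print(stability_penalty(x, rng.normal(size=(4, 4)), b))
```

For this particular θ the gap has a closed form, ∇ₓf(x) − θ(x) = Wᵀ((1 − θ(x)²) ⊙ x), so during training one would normally use automatic differentiation rather than finite differences; the loop is kept here only to stay dependency-free.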
Abstract
Most recent work on interpretability of complex machine learning models has focused on estimating $\textit{a posteriori}$ explanations for previously trained models around specific predictions. $\textit{Self-explaining}$ models where