BriefGPT.xyz
Feb, 2025
Modular Training of Neural Networks aids Interpretability
Satvik Golechha, Maheep Chaudhary, Joan Velja, Alessandro Abate, Nandi Schoots
TL;DR
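This study addresses the limited interpretability of neural networks by proposing a "clusterability loss" function that encourages modular training, so that the model forms mutually independent clusters. The results show that the method trains more modular models that learn different, simpler functions, which significantly improves interpretability.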
Abstract
An approach to improve neural network interpretability is via clusterability, i.e., splitting a model into disjoint clusters that can be studied independently. We define a measure for …