Automatic Discovery of Visual Circuits

Apr, 2024

Achyuta Rajaram, Neil Chowdhury, Antonio Torralba, Jacob Andreas, Sarah Schwettmann

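TL;DR: Based on the activation dependencies and functional connectivity of neurons for visual concepts, we propose a new method for extracting subgraphs of a deep vision model's computational graph, which can be used to defend large-scale pretrained models against adversarial attacks.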

Abstract

To date, most discoveries of network subcomponents that implement human-interpretable computations in deep vision models have involved close study of single units and large amounts of human labor. We explore scal