As deep learning systems are scaled up to many billions of parameters, relating their internal structure to their external behavior becomes very challenging. Although daunting, this problem is not new: neuroscientists and cognitive scientists have accumulated decades of experience analyzing a particularly complex system, the brain. In this work, we argue that interpreting both biological and artificial neural systems requires analyzing them at multiple levels, with different analytic tools for each level. We first lay out a grand challenge shared by scientists who study the brain and those who study artificial neural networks: understanding how distributed neural mechanisms give rise to complex cognition and behavior. We then present a series of analytical tools that can be used to analyze biological and artificial neural systems, organizing those tools according to Marr's three levels of analysis: computation/behavior, algorithm/representation, and implementation. Overall, the multilevel interpretability framework provides a principled way to tackle neural system complexity; links structure, computation, and behavior; clarifies assumptions and research priorities at each level; and paves the way toward a unified effort for understanding intelligent systems, whether biological or artificial.

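This study addresses the relationship between the internal structure and external behavior of deep learning systems, proposing a multilevel analysis approach that draws on the extensive experience of neuroscience. Using Marr's three-level framework, it clarifies how the complexity of artificial and biological neural systems can be understood with different analytic tools, offering a systematic path toward a unified understanding of intelligent systems. The work has potentially important implications for the interpretability of deep learning.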

Multilevel Interpretability of Artificial Neural Networks: Leveraging Frameworks and Methods from Neuroscience