We present the LM Transparency Tool (LM-TT), an open-source interactive toolkit for analyzing the internal workings of Transformer-based language models. Unlike existing tools that focus on isolated parts of the decision-making process, our framework is designed to make the entire prediction process transparent and allows tracing model behavior back from the top-layer representation to very fine-grained parts of the model. Specifically, it (1) shows the important part of the whole input-to-output information flow, (2) allows attributing any changes made by a model block to individual attention heads and feed-forward neurons, and (3) allows interpreting the functions of those heads or neurons. A crucial part of this pipeline is showing the importance of specific model components at each step. As a result, we are able to look at the roles of model components only in cases where they are important for a prediction. Since knowing which components to inspect is key for analyzing large models, where the number of these components is extremely high, we believe our tool will greatly support the interpretability community in both research settings and practical applications.

LM Transparency Tool: Interactive Tool for Analyzing Transformer Language Models