Hosein Mohebbi, Willem Zuidema, Grzegorz Chrupała, Afra Alishahi
TL;DR: This paper proposes Value Zeroing, a context-mixing score for Transformer models that analyzes how information is mixed across the model's encoding layers, and validates the method's advantages through multiple evaluation approaches.
Abstract
Self-attention weights and their transformed variants have been the main
source of information for analyzing token-to-token interactions in
transformer-based models. But despite their ease of interpretation, thes