January 2024
Anisotropy Is Inherent to Self-Attention in Transformers
Nathan Godey, Éric de la Clergerie, Benoît Sagot
TL;DR
Through empirical observation, this paper shows that Transformer-based language models, as well as Transformers trained on other modalities, suffer from representations that are unexpectedly close to each other in angular distance, i.e., the anisotropy problem.
Abstract
The representation degeneration problem is a phenomenon that is widely observed among self-supervised learning methods based on transformers.
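
The anisotropy described above is commonly quantified as the average pairwise cosine similarity between hidden representations: for isotropic embeddings in high-dimensional space this average is near zero, while anisotropic ones score well above it. The following is a minimal sketch of that measurement using a pretrained GPT-2 from the Hugging Face `transformers` library; the model choice, sample sentences, and use of the last layer are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch: estimate anisotropy as the mean pairwise cosine
# similarity of hidden states. Model, sentences, and layer choice
# are illustrative assumptions, not taken from the paper.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")
model.eval()

sentences = [
    "Anisotropy is inherent to self-attention.",
    "The weather was unusually warm in Paris today.",
]

reps = []
with torch.no_grad():
    for text in sentences:
        inputs = tokenizer(text, return_tensors="pt")
        # output_hidden_states=True exposes every layer's activations
        outputs = model(**inputs, output_hidden_states=True)
        # Last layer's token representations: (seq_len, hidden_dim)
        reps.append(outputs.hidden_states[-1].squeeze(0))

# Stack all token vectors from all sentences into one matrix
h = torch.cat(reps, dim=0)              # (n_tokens, hidden_dim)
h = h / h.norm(dim=-1, keepdim=True)    # unit-normalize each row
sim = h @ h.T                           # pairwise cosine similarities

# Exclude the diagonal (self-similarity is trivially 1.0)
n = sim.size(0)
off_diag = sim[~torch.eye(n, dtype=torch.bool)]
print(f"Mean pairwise cosine similarity: {off_diag.mean().item():.3f}")
```

A mean similarity far above zero between tokens of unrelated sentences is the signature of anisotropy, since random directions in a high-dimensional space are nearly orthogonal on average.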