Sangwon Yu, Jongyoon Song, Heeseung Kim, Seong-min Lee, Woo-Jong Ryu...
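TL;DR This study analyzes the training dynamics of token embeddings in neural language models and identifies a specific part of the gradient for rare token embeddings as the key cause of the representation degeneration problem. Based on this finding, it proposes a new method called adaptive gradient gating (AGG) to address the problem, and experiments demonstrate the effectiveness of AGG.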
Abstract
Despite advances in neural network language models, the representation degeneration problem of embeddings remains challenging. Recent studies have found that the learned output embeddings degenerate into a narrow-cone distribution, which makes the similarity between any two embeddings