BriefGPT.xyz
Mar, 2020
Learning to Encode Position for Transformer with Continuous Dynamical Model
Xuanqing Liu, Hsiang-Fu Yu, Inderjit Dhillon, Cho-Jui Hsieh
TL;DR
We propose a new method for encoding position information that uses neural ordinary differential equations to encode position for non-recurrent models such as the Transformer, and show that it outperforms existing encoding methods on translation and language-understanding tasks.
Abstract
We introduce a new way of learning to encode position information for non-recurrent models, such as Transformer models. Unlike RNN and LSTM, which contain inductive bias by loading the input tokens sequentially, non-recurrent […]
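The core idea can be sketched in a few lines: instead of looking up position embeddings in a fixed table, treat them as points along the trajectory of a learned dynamical system, p'(t) = f(p(t), t), and read off the embedding for position i by integrating the dynamics up to "time" i. The MLP dynamics, the forward-Euler integrator, the step size, and all parameter names below are illustrative assumptions, not the paper's exact implementation (which trains the dynamics end to end with an ODE solver).

```python
import numpy as np

def dynamics(p, t, W1, b1, W2, b2):
    # A small MLP f(p, t) that drives the embedding trajectory.
    x = np.concatenate([p, [t]])
    h = np.tanh(W1 @ x + b1)
    return W2 @ h + b2

def position_encodings(num_positions, dim, step=0.1, hidden=32, seed=0):
    # Hypothetical untrained parameters; in the paper these are learned.
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.1, (hidden, dim + 1))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (dim, hidden))
    b2 = np.zeros(dim)

    p = np.zeros(dim)          # p(0): embedding of the first position
    out = [p.copy()]
    for i in range(1, num_positions):
        t = (i - 1) * step
        # One forward-Euler step of p'(t) = f(p, t) between positions.
        p = p + step * dynamics(p, t, W1, b1, W2, b2)
        out.append(p.copy())
    return np.stack(out)       # shape: (num_positions, dim)

enc = position_encodings(16, 8)
```

Because consecutive embeddings are connected by a single integration step, nearby positions get similar vectors by construction, which is the inductive bias the continuous dynamical model contributes.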