Dec 2022
Receptive Field Alignment Enables Transformer Length Extrapolation
Ta-Chung Chi, Ting-Han Fan, Alexander I. Rudnicky
TL;DR
This work studies relative positional embeddings for language models and proposes a self-attention mechanism built on an alignment hypothesis: receptive fields are aligned during training so that the properties of the relative positional embedding are preserved at test time. The proposed Sandwich positional embedding lets the model incorporate information from sequences longer than those seen during training, and its implicitly windowed self-attention enables efficient inference.
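To make the mechanism above concrete, here is a minimal NumPy sketch of a Sandwich-style additive attention bias, assuming the core idea of reusing inner products of sinusoidal positional embeddings as a relative bias; the `scale` hyperparameter and the exact way the bias enters the attention logits are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np

def sinusoidal_embeddings(num_positions: int, dim: int) -> np.ndarray:
    """Standard sinusoidal positional embeddings; dim is assumed even."""
    positions = np.arange(num_positions)[:, None]          # (L, 1)
    freqs = 1.0 / (10000 ** (np.arange(0, dim, 2) / dim))  # (dim/2,)
    angles = positions * freqs                             # (L, dim/2)
    emb = np.zeros((num_positions, dim))
    emb[:, 0::2] = np.sin(angles)
    emb[:, 1::2] = np.cos(angles)
    return emb

def sandwich_bias(num_positions: int, dim: int, scale: float = 1.0) -> np.ndarray:
    """Additive attention bias from inner products of sinusoidal embeddings.

    Because sin(i*f)*sin(j*f) + cos(i*f)*cos(j*f) = cos((i - j)*f), each
    entry depends only on the relative distance i - j and decays with
    distance, which produces the implicit attention window mentioned in
    the TL;DR. `scale` is a hypothetical tuning knob for illustration.
    """
    p = sinusoidal_embeddings(num_positions, dim)
    return scale * (p @ p.T)  # (L, L); entry (i, j) biases the logit q_i . k_j

# Hypothetical usage inside causal self-attention:
#   logits = (q @ k.T) / np.sqrt(head_dim) + sandwich_bias(L, dim)
```

Since the bias is parameter-free and a function of relative distance only, it can be evaluated at any test length, which is what allows extrapolation beyond the training sequence length.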
Abstract
Length extrapolation is a desirable property that permits training a transformer language model on short sequences and retaining similar perplexities when the model is tested on substantially longer sequences.