Location Attention for Extrapolation to Longer Sequences

Nov, 2019

Location Attention for Extrapolation to Longer Sequences

Yann Dubois, Gautier Dagan, Dieuwke Hupkes, Elia Bruni

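TL;DR: This paper examines the extrapolation problem in neural networks. It proposes an attention mechanism aimed at generalizing, in natural language processing, to sequences longer than those in the training set, tests this hypothesis on a variant of the Lookup Table task, and shows that such a model handles sequence tasks better.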

Abstract

Neural networks are surprisingly good at interpolating and perform remarkably well when the training set examples resemble those in the test set. However, they are often unable to extrapolate patterns beyond the seen data, even when the abstractions required for such patterns are simpl