Transformers do not scale very well to long sequence lengths largely because
of quadratic self-attention complexity. In the recent months, a wide spectrum
of efficient, fast Transformers have been proposed to tackle this problem, more
often than not claiming superior or comparable model quality to vanilla
Transformer models. To this date, there is no well-established consensus on how
to evaluate this class of models. Moreover, inconsistent benchmarking on a wide
spectrum of tasks and datasets makes it difficult to assess relative model
quality amongst many models. This paper proposes a systematic and unified
benchmark, LRA, specifically focused on evaluating model quality under
long-context scenarios. Our benchmark is a suite of tasks consisting of
sequences ranging from $1K$ to $16K$ tokens, encompassing a wide range of data
types and modalities such as text, natural, synthetic images, and mathematical
expressions requiring similarity, structural, and visual-spatial reasoning. We
systematically evaluate ten well-established long-range Transformer models
(Reformers, Linformers, Linear Transformers, Sinkhorn Transformers, Performers,
Synthesizers, Sparse Transformers, and Longformers) on our newly proposed
benchmark suite. LRA paves the way towards better understanding this class of
efficient Transformer models, facilitates more research in this direction, and
presents new challenging tasks to tackle. Our benchmark code will be released
at this https URL

本论文提出了一个系统和统一的基准测试，名为 LRA，专门评估长上下文下的模型质量，其中系统地评估了十个成熟的长范围 Transformer 模型，为更好地理解高效 Transformer 模型铺平了道路，并提出了新的具有挑战性的任务。