Efficient attention mechanisms have greatly improved the computational efficiency of
Transformers. However, most existing linear attention mechanisms suffer from an
\emph{efficiency degradation} problem, leading to inefficiencies in causal
language modeling and hindering their application in long-range language
models. This problem is more pronounced under language modeling with unbounded
contexts. In this paper, we propose \textbf{L}inear \textbf{A}ttention
\textbf{V}ia \textbf{O}rthogonal memory~(\shortname) to address these
limitations, achieving strong performance while maintaining linear complexity.
\shortname employs orthogonal decomposition to compress a context into a
fixed-size orthogonal memory while effectively minimizing redundancy within the
context. Given that orthogonal memory compresses global information, we further
dissect the context to amplify fine-grained local information. Additionally, we
embed the relative position encoding into \shortname to improve the
extrapolation ability. Experimental results show that \shortname greatly
improves the efficiency of causal language modeling, achieves the best
extrapolation performance, and outperforms other efficient baselines. Further,
we apply \shortname to unbounded language modeling and successfully scale the
context length to 128K.
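
The compression step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the Gram-Schmidt construction of the bases, and the running-mean update are all assumptions made for illustration. The sketch projects each token onto a fixed set of orthonormal bases and keeps a causal running mean of the projection coefficients, so the memory has the same size for any context length and each new token costs O(r·d) work.

```python
import math
import random


def orthonormal_bases(r, d, seed=0):
    """Return r orthonormal vectors in R^d (requires r <= d) via Gram-Schmidt.

    Hypothetical helper: the abstract does not say how the bases are obtained.
    """
    rng = random.Random(seed)
    bases = []
    while len(bases) < r:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Remove the components that lie along the bases found so far.
        for b in bases:
            dot = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - dot * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > 1e-8:  # skip (numerically) degenerate draws
            bases.append([vi / norm for vi in v])
    return bases


def causal_orthogonal_memory(tokens, bases):
    """Yield one fixed-size memory (r rows of dimension d) per prefix of `tokens`.

    Row i is basis i scaled by the mean projection of the tokens seen so far
    onto that basis, so the rows stay mutually orthogonal and step t only
    depends on tokens 1..t (causal).
    """
    running = [0.0] * len(bases)
    for t, x in enumerate(tokens, start=1):
        for i, b in enumerate(bases):
            running[i] += sum(xi * bi for xi, bi in zip(x, b))
        yield [[(running[i] / t) * bj for bj in b] for i, b in enumerate(bases)]
```

Because the memory always has r rows, a query that attends over it costs O(r·d) per token, which is where the overall linear complexity in context length comes from in this sketch.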

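We propose LAVO (Linear Attention via Orthogonal memory), an improved linear
attention method that compresses the context into a fixed-size orthogonal memory
via orthogonal decomposition while minimizing redundancy in the context, and
embeds relative position encoding to improve extrapolation. Experiments show
that LAVO greatly improves the efficiency of causal language models and
outperforms other efficient methods, with the best extrapolation performance.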

Linear Attention via Orthogonal Memory

Long document summarization systems are critical for domains with lengthy and
jargon-laden text, yet they present significant challenges to researchers and
developers with limited computing resources. Existing solutions mainly focus on
efficient attentions or divide-and-conquer strategies. The former reduces
theoretical time complexity, but is still memory-heavy. The latter methods
sacrifice global context, leading to uninformative and incoherent summaries.
This work aims to leverage the memory-efficient nature of divide-and-conquer
methods while preserving global context. Concretely, our framework AWESOME uses
two novel mechanisms: (1) External memory mechanisms track previously encoded
document segments and their corresponding summaries, to enhance global document
understanding and summary coherence. (2) Global salient content is further
identified beforehand to augment each document segment to support its
summarization. Extensive experiments on diverse genres of text, including
government reports, transcripts, scientific papers, and novels, show that
AWESOME produces summaries with better informativeness, faithfulness, and
coherence than competitive baselines on longer documents, while having a
similar or smaller GPU memory footprint.
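
The control flow of the two mechanisms can be sketched as follows. This is a text-level stand-in under stated assumptions, not the AWESOME implementation: the function `summarize_long_document`, the callable `summarize_segment`, and the parameters `segment_size` and `memory_size` are hypothetical names, and the real framework may well store learned representations in its external memory rather than raw text.

```python
from typing import Callable, List, Tuple

# A memory entry pairs an already-processed segment with its summary.
MemoryEntry = Tuple[List[str], str]


def summarize_long_document(
    sentences: List[str],
    segment_size: int,
    summarize_segment: Callable[[List[str], List[str], List[MemoryEntry]], str],
    salient: List[str],
    memory_size: int = 4,
) -> str:
    """Summarize segment by segment while carrying global context along.

    `salient` holds globally important content identified beforehand; it is
    passed to every segment. `summarize_segment` is a hypothetical model call
    that receives the segment, the salient content, and the most recent
    memory entries.
    """
    memory: List[MemoryEntry] = []
    summaries: List[str] = []
    for start in range(0, len(sentences), segment_size):
        segment = sentences[start:start + segment_size]
        # Only one segment plus a bounded memory is visible at a time,
        # which keeps the footprint independent of document length.
        summary = summarize_segment(segment, salient, memory[-memory_size:])
        memory.append((segment, summary))
        summaries.append(summary)
    return " ".join(summaries)
```

Any callable with the matching signature can be plugged in as `summarize_segment`; bounding the memory with `memory_size` is one simple way to keep the cost per segment constant.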

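This paper proposes AWESOME, a long-document summarization framework based on a
divide-and-conquer strategy and external memory mechanisms. By identifying
globally salient content beforehand, it preserves global context while
enhancing understanding of the full document, yielding summaries with better
informativeness, faithfulness, and coherence.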