The advent of Large Language Models (LLMs) represents a notable breakthrough
in Natural Language Processing (NLP), contributing to substantial progress in
both text comprehension and generation. However, LLMs are often limited in
context length extrapolation. Understanding and extending the context length
of LLMs is crucial to improving their performance across NLP applications. In
this survey, we examine why extending context length is essential and what
improved techniques could change for NLP applications. We study the inherent
challenges associated with
extending context length and present an organized overview of the existing
strategies employed by researchers. Additionally, we discuss the intricacies of
evaluating context extension techniques and highlight the open challenges that
researchers face in this domain. Furthermore, we explore whether there is a
consensus within the research community regarding evaluation standards and
identify areas where further agreement is needed. This comprehensive survey
aims to serve as a valuable resource for researchers, guiding them through the
nuances of context length extension techniques and fostering discussions on
future advancements in this evolving field.

The What, Why, and How of Context Length Extension Techniques in Large Language Models -- A Detailed Survey

Modern large language models (LLMs) that rely on attention mechanisms are
typically trained with fixed context lengths, which enforce upper limits on the
length of input sequences that they can handle at evaluation time. To use these
models on sequences longer than the train-time context length, one might employ
techniques from the growing family of context length extrapolation methods --
most of which focus on modifying the system of positional encodings used in the
attention mechanism to indicate where tokens or activations are located in the
input sequence. We conduct a wide survey of existing context length
extrapolation methods on a base LLaMA or LLaMA 2 model and introduce some
designs of our own, in particular a new truncation strategy for modifying the
basis of the position encoding.
We test these methods using three new evaluation tasks (FreeFormQA,
AlteredNumericQA, and LongChat-Lines) as well as perplexity, which we find to
be less fine-grained as a measure of the long-context performance of LLMs. We
release the three tasks publicly as datasets on HuggingFace. We discover that
linear scaling is the best method for extending context length, and show that
further gains can be achieved by using longer scales at evaluation time. We
also discover promising extrapolation capabilities in the truncated basis. To
support further research in this area, we release three new 13B parameter
long-context models which we call Giraffe: 4k and 16k context models trained
from base LLaMA-13B, and a 32k context model trained from base LLaMA2-13B. We
also release the code to replicate our results.
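
To make the idea of linear scaling concrete, the following is a minimal sketch and not the released code. It assumes rotary position embeddings (the positional encoding used by LLaMA) and takes "linear scaling" to mean dividing each position index by a constant factor before the rotation angles are computed. The function names, the `base` value of 10000, and the example lengths are illustrative assumptions, not values taken from the abstract.

```python
import math


def rope_angles(position, head_dim, base=10000.0, scale=1.0):
    """Rotation angles for one token position under rotary embeddings.

    Assumption: linear scaling divides the position index by `scale`
    before the angles are computed; scale=1.0 leaves positions unchanged.
    """
    scaled_position = position / scale
    return [scaled_position * base ** (-2 * i / head_dim)
            for i in range(head_dim // 2)]


def apply_rope(vector, position, base=10000.0, scale=1.0):
    """Rotate each consecutive pair (v[2i], v[2i+1]) by its angle."""
    angles = rope_angles(position, len(vector), base, scale)
    rotated = []
    for i, theta in enumerate(angles):
        x, y = vector[2 * i], vector[2 * i + 1]
        rotated.append(x * math.cos(theta) - y * math.sin(theta))
        rotated.append(x * math.sin(theta) + y * math.cos(theta))
    return rotated


# Hypothetical setup: trained on 2048 positions, evaluated with scale 4.
train_len, scale = 2048, 4.0

# A position well past the training length, after scaling ...
far = rope_angles(4 * train_len - 4, head_dim=8, scale=scale)
# ... receives the same angles as the last position seen in training.
seen = rope_angles(train_len - 1, head_dim=8)
```

Under these assumptions, a scale of 4 compresses positions 0 through 8191 into the range the model saw during training, which is why a longer scale at evaluation time stretches the usable context further.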
