Typically, training LLMs with long context windows is computationally
expensive, requiring many training hours and substantial GPU resources. Existing
long-context extension methods usually need additional training procedures to
support each target long-context window, which require long-context training
data (e.g., 32k tokens) and incur high GPU training costs. To address these
issues, we propose an Efficient and Extreme length extension method for Large
Language Models, called E²-LLM, which needs only one training procedure,
dramatically reduces computation cost, and removes the need to collect
long-context data. Concretely, first, the training data of E²-LLM only requires
a short length (e.g., 4k tokens), which greatly reduces the tuning cost.
Second, training on the short context window is performed only once, yet the
resulting model supports different evaluation context windows at inference.
Third, building on RoPE position embeddings, E²-LLM introduces two augmentation
methods that vary the scale and position-index parameters across training
samples. These augmentations aim to make the model robust to the different
relative position differences that arise when directly interpolating to an
arbitrary context length at inference. Comprehensive experimental results on
multiple benchmark datasets demonstrate the effectiveness of E²-LLM on
challenging long-context tasks.
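The abstract describes the augmentation only at a high level. The following is a minimal Python sketch, under our own assumptions, of (a) RoPE with position interpolation, where each position index is divided by a scale factor g, and (b) one hypothetical per-sample augmentation that draws g from a small set and shifts the position indices by a random offset. The scale set (1, 2, 4, 8), the offset range, and the function names `rope_rotate` and `sample_augmentation` are illustrative choices, not taken from the paper.

```python
import math
import random


def rope_rotate(x, position, scale=1.0, base=10000.0):
    """Rotate vector x (even length) with RoPE at `position`.

    Position interpolation: the effective position is position / scale,
    so scale > 1 squeezes long positions into the trained range.
    """
    d = len(x)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even dimension")
    pos = position / scale
    out = [0.0] * d
    for i in range(d // 2):
        theta = base ** (-2 * i / d)  # frequency of the i-th 2-D pair
        c, s = math.cos(pos * theta), math.sin(pos * theta)
        x0, x1 = x[2 * i], x[2 * i + 1]
        out[2 * i] = x0 * c - x1 * s
        out[2 * i + 1] = x0 * s + x1 * c
    return out


def sample_augmentation(train_len, rng, scales=(1, 2, 4, 8)):
    """HYPOTHETICAL per-sample augmentation (not the paper's exact scheme).

    Draw a scale g and a random start offset so that the train_len position
    indices fall somewhere inside a virtual window of train_len * g.
    """
    g = rng.choice(scales)
    offset = rng.randint(0, train_len * g - train_len)
    positions = list(range(offset, offset + train_len))
    return g, positions


# Example: rotate a short sequence of toy 4-dim query vectors.
rng = random.Random(0)
g, positions = sample_augmentation(train_len=8, rng=rng)
queries = [[1.0, 0.0, 0.5, -0.5] for _ in positions]
rotated = [rope_rotate(q, p, scale=g) for q, p in zip(queries, positions)]
```

Because the RoPE query-key score depends only on the scaled relative distance (m - n) / g, training on randomly scaled and shifted indices exposes the model to many relative-distance patterns while each sample stays short. Under standard position interpolation, one would set g at inference to roughly the target length divided by the training length (e.g., 32k / 4k = 8); whether E²-LLM uses exactly this rule is an assumption here.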

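We propose E²-LLM, an efficient and extreme length extension method for large language models that reduces computation cost and applies augmentation methods to different training samples so that arbitrary context lengths can be supported at inference; experimental results show its effectiveness on challenging long-context tasks.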

E²-LLM: Efficient and Extreme Length Extension of Large Language Models

Training large language models to follow instructions makes them perform
better on a wide range of tasks, generally becoming more helpful. However, a
perfectly helpful model will follow even the most malicious instructions and
readily generate harmful content. In this paper, we raise concerns over the
safety of models that only emphasize helpfulness, not safety, in their
instruction-tuning. We show that several popular instruction-tuned models are
highly unsafe. Moreover, we show that adding just 3% safety examples (a few
hundred demonstrations) to the training set when fine-tuning a model like LLaMA
can substantially improve its safety. Our safety-tuning does not make models
significantly less capable or helpful as measured by standard benchmarks.
However, we do observe exaggerated safety behavior, where too much
safety-tuning makes models refuse to respond to reasonable prompts that
superficially resemble unsafe ones. Our study sheds light on trade-offs in
training LLMs to follow instructions and exhibit safe behavior.
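The abstract does not specify how the safety demonstrations are mixed in. The sketch below is one hypothetical way to do it, assuming "3%" means the share of safety examples in the final mixed training set; `mix_in_safety` and its parameters are illustrative names, not from the paper.

```python
import random


def mix_in_safety(general, safety_pool, fraction=0.03, seed=0):
    """HYPOTHETICAL helper: return a shuffled training set in which safety
    demonstrations make up roughly `fraction` of the final mixed set."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    # Solve n_s / (n_g + n_s) = fraction for the number of safety examples n_s.
    n_safety = round(fraction * len(general) / (1.0 - fraction))
    if n_safety > len(safety_pool):
        raise ValueError("not enough safety demonstrations in the pool")
    rng = random.Random(seed)
    mixed = list(general) + rng.sample(list(safety_pool), n_safety)
    rng.shuffle(mixed)
    return mixed
```

For example, with 20,000 general instructions and `fraction=0.03`, this adds round(0.03 × 20000 / 0.97) = 619 safety demonstrations, which is in the "few hundred" range the abstract mentions.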
