Extending large language models to effectively handle long contexts requires instruction fine-tuning on input sequences of similar length. To address this, we present LongAlign, a recipe covering the instruction data, training, and evaluation for long context alignment. First, we construct a long instruction-following dataset using Self-Instruct. To ensure data diversity, it covers a broad range of tasks drawn from various long context sources. Second, we adopt packing and sorted batching strategies to speed up supervised fine-tuning on data with varied length distributions. Additionally, we develop a loss weighting method to balance the contribution of different sequences to the loss during packing training. Third, we introduce the LongBench-Chat benchmark for evaluating instruction-following capabilities on queries 10k-100k in length. Experiments show that LongAlign outperforms existing recipes for LLMs on long context tasks by up to 30%, while also maintaining their proficiency in handling short, generic tasks. The code, data, and long-aligned models are open-sourced at https://github.com/THUDM/LongAlign.
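To make the two batching strategies mentioned in the abstract concrete, here is a minimal sketch that operates only on sequence lengths. It assumes a greedy first-fit-decreasing packer with a fixed token budget and a fixed batch size for sorted batching; the function names `pack_sequences` and `sorted_batches` are hypothetical and this is not the LongAlign implementation. In real training, sequences packed together must also be prevented from attending to each other (for example with a variable-length attention kernel), which this sketch does not model.

```python
import random
from typing import List


def pack_sequences(lengths: List[int], max_len: int) -> List[List[int]]:
    """Hypothetical packer: first-fit decreasing. Groups sequence indices
    into packs whose total length does not exceed max_len."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    packs: List[List[int]] = []
    totals: List[int] = []
    for i in order:
        if lengths[i] > max_len:
            raise ValueError(f"sequence {i} is longer than max_len")
        # Place the sequence in the first pack that still has room.
        for p, total in enumerate(totals):
            if total + lengths[i] <= max_len:
                packs[p].append(i)
                totals[p] += lengths[i]
                break
        else:
            # No existing pack fits, so open a new one.
            packs.append([i])
            totals.append(lengths[i])
    return packs


def sorted_batches(lengths: List[int], batch_size: int, seed: int = 0) -> List[List[int]]:
    """Hypothetical sorted batching: sort indices by length so each batch
    holds similar-length sequences (little padding), then shuffle the
    order of the batches so training does not see lengths in order."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches = [order[k:k + batch_size] for k in range(0, len(order), batch_size)]
    random.Random(seed).shuffle(batches)
    return batches
```

For example, `pack_sequences([9000, 500, 3000, 7000, 1200, 60000], 64000)` returns `[[5, 2, 1], [0, 3, 4]]`: the 60k-token sequence shares a pack with the 3k and 500-token ones, and the remaining three fill a second pack. Packing minimizes wasted padding, while sorted batching keeps samples intact at the cost of some batch-level length bias, which is why the batch order is shuffled.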
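The loss weighting idea can be illustrated with per-token losses alone. When the loss is averaged over all target tokens in a pack, sequences with many target tokens dominate. The sketch below assumes the goal is for every sequence to count equally, i.e. the batch loss is the average over the M sequences of each sequence's mean token loss. It expresses this by scaling the tokens of sequence i by K / (N_i · M), where K is the number of packs and N_i the number of target tokens in sequence i, then summing within each pack and averaging over packs. The function names and this exact scaling are assumptions for illustration, not a transcription of the LongAlign code.

```python
from typing import List

# A pack is a list of sequences; each sequence is a list of per-token losses.
Pack = List[List[float]]


def naive_packed_loss(packs: List[Pack]) -> float:
    """Mean over all target tokens in each pack, then mean over packs.
    Sequences with many target tokens dominate their pack's loss."""
    per_pack = []
    for pack in packs:
        tokens = [t for seq in pack for t in seq]
        per_pack.append(sum(tokens) / len(tokens))
    return sum(per_pack) / len(packs)


def weighted_packed_loss(packs: List[Pack]) -> float:
    """Assumed weighting: scale each token of sequence i by K / (N_i * M),
    sum within each pack, then average over the K packs. The result equals
    the plain mean over the M sequences of their mean token losses.
    Assumes every sequence has at least one target token."""
    k = len(packs)
    m = sum(len(pack) for pack in packs)
    per_pack = []
    for pack in packs:
        per_pack.append(sum(k / (len(seq) * m) * sum(seq) for seq in pack))
    return sum(per_pack) / k
```

With `packs = [[[1.0, 1.0, 1.0, 1.0], [3.0]], [[2.0, 2.0]]]`, the naive loss is about 1.7 because the four-token sequence outweighs the one-token sequence in the first pack, whereas the weighted loss is about 2.0, the mean of the three per-sequence losses 1.0, 3.0, and 2.0.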

LongAlign: A Recipe for Long Context Alignment of Large Language Models