We extend the context length of Llama-3-8B-Instruct from 8K to 80K via QLoRA
fine-tuning. The entire training cycle is highly efficient, taking 8 hours
on a single machine with 8xA800 (80G) GPUs. The resulting model exhibits
superior performance across a broad range of evaluation tasks, such as NIHS,
topic retrieval, and long-context language understanding; meanwhile, it also
preserves the original capability over short contexts well. The dramatic context
extension is mainly attributed to merely 3.5K synthetic training samples
generated by GPT-4, which indicates the LLMs' inherent (yet largely
underestimated) potential to extend their original context length. In fact, the
context length could be extended far beyond 80K with more computational
resources. Therefore, the team will publicly release all resources
(including data, model, data generation pipeline, and training code) to
facilitate future research from the community:
https://github.com/FlagOpen/FlagEmbedding.
