BriefGPT.xyz
Jun, 2024
Never Miss A Beat: An Efficient Recipe for Context Window Extension of Large Language Models with Consistent "Middle" Enhancement
Tong Wu, Yanpeng Zhao, Zilong Zheng
TL;DR
The paper proposes a method called CREAM, which extends the context length of pre-trained large language models by manipulating position indices to interpolate positional encodings, and addresses the "lost in the middle" problem faced by long-context models.
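To make the idea of manipulating position indices concrete, here is a minimal sketch of linear positional interpolation, a generic technique for mapping a longer sequence's position indices back into the range seen during pretraining. This is an illustrative example of the general approach, not CREAM's specific index-manipulation scheme; the function name and parameters are hypothetical.

```python
import numpy as np

def interpolate_positions(seq_len, pretrained_len):
    # Linearly rescale position indices so that a sequence longer than the
    # pretrained context window maps back into [0, pretrained_len).
    # A generic illustration of positional interpolation, not CREAM itself.
    scale = pretrained_len / max(seq_len, pretrained_len)
    return np.arange(seq_len) * scale

# Extend an 8K sequence for a model pretrained on a 4K window:
pos = interpolate_positions(8192, 4096)
# all scaled indices now fall within the pretrained position range
```

CREAM's contribution, per the TL;DR, is a particular way of choosing these manipulated indices so that the "middle" of the context is consistently well covered; the sketch above shows only the plain linear baseline.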
Abstract
Recently, many methods have been developed to extend the context length of pre-trained large language models (LLMs), but they often require fine-tuning …