Enabling LLMs to handle long contexts is an active research area. Most LLMs are built on rotary position embedding (RoPE), a popular position encoding method, so a prominent approach is to extrapolate RoPE trained on comparatively short texts to far longer ones. Considerable effort has gone into improving extrapolation by extending the RoPE formulation; however, few studies have examined the inner workings of these extensions comprehensively. In this paper, we offer a straightforward yet in-depth understanding of RoPE extensions from an attention perspective, evaluated on two benchmarking tasks. Extensive experiments reveal several findings: 1) maintaining attention patterns close to those at the pretrained length improves extrapolation; 2) large attention uncertainty leads to retrieval errors; 3) using longer continual pretraining lengths for RoPE extensions can reduce attention uncertainty and significantly enhance extrapolation.
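To make the quantities in the abstract concrete, the following is a minimal sketch, not the paper's implementation. It assumes two things the abstract does not state: that "attention uncertainty" can be approximated by the Shannon entropy of a query's attention distribution, and that position interpolation (rescaling positions by a factor `scale`) serves as a representative RoPE extension. All function and parameter names are hypothetical.

```python
import math

def rope_angles(position, head_dim, base=10000.0, scale=1.0):
    """Rotation angles for one position.

    scale < 1 compresses positions, mimicking position interpolation
    (an assumed example of a RoPE extension).
    """
    return [(position * scale) * base ** (-2 * i / head_dim)
            for i in range(head_dim // 2)]

def apply_rope(vec, position, base=10000.0, scale=1.0):
    """Rotate consecutive pairs (x0, x1), (x2, x3), ... by the RoPE angles."""
    out = []
    for i, theta in enumerate(rope_angles(position, len(vec), base, scale)):
        x, y = vec[2 * i], vec[2 * i + 1]
        out.append(x * math.cos(theta) - y * math.sin(theta))
        out.append(x * math.sin(theta) + y * math.cos(theta))
    return out

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_entropy(query, keys, q_pos, scale=1.0):
    """Shannon entropy (in nats) of one query's attention distribution.

    Keys are assumed to sit at positions 0 .. len(keys) - 1. Higher
    entropy means attention is spread more evenly, used here as a
    proxy for attention uncertainty.
    """
    q = apply_rope(query, q_pos, scale=scale)
    d = len(query)
    scores = []
    for pos, key in enumerate(keys):
        k = apply_rope(key, pos, scale=scale)
        scores.append(sum(a * b for a, b in zip(q, k)) / math.sqrt(d))
    probs = softmax(scores)
    return -sum(p * math.log(p) for p in probs if p > 0)
```

Under these assumptions, extending a model trained at length `train_len` to `target_len` would correspond to calling `attention_entropy(..., scale=train_len / target_len)` and comparing the result with the entropy measured at the pretrained length.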

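With long-context LLMs as a research hotspot, this paper presents a detailed study of RoPE extensions from an attention perspective. Experiments show that: 1) maintaining attention patterns consistent with those at the pretrained length improves extrapolation; 2) large attention uncertainty leads to retrieval errors; 3) using longer pretraining lengths for RoPE extensions reduces attention uncertainty and significantly improves extrapolation.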

Understanding RoPE Extensions of Long-Context LLMs: An Attention Perspective