Recent advancements in large language models (LLMs) have significantly expanded their functionality and skills as tool agents. In this paper, we argue that a waveform pattern in the model's attention allocation has an impact on the tool use performance, which degrades when the position of essential information hits the trough zone. To address this issue, we propose a novel inference method named Attention Buckets. This approach enables LLMs to handle context by conducting parallel processes, each featuring a unique RoPE angle base that shapes the attention waveform. Attention Buckets ensures that an attention trough of a particular process can be compensated with an attention peak of another run, reducing the risk of the LLM missing essential information residing within the attention trough. Our extensive experiments on the widely recognized tool use benchmark demonstrate the efficacy of our approach, where a 7B-parameter open-source model enhanced by Attention Buckets achieves SOTA performance on par with GPT-4.

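The paper proposes a new inference method called Attention Buckets. It processes the context through several parallel runs, each with its own RoPE angle base that shapes the attention waveform. This ensures that the model does not miss essential information located in an attention trough, which in turn improves the LLM's performance.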
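To make the trough-and-peak idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes an all-ones query and key so that the RoPE score depends only on relative distance, uses hypothetical base values, and combines the runs with an element-wise maximum purely for illustration, since the abstract does not say how the parallel runs are aggregated.

```python
import math

def rope_score(distance, base, dim=128):
    """Normalized RoPE attention score for all-ones q and k (an illustrative
    assumption) at a given relative distance. Each 2-D pair rotates with
    frequency base ** (-2j / dim), so the score is the mean of cos terms."""
    half = dim // 2
    return sum(math.cos(distance * base ** (-2 * j / dim)) for j in range(half)) / half

# Hypothetical RoPE bases, one per parallel run (not taken from the paper).
bases = [10000.0, 15000.0, 20000.0, 25000.0]
distances = list(range(0, 2048, 16))

# One attention waveform per base: score as a function of relative distance.
waveforms = {b: [rope_score(d, b) for d in distances] for b in bases}

# Illustrative combination: at every distance keep the best score over all runs,
# so a trough in one waveform can be covered by a peak in another.
combined = [max(waveforms[b][i] for b in bases) for i in range(len(distances))]

for b in bases:
    print(f"base={b:>8.0f}  lowest score={min(waveforms[b]):+.3f}")
print(f"combined        lowest score={min(combined):+.3f}")
```

Because the combined curve takes the maximum at each distance, its lowest point can never fall below the lowest point of any single waveform; how much it improves depends on how far apart the chosen bases place their troughs.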

Fortify the Shortest Stave in Attention: Enhancing Context Awareness of Large Language Models for Effective Tool Use