Large language models (LLMs) have shown remarkable performance in various
natural language processing tasks. However, a primary constraint they face is
the context limit, i.e., the maximum number of tokens they can process.
Previous works have explored architectural changes and modifications in
positional encoding to relax the constraint, but they often require expensive
training or do not address the computational demands of self-attention. In this
paper, we present Hierarchical cOntext MERging (HOMER), a new training-free
scheme designed to overcome the limitations. HOMER uses a divide-and-conquer
algorithm, dividing long inputs into manageable chunks. Each chunk is then
processed collectively, employing a hierarchical strategy that merges adjacent
chunks at progressive transformer layers. A token reduction technique precedes
each merging, ensuring memory usage efficiency. We also propose an optimized
computational order reducing the memory requirement to logarithmically scale
with respect to input length, making it especially favorable for environments
with tight memory restrictions. Our experiments demonstrate the proposed
method's superior performance and memory efficiency, enabling the broader use
of LLMs in contexts requiring extended context. Code is available at
this https URL

本文介绍了一种名为 HOMER 的新的无需训练的方案，它使用分而治之的算法将长输入划分为可管理的块，并采用逐层合并的分层策略，以解决大语言模型在上下文限制方面的问题，同时还提出了一种优化的计算顺序，使其对输入长度的内存需求呈对数尺度变化，从而提高了性能和内存效率。