BriefGPT.xyz
Jun, 2024
CItruS: Chunked Instruction-aware State Eviction for Long Sequence Modeling
Yu Bai, Xiyuan Zou, Heyan Huang, Sanxing Chen, Marc-Antoine Rondeau...
TL;DR
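The paper introduces CItruS, a new modeling technique that integrates the attention preferences useful for downstream tasks into the eviction of hidden states, addressing the problem of information neglect. It also designs a chunked sequence processing method to improve efficiency. Under the same memory budget, the method shows superior performance on long sequence comprehension and retrieval tasks.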
Abstract
Long sequence modeling has gained broad interest as large language models (LLMs) continue to advance. Recent research has identified that a large portion of hidden states within the key-value caches of...