Continual learning (CL) is crucial for language models to adapt dynamically to evolving real-world demands. To mitigate catastrophic forgetting in CL, data replay has proven to be a simple and effective strategy, and replay-based distillation built on top of it can further improve performance. However, existing methods fail to fully exploit the knowledge embedded in the models from previous tasks, so they need a relatively large number of replay samples to achieve good results. In this work, we first explore and emphasize the importance of attention weights in knowledge retention, and then propose a SElective attEntion-guided Knowledge Retention method (SEEKR) for data-efficient replay-based continual learning of large language models (LLMs). Specifically, SEEKR performs attention distillation on selected attention heads for finer-grained knowledge retention, using the proposed forgettability-based and task-sensitivity-based measures to identify the most valuable heads. Experimental results on two continual learning benchmarks for LLMs show that SEEKR outperforms existing methods in both performance and efficiency. Notably, SEEKR achieves comparable or even better performance with only 1/10 of the replayed data used by other methods, and reduces the proportion of replayed data to 1%.
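To make the two ingredients named above concrete (choosing a subset of heads, then distilling the old model's attention into the new one on replayed samples), here is a minimal sketch under stated assumptions. The abstract does not define the forgettability or task-sensitivity measures, so they are taken here as precomputed per-head scores. Combining them by multiplication, measuring attention discrepancy with KL divergence, and the helper names `select_heads`, `kl_row` and `attention_distillation_loss` are all hypothetical choices for illustration, not SEEKR's actual formulation.

```python
import math
from typing import Dict, List, Tuple

Head = Tuple[int, int]            # hypothetical key: (layer index, head index)
AttnMap = List[List[float]]       # one attention row (summing to 1) per query token


def select_heads(forgettability: Dict[Head, float],
                 task_sensitivity: Dict[Head, float],
                 k: int) -> List[Head]:
    """Keep the k heads with the highest combined score.

    Assumption: the two per-head scores are given and combined by product.
    """
    scores = {h: forgettability[h] * task_sensitivity[h] for h in forgettability}
    return sorted(scores, key=lambda h: scores[h], reverse=True)[:k]


def kl_row(p: List[float], q: List[float], eps: float = 1e-12) -> float:
    """KL(p || q) for a single attention row; zero-probability entries of p contribute 0."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def attention_distillation_loss(old_attn: Dict[Head, AttnMap],
                                new_attn: Dict[Head, AttnMap],
                                heads: List[Head]) -> float:
    """Mean row-wise KL between old-model and new-model attention on the selected heads.

    Both maps are assumed to be computed on the same replayed sample.
    Heads that are not selected are ignored entirely.
    """
    total, count = 0.0, 0
    for h in heads:
        for p, q in zip(old_attn[h], new_attn[h]):
            total += kl_row(p, q)
            count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    forget = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.5}
    sens = {(0, 0): 0.8, (0, 1): 0.9, (1, 0): 0.6}
    heads = select_heads(forget, sens, k=2)  # [(0, 0), (1, 0)]
    old = {(0, 0): [[1.0, 0.0], [0.5, 0.5]], (1, 0): [[0.5, 0.5], [0.5, 0.5]]}
    new = {(0, 0): [[0.5, 0.5], [0.5, 0.5]], (1, 0): [[0.5, 0.5], [0.5, 0.5]]}
    print(heads, attention_distillation_loss(old, new, heads))
```

In a training loop, this term would presumably be added with some weight to the ordinary loss on replayed and new-task data. Restricting it to a few selected heads is what keeps the constraint finer-grained than distilling every head or the full output distribution.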

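This work addresses knowledge forgetting in large language models during continual learning, noting that existing methods do not fully exploit the knowledge from previous tasks. It proposes SEEKR, a selective attention-guided knowledge retention method that achieves data-efficient replay-based continual learning by focusing on selected attention heads. Experimental results show that SEEKR outperforms existing methods in both performance and efficiency, and maintains comparable or even better performance while substantially reducing the amount of replayed data.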

SEEKR: Selective Attention-Guided Knowledge Retention for Continual Learning of Large Language Models