Catastrophic forgetting (CF) is a phenomenon that occurs in machine learning when a model forgets previously learned information as it learns new information. As large language models (LLMs) have shown excellent performance, it is interesting to uncover whether CF exists in the continual fine-tuning of LLMs. In this study, we empirically evaluate the forgetting phenomenon in LLMs' knowledge, from the perspectives of domain knowledge, reasoning, and reading comprehension. The experiments demonstrate that catastrophic forgetting is generally observed in LLMs ranging from 1b to 7b. Furthermore, as the scale increases, the severity of forgetting also intensifies. Comparing the decoder-only model BLOOMZ with the encoder-decoder model mT0, BLOOMZ suffers less forgetting and maintains more knowledge. We also observe that LLMs can mitigate language bias (e.g. gender bias) during continual fine-tuning. Moreover, we find that ALPACA can maintain more knowledge and capacity compared with LLAMA during the continual fine-tuning, which implies that general instruction tuning can help mitigate the forgetting phenomenon of LLMs in the further fine-tuning process.

大型语言模型在不断微调的过程中存在灾难性遗忘现象，尤其随着规模的增加，遗忘的严重程度也加剧，然而通过单独解码器模型BLOOMZ与编码器-解码器模型mT0的比较，发现BLOOMZ遗忘较少且保留更多知识，还观察到语言模型能够在不断微调中缓解语言偏见，同时通用指令微调有助于减轻大型语言模型在进一步微调过程中的遗忘现象。

大型语言模型在连续微调中的灾难性遗忘的实证研究