BriefGPT.xyz
Mar, 2024
On the Diminishing Returns of Width for Continual Learning
Etash Guha, Vihan Lakshman
TL;DR
Deep neural networks demonstrate state-of-the-art performance across a variety of settings, but they often suffer from "catastrophic forgetting" when trained on new tasks in sequence. This work develops a framework for analyzing continual learning theoretically and establishes a direct relationship between network width and forgetting. Specifically, we prove that increasing network width to reduce forgetting yields diminishing returns, and we empirically confirm the predictions of our theory over a range of widths not explored in prior studies, clearly observing this diminishing-returns effect.
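The measurement the TL;DR describes can be illustrated with a small experiment: train a network sequentially on two tasks, record forgetting as the accuracy drop on the first task after learning the second, and repeat across hidden widths. The sketch below is not the authors' code; the synthetic tasks, the two-layer MLP, and all hyperparameters are illustrative assumptions chosen only to show how such a width-versus-forgetting comparison can be set up.

```python
# Minimal sketch (assumed setup, not the paper's experiments): measure forgetting
# as a function of hidden-layer width under plain sequential training on two tasks.
import torch
import torch.nn as nn


def make_task(seed, n=2000, d=32, classes=4):
    """Generate a toy classification task with its own random class centers."""
    g = torch.Generator().manual_seed(seed)
    centers = torch.randn(classes, d, generator=g) * 3.0
    y = torch.randint(0, classes, (n,), generator=g)
    x = centers[y] + torch.randn(n, d, generator=g)
    return x, y


def accuracy(model, x, y):
    with torch.no_grad():
        return (model(x).argmax(dim=1) == y).float().mean().item()


def train(model, x, y, epochs=200, lr=1e-2):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()


def forgetting_for_width(width, d=32, classes=4):
    """Train on task A, then task B; return the accuracy drop on task A."""
    torch.manual_seed(0)
    model = nn.Sequential(nn.Linear(d, width), nn.ReLU(), nn.Linear(width, classes))
    xa, ya = make_task(seed=1, d=d, classes=classes)
    xb, yb = make_task(seed=2, d=d, classes=classes)
    train(model, xa, ya)
    acc_before = accuracy(model, xa, ya)
    train(model, xb, yb)  # plain sequential training: no replay or regularization
    acc_after = accuracy(model, xa, ya)
    return acc_before - acc_after


if __name__ == "__main__":
    for w in [8, 32, 128, 512, 2048]:
        print(f"width={w:5d}  forgetting={forgetting_for_width(w):.3f}")
```

In a setup like this, one would plot forgetting against width: the paper's claim is that forgetting decreases as width grows, but each further increase in width buys a smaller reduction, which is the diminishing-returns effect it formalizes.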
Abstract
While deep neural networks have demonstrated groundbreaking performance in various settings, these models often suffer from \emph{catastrophic forgetting} when trained on new tasks in sequence. Several works have …