In the era of large language models like ChatGPT, the phenomenon of "model collapse" refers to the situation in which a model, trained recursively over time on data generated by previous generations of itself, degrades in performance until it eventually becomes completely useless, i.e., the model collapses. In this work, we study this phenomenon in the simplified setting of kernel regression and obtain results that show a clear crossover between a regime where the model can cope with fake data and a regime where the model's performance completely collapses. Under polynomially decaying spectral and source conditions, we obtain modified scaling laws that exhibit new crossover phenomena from fast to slow rates. We also propose a simple strategy based on adaptive regularization to mitigate model collapse. Our theoretical results are validated with experiments.
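The recursive training loop described above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the paper's experimental setup: it uses the simplest possible kernel (a linear kernel on one-dimensional Gaussian inputs), so kernel ridge regression reduces to scalar ridge regression, and each generation is fit to labels produced by the previous generation's model plus fresh Gaussian noise. The function names `fit_ridge_1d`, `simulate_collapse`, and `mean_errors`, and all parameter values, are hypothetical.

```python
import random
import statistics


def fit_ridge_1d(xs, ys, lam):
    """Closed-form 1-D ridge estimate: argmin_w sum (y - w*x)^2 + lam * n * w^2."""
    n = len(xs)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / (sxx + lam * n)


def simulate_collapse(w_true=1.0, n=20, sigma=1.0, lam=0.0, generations=10, seed=0):
    """Return the squared error (w_k - w_true)^2 for generations k = 0..generations-1.

    Hypothetical toy setup: generation 0 is trained on real labels
    y = w_true * x + noise; every later generation is trained on labels
    produced by the previous generation's fitted model plus fresh noise.
    """
    rng = random.Random(seed)
    errors = []
    w_label = w_true  # the "teacher" that generates labels for the current generation
    for _ in range(generations):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [w_label * x + rng.gauss(0.0, sigma) for x in xs]
        w_hat = fit_ridge_1d(xs, ys, lam)
        errors.append((w_hat - w_true) ** 2)
        w_label = w_hat  # the next generation learns from this model's outputs
    return errors


def mean_errors(trials=500, **kwargs):
    """Average the per-generation error curve over independent random runs."""
    runs = [simulate_collapse(seed=s, **kwargs) for s in range(trials)]
    return [statistics.mean(column) for column in zip(*runs)]


if __name__ == "__main__":
    curve = mean_errors(trials=500, generations=10)
    for k, err in enumerate(curve):
        print(f"generation {k}: mean squared error {err:.4f}")
```

With `lam=0`, each generation adds an independent zero-mean estimation error whose expected variance is σ²/(n − 2) for Gaussian inputs, so the mean squared error to the true coefficient grows roughly linearly with the number of generations. The fixed `lam` here is only a placeholder; it does not implement the adaptive regularization strategy proposed in the paper.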

Model Collapse Demystified: The Case of Regression