BriefGPT.xyz
May, 2022
Scaling Laws and Interpretability of Learning from Repeated Data
Danny Hernandez, Tom Brown, Tom Conerly, Nova DasSarma, Dawn Drain...
TL;DR
This paper studies how repeated data affects the performance of large language models and finds a strong double-descent phenomenon, in which repeated data can cause test loss to rise partway through training. Experiments show that repeated data measurably harms model performance and may drive the model from generalization toward memorization.
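The setup the summary describes, where a small fraction of the training distribution consists of a repeated subset, can be illustrated with a toy sketch. This is not the paper's code; `build_mixture` and all its parameters are hypothetical names chosen for illustration:

```python
import random

def build_mixture(corpus, repeat_frac=0.1, subset_size=3, total=1000, seed=0):
    """Toy sketch (hypothetical helper, not from the paper): build a training
    stream in which roughly `repeat_frac` of samples are drawn from a small
    subset seen many times, mimicking imperfect deduplication or intentional
    upweighting of some documents."""
    rng = random.Random(seed)
    repeated = corpus[:subset_size]   # small subset that recurs many times
    unique = corpus[subset_size:]     # remainder, sampled broadly
    stream = []
    for _ in range(total):
        if rng.random() < repeat_frac:
            stream.append(rng.choice(repeated))
        else:
            stream.append(rng.choice(unique))
    return stream

corpus = [f"doc{i}" for i in range(100)]
stream = build_mixture(corpus, repeat_frac=0.1)
# fraction of the stream occupied by the repeated subset
frac = sum(s in {"doc0", "doc1", "doc2"} for s in stream) / len(stream)
```

Sweeping `repeat_frac` and `subset_size` in a setup like this is one way to probe how the repeated fraction and the number of repetitions per document trade off against test loss.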
Abstract
Recent large language models have been trained on vast datasets, but also often on repeated data, either intentionally for the purpose of upweighting higher quality data, or unintentionally because data deduplication …