BriefGPT.xyz
Apr, 2025
Weight Ensembling Improves Reasoning in Language Models
Xingyu Dang, Christina Baek, Kaiyue Wen, Zico Kolter, Aditi Raghunathan
TL;DR
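This work addresses a failure mode in the training of reasoning models, in which the diversity of generations begins to collapse, leading to poor test-time scaling. We propose a simple intervention: interpolating the weights of the latest supervised fine-tuning (SFT) checkpoint with those of an early checkpoint (WiSE-FT). This almost completely recovers Pass@k while also improving Pass@1, yielding better results and better test-time scaling with less data.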
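The interpolation step can be illustrated with a minimal sketch. It assumes checkpoints are plain dictionaries mapping parameter names to flat lists of floats, and that a single mixing coefficient `alpha` is applied to every parameter. The function name, the checkpoint format, and the value of `alpha` are illustrative assumptions, not details taken from the paper.

```python
from typing import Dict, List

# Hypothetical checkpoint format: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def interpolate_weights(early: Weights, late: Weights, alpha: float) -> Weights:
    """Return (1 - alpha) * early + alpha * late, parameter by parameter."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if early.keys() != late.keys():
        raise ValueError("checkpoints must have the same parameter names")

    merged: Weights = {}
    for name, early_values in early.items():
        late_values = late[name]
        if len(early_values) != len(late_values):
            raise ValueError(f"size mismatch for parameter {name!r}")
        # Linear interpolation between the two checkpoints.
        merged[name] = [
            (1.0 - alpha) * e + alpha * l
            for e, l in zip(early_values, late_values)
        ]
    return merged


# Toy checkpoints standing in for an early and a late SFT checkpoint.
early_ckpt = {"layer.weight": [0.0, 2.0], "layer.bias": [1.0]}
late_ckpt = {"layer.weight": [4.0, 6.0], "layer.bias": [3.0]}

# alpha = 0.5 is an arbitrary choice for illustration.
merged_ckpt = interpolate_weights(early_ckpt, late_ckpt, alpha=0.5)
print(merged_ckpt)
```

With `alpha = 0` the result equals the early checkpoint and with `alpha = 1` it equals the late one; intermediate values trade off between the two. In practice the same arithmetic would be applied to the tensors in a framework's state dictionary rather than to Python lists.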
Abstract
We investigate a failure mode that arises during the training of reasoning models, where the diversity of generations begins to collapse, leading to suboptimal test-time scaling. Notably, the Pass@1 rate reliably …