Training Language Models to Win Debates with Self-Play Improves Judge Accuracy

Sep, 2024

Samuel Arnesen, David Rein, Julian Michael

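TL;DR: This study examines how effective debate is as a scalable oversight method. Training models to debate via self-play lets language model judges answer questions more accurately on a long-context reading comprehension task. The study finds that, compared with conventional persuasion-based models, debate-trained models produce stronger and more informative arguments, which suggests debate could provide high-quality supervision for tasks that are difficult to evaluate directly.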

Abstract

We test the robustness of debate as a method of scalable oversight by training models to debate with data generated via self-play. In a lo