BriefGPT.xyz
Oct, 2024
Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning
Yihe Deng, Paul Mineiro
TL;DR
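This work addresses the challenge that large language models (LLMs) face in generating detailed and accurate reasoning traces. It proposes a new approach that uses online learning "Flows" to produce high-quality reasoning traces for model fine-tuning. Trained with online Direct Preference Optimization (DPO), the method shows potential to significantly improve model performance on mathematical reasoning tasks.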
Abstract
Mathematical reasoning is a crucial capability for large language models (LLMs), yet generating detailed and accurate reasoning traces remains a significant challenge. This paper introduces a novel approach to pr