BriefGPT.xyz
Apr, 2025
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities
Thomas Schmied, Jörg Bornschein, Jordi Grau-Moya, Markus Wulfmeier, Razvan Pascanu
TL;DR
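This study addresses the poor performance of large language models (LLMs) in decision-making scenarios, analyzing three specific failure modes: greediness, frequency bias, and the knowing-doing gap. Using reinforcement learning (RL) fine-tuning on self-generated chain-of-thought (CoT) rationales, the experiments show that the approach significantly improves the decision-making abilities of LLMs, increasing exploration and narrowing the knowing-doing gap.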
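To make the "greediness" failure mode concrete, here is a minimal sketch of a toy multi-armed bandit. It is an illustration only, not the paper's experimental setup and not RL fine-tuning: the function name `run_bandit`, the deterministic reward values, and the forced warm-up exploration (a simple explore-then-commit baseline) are all assumptions made for this example. It shows how an agent that always picks its current best estimate can lock onto the first arm it tries, while even one round of exploration finds the better arm.

```python
def run_bandit(arm_rewards, steps, warmup_rounds=0):
    """Pull arms for `steps` steps; return (total_reward, arms_tried).

    Hypothetical toy setup: each arm pays a fixed, deterministic reward.
    The first warmup_rounds * len(arm_rewards) steps cycle through all arms
    (forced exploration); after that the agent acts purely greedily.
    """
    n = len(arm_rewards)
    counts = [0] * n
    estimates = [0.0] * n  # running mean reward per arm, initialised to 0
    total = 0.0
    for t in range(steps):
        if t < warmup_rounds * n:
            arm = t % n  # explore: round-robin over the arms
        else:
            # exploit: highest estimate; max() returns the first maximal
            # element, so ties go to the lowest index
            arm = max(range(n), key=lambda a: estimates[a])
        reward = arm_rewards[arm]
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    tried = {a for a in range(n) if counts[a] > 0}
    return total, tried


rewards = [0.3, 0.5, 0.9]  # assumed values; arm 2 is the best arm
greedy_total, greedy_tried = run_bandit(rewards, steps=100)
explore_total, explore_tried = run_bandit(rewards, steps=100, warmup_rounds=1)
print("greedy:", greedy_tried, round(greedy_total, 2))
print("explore-then-commit:", explore_tried, round(explore_total, 2))
```

In this sketch the purely greedy agent never leaves arm 0, because its first reward already beats the zero-initialised estimates of the untried arms. The warm-up variant samples every arm once and then commits to arm 2.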
Abstract
The success of Large Language Models (LLMs) has sparked interest in various agentic applications. A key hypothesis is that LLMs, leveraging common sense and Chain-of-Thought (CoT) reasoning, can effectively explo