Feb, 2025
Reinforcement Learning for Long-Horizon Interactive LLM Agents
Kevin Chen, Marco Cusumano-Towner, Brody Huval, Aleksei Petrenko, Jackson Hamburger...
TL;DR
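This paper addresses the poor performance of interactive digital agents (IDAs) on complex benchmarks by proposing a new reinforcement learning approach that trains IDAs directly in their target environments to improve task execution. A 32-billion-parameter agent trained this way outperforms existing, larger models in the AppWorld environment, which suggests that reinforcement learning has potential value for multi-domain interactive applications.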
Abstract
Interactive Digital Agents (IDAs) leverage APIs of stateful digital environments to perform tasks in response to user requests. While IDAs powered by instruction-tuned Large Language Models (LLMs) can react to fe...