BriefGPT.xyz
Feb, 2024
RL-VLM-F: Reinforcement Learning from Vision Language Foundation Model Feedback
Yufei Wang, Zhanyi Sun, Jesse Zhang, Zhou Xian, Erdem Biyik...
TL;DR
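Proposes RL-VLM-F, a method that automatically generates reward functions from the feedback of vision language foundation models, using only a text description of the task goal and the agent's visual observations. This avoids the human effort and trial-and-error of manual reward design. The method produces effective rewards and policies across a range of domains and outperforms prior methods that rely on large pretrained models.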
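To make the idea concrete, here is a minimal sketch of learning a reward from model feedback. It is not the authors' implementation. It assumes the feedback takes the form of pairwise preferences between two observations, and it fits a reward with a Bradley-Terry style logistic loss. The function `stub_vlm_preference` is a hypothetical stand-in for a real vision language model query, and observations are plain scalars rather than images.

```python
import math
import random


def stub_vlm_preference(obs_a, obs_b, goal):
    """Hypothetical stand-in for a VLM query (assumption, not a real API).

    Returns 0 if obs_a is closer to the goal than obs_b, otherwise 1.
    A real system would show two images and the goal text to a VLM.
    """
    return 0 if abs(obs_a - goal) < abs(obs_b - goal) else 1


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def learned_reward(obs, w):
    # Toy linear reward model: r(obs) = w * obs.
    return w * obs


def train_reward(pairs, labels, lr=0.5, epochs=100):
    """Fit w so that preferred observations receive higher reward.

    Bradley-Terry model: P(a preferred over b) = sigmoid(r(a) - r(b)).
    """
    w = 0.0
    for _ in range(epochs):
        for (a, b), label in zip(pairs, labels):
            p_a = sigmoid(learned_reward(a, w) - learned_reward(b, w))
            target = 1.0 if label == 0 else 0.0
            # Gradient of the negative log-likelihood with respect to w.
            w -= lr * (p_a - target) * (a - b)
    return w


random.seed(0)
goal = 1.0  # toy goal: observations near 1.0 are "better"
pairs = [(random.random(), random.random()) for _ in range(50)]
labels = [stub_vlm_preference(a, b, goal) for a, b in pairs]
w = train_reward(pairs, labels)
```

In this toy setup the learned weight `w` ends up positive, so observations closer to the goal score higher. The learned reward could then be handed to any standard RL algorithm in place of a hand-designed one.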
Abstract
Reward engineering has long been a challenge in reinforcement learning (RL) research, as it often requires extensive human effort and iterative processes of trial-and-error to design effective reward functions.