BriefGPT.xyz
May, 2024
Efficient Preference-based Reinforcement Learning via Aligned Experience Estimation
Fengshuo Bai, Rui Zhao, Hongming Zhang, Sijia Cui, Ying Wen...
TL;DR
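SEER, a PbRL method, integrates label smoothing and policy regularization to improve feedback efficiency and achieves significant performance gains.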
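As a rough illustration of the label smoothing mentioned in the TL;DR, the sketch below applies uniform label smoothing to a binary preference label inside a Bradley-Terry style cross-entropy loss. This is a generic textbook formulation, not necessarily the one used in SEER; the function names, the `eps` parameter, and the scalar segment returns are all assumptions made for the example.

```python
import math


def smooth_label(y, eps=0.1):
    """Uniform label smoothing for a binary preference label y in {0, 1}.

    Hypothetical helper: moves the hard label toward 0.5 by a factor eps.
    """
    return (1.0 - eps) * y + eps * 0.5


def preference_prob(return_a, return_b):
    """Bradley-Terry probability that segment A is preferred over segment B,
    given the (assumed scalar) predicted returns of the two segments."""
    return 1.0 / (1.0 + math.exp(return_b - return_a))


def preference_loss(return_a, return_b, y, eps=0.1):
    """Cross-entropy between the smoothed label and the predicted preference."""
    p = preference_prob(return_a, return_b)
    t = smooth_label(y, eps)
    return -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
```

With `eps=0.1`, a hard label of 1 becomes a soft target of 0.95, so a single noisy human label pulls the reward model less strongly toward an extreme prediction.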
Abstract
Preference-based reinforcement learning (PbRL) has shown impressive capabilities in training agents without reward engineering. However, a notable limitation of