BriefGPT.xyz
Nov, 2021
B-Pref: Benchmarking Preference-Based Reinforcement Learning
Kimin Lee, Laura Smith, Anca Dragan, Pieter Abbeel
TL;DR
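This paper proposes B-Pref, a benchmark for preference-based reinforcement learning. The benchmark uses a new evaluation metric designed to measure both the performance and the robustness of algorithms, which allows a more systematic study of the design choices behind preference-based reinforcement learning algorithms.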
Abstract
Reinforcement learning (RL) requires access to a reward function that incentivizes the right behavior, but these are notoriously hard to specify for complex tasks. Preference-based RL provides an alternative: lea