BriefGPT.xyz
Jun, 2024
Preference Elicitation for Offline Reinforcement Learning
Alizée Pace, Bernhard Schölkopf, Gunnar Rätsch, Giorgia Ramponi
TL;DR
Sim-OPRL is an offline preference-based reinforcement learning algorithm that operates in a fully offline setting by leveraging a learned environment model: preference feedback is elicited over simulated rollouts rather than real interactions. The method applies pessimism to out-of-distribution data and optimism when acquiring information about the optimal policy. The authors provide theoretical guarantees on sample complexity and demonstrate Sim-OPRL's empirical performance across different environments.
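The loop described above — learn a model from the offline dataset, elicit preferences over simulated rollouts, and balance optimism (for informative queries) against pessimism (for out-of-distribution coverage) — can be sketched in a toy tabular setting. This is a minimal illustration, not the paper's actual algorithm: the state/action sizes, the visit-count bonus, the random "offline" counts, and the additive reward update are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy tabular MDP; sizes are illustrative, not from the paper.
S, A, H = 5, 2, 4

# "Learned" dynamics model: random counts stand in for transition
# counts estimated from a fixed offline dataset.
counts = rng.integers(1, 10, size=(S, A, S)).astype(float)
P_hat = counts / counts.sum(axis=-1, keepdims=True)
visit = counts.sum(axis=-1)          # state-action coverage
bonus = 1.0 / np.sqrt(visit)         # pessimism/optimism scale for OOD pairs

true_reward = rng.random((S, A))     # hidden; drives the preference oracle
reward_hat = np.zeros((S, A))        # preference-based reward estimate

def rollout(policy):
    """Simulate one trajectory inside the learned model P_hat."""
    s, traj = 0, []
    for _ in range(H):
        a = int(policy[s])
        traj.append((s, a))
        s = rng.choice(S, p=P_hat[s, a])
    return traj

def ret(traj, r):
    return sum(r[s, a] for s, a in traj)

for _ in range(200):
    # Optimistic vs. pessimistic candidate policies: where they disagree
    # is where a preference query is most informative.
    pi_opt = np.argmax(reward_hat + bonus, axis=1)
    pi_pess = np.argmax(reward_hat - bonus, axis=1)
    t1, t2 = rollout(pi_opt), rollout(pi_pess)
    # Simulated preference feedback: oracle prefers higher true return.
    winner, loser = (t1, t2) if ret(t1, true_reward) >= ret(t2, true_reward) else (t2, t1)
    # Crude additive update toward the preferred trajectory.
    for s, a in winner:
        reward_hat[s, a] += 0.05
    for s, a in loser:
        reward_hat[s, a] -= 0.05

# Final policy is pessimistic with respect to model coverage.
policy = np.argmax(reward_hat - bonus, axis=1)
```

The split mirrors the summary's two uses of uncertainty: an optimistic bonus when choosing what to compare, and a pessimistic penalty when committing to the deployed policy.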
Abstract
Applying reinforcement learning (RL) to real-world problems is often made challenging by the inability to interact with the environment and the difficulty of designing reward functions. Offline RL addresses the f…