BriefGPT.xyz
Apr, 2024
PIPER: Primitive-Informed Preference-based Hierarchical Reinforcement Learning via Hindsight Relabeling
Utsav Singh, Wesley A. Suttle, Brian M. Sadler, Vinay P. Namboodiri, Amrit Singh Bedi
TL;DR
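Learns a reward model through preference-based learning and uses it to relabel the higher-level replay buffer. This mitigates the non-stationarity common in existing hierarchical methods and yields strong performance across a range of challenging sparse-reward tasks.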
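The summary compresses two steps: fitting a reward model from pairwise preferences, then using that model to relabel the higher-level replay buffer. The Python sketch below illustrates both under loudly stated assumptions. The linear reward model, the Bradley-Terry preference loss (a common choice in preference-based RL), the synthetic preference oracle, and the relabeling rule (swap in the state the lower level actually reached as the subgoal, then rescore it with the learned model) are illustrative choices, not PIPER's published implementation. All names (`LinearRewardModel`, `HighLevelTransition`, `relabel`, `train_demo`) are hypothetical.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Tuple

State = Tuple[float, ...]  # hypothetical: a state is a small feature vector


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class LinearRewardModel:
    """Hypothetical linear reward r_theta(s) = theta . s, fit with a
    Bradley-Terry preference loss (an assumed choice, not from the paper)."""

    def __init__(self, dim: int) -> None:
        self.theta = [0.0] * dim

    def reward(self, s: State) -> float:
        return sum(w * x for w, x in zip(self.theta, s))

    def segment_return(self, seg: List[State]) -> float:
        return sum(self.reward(s) for s in seg)

    def update(self, seg_a: List[State], seg_b: List[State],
               pref_a: float, lr: float = 0.1) -> None:
        """One gradient-ascent step on the Bradley-Terry log-likelihood.
        pref_a is 1.0 if seg_a is preferred, 0.0 if seg_b is preferred."""
        # P(a preferred over b) = sigmoid(R(a) - R(b))
        p_a = sigmoid(self.segment_return(seg_a) - self.segment_return(seg_b))
        phi_a = [sum(col) for col in zip(*seg_a)]  # summed features of segment a
        phi_b = [sum(col) for col in zip(*seg_b)]
        for i in range(len(self.theta)):
            self.theta[i] += lr * (pref_a - p_a) * (phi_a[i] - phi_b[i])


@dataclass
class HighLevelTransition:
    state: State
    subgoal: State
    reward: float
    next_state: State


def relabel(buffer: List[HighLevelTransition],
            model: LinearRewardModel) -> List[HighLevelTransition]:
    """Assumed relabeling rule: use the reached state as the subgoal
    (hindsight), then recompute the reward with the learned model."""
    return [
        HighLevelTransition(t.state, t.next_state,
                            model.reward(t.next_state), t.next_state)
        for t in buffer
    ]


def train_demo(num_pairs: int = 500, seed: int = 0):
    rng = random.Random(seed)

    def random_segment(length: int = 3) -> List[State]:
        return [(rng.random(), rng.random()) for _ in range(length)]

    model = LinearRewardModel(dim=2)
    for _ in range(num_pairs):
        a, b = random_segment(), random_segment()
        # Hypothetical oracle: prefer the segment with the larger summed x-coordinate.
        pref_a = 1.0 if sum(s[0] for s in a) > sum(s[0] for s in b) else 0.0
        model.update(a, b, pref_a)

    # Stale rewards (0.0) recorded under an older lower-level policy.
    buffer = [
        HighLevelTransition((0.0, 0.0), (1.0, 1.0), 0.0, (0.9, 0.5)),
        HighLevelTransition((0.0, 0.0), (1.0, 1.0), 0.0, (0.1, 0.5)),
    ]
    return model, relabel(buffer, model)


if __name__ == "__main__":
    model, relabeled = train_demo()
    print("theta:", model.theta)
    for t in relabeled:
        print(t)
```

Because relabeling only rewrites stored tuples, it can be rerun whenever the reward model changes, so the higher level trains on rewards from the current model rather than on values recorded under an older lower-level policy.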
Abstract
In this work, we introduce PIPER: Primitive-Informed Preference-based Hierarchical Reinforcement Learning via Hindsight Relabeling, a novel …