BriefGPT.xyz
Jun, 2024
DIPPER: Direct Preference Optimization to Accelerate Primitive-Enabled Hierarchical Reinforcement Learning
Utsav Singh, Souradip Chakraborty, Wesley A. Suttle, Brian M. Sadler, Vinay P Namboodiri...
TL;DR
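DIPPER is an efficient hierarchical approach that combines direct preference optimization with reinforcement learning to learn both a higher-level and a lower-level policy, addressing the challenge of learning complex robotics tasks from human preference data.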
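The TL;DR leaves the preference-optimization step abstract. As a point of reference only, here is a minimal sketch of the standard direct preference optimization (DPO) loss for a single preference pair, where `w` is the preferred choice and `l` the dispreferred one. In a hierarchical setting one could read `w` and `l` as two subgoals proposed by the higher-level policy for the same state, but this page does not say how DIPPER defines its preference pairs, reference policy, or temperature, so that mapping, the function names, and `beta=0.1` are all assumptions, not the paper's method.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(logp_w: float, logp_l: float,
             ref_logp_w: float, ref_logp_l: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one pair where choice w is preferred over l.

    logp_*     : log-probability of each choice under the policy being trained
                 (hypothetically, the higher-level policy's subgoal choice).
    ref_logp_* : log-probability under a fixed reference policy.
    beta       : assumed temperature on the implicit reward margin.
    """
    # Implicit reward margin: difference of the policy/reference log-ratios
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)) == softplus(-margin), computed stably
    return softplus(-margin)


if __name__ == "__main__":
    # Policy identical to the reference: loss is log(2)
    print(dpo_loss(-1.0, -1.0, -1.0, -1.0))
    # Policy shifts probability toward the preferred choice: loss drops
    print(dpo_loss(-0.5, -2.0, -1.0, -1.0))
```

The loss falls below log(2) only when the trained policy raises the preferred choice's log-probability relative to the reference more than the dispreferred one's, which is what lets preference data train a policy without fitting a separate reward model first.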
Abstract
Learning control policies to perform complex robotics tasks from human preference data presents significant challenges. On the one hand, t…