BriefGPT.xyz
Jun, 2023
Policy Regularization with Dataset Constraint for Offline Reinforcement Learning
Yuhang Ran, Yi-Chen Li, Fuxiang Zhang, Zongzhang Zhang, Yang Yu
TL;DR
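This paper proposes PRDC, a policy regularization algorithm that uses a dataset constraint to learn the best possible policy from an offline reinforcement learning dataset. The method mitigates value overestimation and achieves state-of-the-art performance on a set of robotics applications.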
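The TL;DR names a "dataset constraint" without spelling it out. As a rough illustration only, the sketch below assumes one reading of it: penalize the squared distance between the policy's action and the action of the nearest state-action pair in the dataset, where states are scaled by a weight `beta` before the distance is computed. The function names, `beta`, `alpha`, and their default values are hypothetical, the nearest-neighbour search is brute force, and this is not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Dataset = List[Tuple[Vector, Vector]]  # (state, action) pairs


def nearest_dataset_action(
    state: Vector, policy_action: Vector, dataset: Dataset, beta: float = 2.0
) -> Vector:
    """Action of the dataset pair closest to (beta * state, policy_action).

    Brute-force search over all pairs; `beta` is a hypothetical weight that
    trades off state distance against action distance.
    """

    def dist_sq(s_d: Vector, a_d: Vector) -> float:
        # ||beta*s - beta*s_d||^2 + ||a - a_d||^2
        ds = sum((beta * (x - y)) ** 2 for x, y in zip(state, s_d))
        da = sum((x - y) ** 2 for x, y in zip(policy_action, a_d))
        return ds + da

    _, best_action = min(dataset, key=lambda pair: dist_sq(*pair))
    return best_action


def dataset_constraint_penalty(
    state: Vector, policy_action: Vector, dataset: Dataset, beta: float = 2.0
) -> float:
    """Squared distance from the policy's action to the nearest dataset action."""
    target = nearest_dataset_action(state, policy_action, dataset, beta)
    return sum((x - y) ** 2 for x, y in zip(policy_action, target))


def regularized_actor_loss(q_value: float, penalty: float, alpha: float = 2.5) -> float:
    """Hypothetical actor objective to minimize: -Q plus a weighted penalty."""
    return -q_value + alpha * penalty


if __name__ == "__main__":
    data: Dataset = [((0.0,), (1.0,)), ((1.0,), (-1.0,))]
    pen = dataset_constraint_penalty((0.1,), (0.5,), data)
    print(pen, regularized_actor_loss(3.0, pen))
```

Under this assumption, the search runs over state-action pairs, so the policy's action is pulled toward whichever sample is close in the joint (weighted) space rather than toward the behavior policy's action at exactly the same state. Raising `beta` makes state similarity dominate the choice of the nearest sample.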
Abstract
We consider the problem of learning the best possible policy from a fixed dataset, known as offline reinforcement learning (RL). A common taxonomy of existing offline RL works is policy regularization, which typically …