Apr, 2022
COptiDICE: Offline Constrained Reinforcement Learning via Stationary Distribution Correction Estimation
Jongmin Lee, Cosmin Paduraru, Daniel J. Mankowitz, Nicolas Heess, Doina Precup...
TL;DR
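This paper proposes COptiDICE, an offline constrained reinforcement learning algorithm that optimizes the policy by directly estimating stationary distribution corrections, so that the resulting policy satisfies the cost constraints. In experiments, COptiDICE yields policies that achieve better constraint satisfaction and return maximization.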
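The key idea in the TL;DR is estimating stationary distribution corrections. It rests on an identity shared by DICE-style methods. Let w(s, a) = d_pi(s, a) / d_D(s, a) be the ratio between the target policy's stationary distribution and the dataset's distribution. Then the expected reward and cost under the target policy equal w-weighted averages over the offline data. The minimal Python sketch below illustrates only that identity, in a toy tabular case where d_pi is assumed known. COptiDICE has to estimate w from data without that knowledge, and the function names and data here are hypothetical, not the paper's code.

```python
from collections import Counter

def correction_weights(dataset, target_dist):
    """Compute w(s, a) = d_pi(s, a) / d_D(s, a) for every (s, a) pair in the dataset.

    dataset: list of (state, action, reward, cost) tuples, assumed sampled from d_D.
    target_dist: dict (state, action) -> probability under d_pi. Assumed known here
    purely for illustration; in the real offline setting it is unknown.
    """
    n = len(dataset)
    counts = Counter((s, a) for s, a, _, _ in dataset)
    # Empirical dataset distribution: d_D(s, a) = count / n.
    return {sa: target_dist.get(sa, 0.0) / (c / n) for sa, c in counts.items()}

def weighted_estimates(dataset, weights):
    """Estimate E_{d_pi}[r] and E_{d_pi}[c] as dataset averages of w * r and w * c."""
    n = len(dataset)
    ret = sum(weights[(s, a)] * r for s, a, r, _ in dataset) / n
    cost = sum(weights[(s, a)] * c for s, a, _, c in dataset) / n
    return ret, cost

# Toy single-state example (hypothetical data): the dataset picks a0 and a1
# equally often, while the target policy prefers the rewarding action a0.
dataset = [
    ("s0", "a0", 1.0, 0.0),
    ("s0", "a1", 0.0, 1.0),
    ("s0", "a0", 1.0, 0.0),
    ("s0", "a1", 0.0, 1.0),
]
target_dist = {("s0", "a0"): 0.8, ("s0", "a1"): 0.2}

weights = correction_weights(dataset, target_dist)
ret, cost = weighted_estimates(dataset, weights)
cost_limit = 0.25  # hypothetical cost threshold
print(ret, cost, cost <= cost_limit)  # 0.8 0.2 True
```

The weighted averages recover the target policy's expected reward (0.8) and expected cost (0.2) from data collected under a different distribution. This is why estimating w is enough to evaluate, and then constrain, the return and cost of a policy offline.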
Abstract
We consider the offline constrained reinforcement learning (RL) problem, in which the agent aims to compute a policy that maximizes expected return while satisfying given cost constraints, learning only from a pr…
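For reference, the problem stated in the abstract is usually written as a constrained MDP. The sketch below uses standard notation that is assumed here rather than taken from the page: discount γ, reward r, cost functions c_k with thresholds \hat{c}_k, initial state distribution p_0, and transition kernel T. The second form rewrites the problem over the normalized discounted stationary distribution d(s, a), which is the object that stationary distribution corrections reweight.

```latex
% Policy form: maximize discounted return subject to K discounted cost constraints.
\max_{\pi}\; \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t} r(s_t,a_t)\Big]
\quad \text{s.t.} \quad
\mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t} c_k(s_t,a_t)\Big] \le \hat{c}_k,
\qquad k = 1,\dots,K

% Equivalent form over d(s,a) = (1-\gamma)\sum_{t} \gamma^{t}\Pr(s_t=s, a_t=a):
\max_{d \ge 0}\; \sum_{s,a} d(s,a)\, r(s,a)
\quad \text{s.t.} \quad
\sum_{s,a} d(s,a)\, c_k(s,a) \le (1-\gamma)\,\hat{c}_k,
\qquad
\sum_{a} d(s,a) = (1-\gamma)\, p_0(s) + \gamma \sum_{s',a'} T(s \mid s',a')\, d(s',a') \;\; \forall s
```

Writing d(s, a) = w(s, a) d^D(s, a), with d^D the dataset distribution, turns both the objective and the cost constraints into expectations over the offline dataset. That substitution is what lets a correction-estimation approach work without further environment interaction.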