BriefGPT.xyz
Jun, 2023
A Primal-Dual-Critic Algorithm for Offline Constrained Reinforcement Learning
Kihyuk Hong, Yuhang Li, Ambuj Tewari
TL;DR
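This paper proposes a Primal-Dual-Critic algorithm for offline constrained reinforcement learning. Unlike prior approaches, which require concentrability together with a strong Bellman completeness assumption, the algorithm needs only concentrability plus realizability of the value functions and marginalized importance weights, and it is reported to perform well under general function approximation.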
Abstract
Offline constrained reinforcement learning (RL) aims to learn a policy that maximizes the expected cumulative reward subject to constraints on the expected value of cost functions, using an existing dataset. In this paper, we propose ...
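For readers who want the problem statement from the abstract in symbols, a common way to write a constrained RL objective and its Lagrangian is sketched here. The notation is an assumption made for illustration and is not taken from the paper: $\pi$ is a policy, $r$ the reward, $c_1,\dots,c_m$ the cost functions, $\tau_i$ the cost thresholds, $\gamma$ a discount factor, and $\lambda_i \ge 0$ the dual variables.

```latex
% Illustrative notation only (assumed, not the paper's): discounted returns
% for the reward r and for each cost function c_i under policy \pi.
\[
  J_r(\pi) = \mathbb{E}_\pi\!\Big[\sum_{t=0}^{\infty} \gamma^t r(s_t, a_t)\Big],
  \qquad
  J_{c_i}(\pi) = \mathbb{E}_\pi\!\Big[\sum_{t=0}^{\infty} \gamma^t c_i(s_t, a_t)\Big].
\]
% Constrained problem: maximize reward subject to cost thresholds \tau_i.
\[
  \max_{\pi} \; J_r(\pi)
  \quad \text{subject to} \quad
  J_{c_i}(\pi) \le \tau_i, \quad i = 1, \dots, m.
\]
% Lagrangian on which a primal-dual method operates:
% the primal player maximizes over \pi, the dual player minimizes over \lambda \ge 0.
\[
  L(\pi, \lambda) = J_r(\pi) + \sum_{i=1}^{m} \lambda_i \big(\tau_i - J_{c_i}(\pi)\big).
\]
```

In the offline setting, none of these expectations can be computed by interacting with the environment; they have to be estimated from the existing dataset.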