BriefGPT.xyz
Nov, 2023
Handling Cost and Constraints with Off-Policy Deep Reinforcement Learning
Jared Markowitz, Jesse Silverberg, Gary Collins
TL;DR
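This work re-examines the safety of existing policy-update methods in mixed-sign reward environments. By addressing value-estimation error and avoiding explicit maximization of Q-values, it proposes new off-policy actor-critic methods that improve the learning performance of deep reinforcement learning algorithms in continuous action spaces.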
Abstract
By reusing data throughout training, off-policy deep reinforcement learning algorithms offer improved sample efficiency relative to on-policy approaches. For
→