BriefGPT.xyz
Aug, 2020
Constrained Markov Decision Processes via Backward Value Functions
Harsh Satija, Philip Amortila, Joelle Pineau
TL;DR
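This paper proposes a new reinforcement learning algorithm for handling the constraints that arise in real-world problems. The algorithm converts cumulative cost constraints into state-based constraints and ensures that the agent satisfies them during training while still maximizing its return. Experiments show that this deep neural network-based algorithm performs well on safe navigation tasks and on constrained versions of MuJoCo environments.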
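The idea of turning a cumulative cost constraint into a per-state check can be illustrated on a single recorded trajectory. The sketch below is an assumption made for illustration, not the paper's algorithm: it uses one deterministic, undiscounted list of costs, whereas the method summarized here works with learned value functions in expectation. The names `per_step_cost_estimates` and `satisfies_budget` are hypothetical. At each step, the cost accumulated so far (a "backward" quantity) plus the cost still to come (a "forward" quantity), minus the current step's cost counted twice, recovers the total trajectory cost, so the budget can be checked locally at every step.

```python
from itertools import accumulate
from typing import List, Sequence


def per_step_cost_estimates(costs: Sequence[float]) -> List[float]:
    """Return, for each step t, backward[t] + forward[t] - costs[t].

    backward[t] is the cost accumulated up to and including step t,
    forward[t] is the cost from step t to the end of the trajectory.
    Subtracting costs[t] removes the double-counted current step, so each
    entry equals the total cost of the trajectory.
    """
    backward = list(accumulate(costs))
    forward = list(accumulate(reversed(costs)))[::-1]
    return [b + f - c for b, f, c in zip(backward, forward, costs)]


def satisfies_budget(costs: Sequence[float], budget: float) -> bool:
    """Check the cost budget locally at every step (hypothetical helper)."""
    return all(est <= budget for est in per_step_cost_estimates(costs))


if __name__ == "__main__":
    trajectory_costs = [0.0, 1.0, 0.0, 2.0]  # made-up per-step costs
    print(per_step_cost_estimates(trajectory_costs))
    print(satisfies_budget(trajectory_costs, budget=3.0))
```

In a learning setting the two running sums would presumably be replaced by estimates conditioned on the current state, which is what makes the check usable before a trajectory has finished.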
Abstract
Although reinforcement learning (RL) algorithms have found tremendous success in simulated domains, they often cannot directly be applied to physical systems, especially in cases where there are hard constraints …