Jan 2022
Provably Efficient Primal-Dual Reinforcement Learning for CMDPs with Non-stationary Objectives and Constraints
Yuhao Ding, Javad Lavaei
TL;DR
This paper studies primal-dual reinforcement learning for constrained Markov decision processes (CMDPs) with non-stationary objectives and constraints, proposes a safe RL algorithm that remains both safe and adaptive as the rewards and constraints vary over time, and establishes dynamic regret and constraint-violation bounds.
Abstract
We consider primal-dual-based reinforcement learning (RL) in episodic constrained Markov decision processes (CMDPs) with non-stationary objectives and constraints, which play a central role in ensuring the safety …
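The abstract refers to primal-dual RL for CMDPs, i.e., optimizing reward subject to cost constraints via a Lagrangian relaxation. As a rough, generic illustration only (not the authors' algorithm, and ignoring the non-stationarity that is the paper's focus), the sketch below alternates a policy (primal) update with a dual-variable update driven by constraint violation on a toy constrained problem; all rewards, costs, step sizes, and the cost budget are made-up assumptions.

```python
# Generic primal-dual (Lagrangian) sketch for constrained policy optimization.
# NOT the paper's algorithm; a stationary toy problem with assumed numbers.
import numpy as np

rewards = np.array([1.0, 0.2])   # action 0 is rewarding but costly (assumed)
costs = np.array([0.9, 0.1])     # expected per-step cost of each action (assumed)
budget = 0.5                     # constraint: E[cost] <= budget (assumed)

theta = np.zeros(2)              # softmax policy parameters (primal variable)
lam = 0.0                        # Lagrange multiplier (dual variable)
eta_theta, eta_lam = 0.5, 0.5    # step sizes (assumed)

def policy(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

for t in range(2000):
    pi = policy(theta)
    # Lagrangian: E[reward] - lam * (E[cost] - budget)
    adv = rewards - lam * costs                    # per-action Lagrangian payoff
    grad = pi * (adv - pi @ adv)                   # softmax policy gradient
    theta += eta_theta * grad                      # primal ascent step
    violation = pi @ costs - budget                # signed constraint violation
    lam = max(0.0, lam + eta_lam * violation)      # projected dual ascent

pi = policy(theta)
print("policy:", pi, "E[cost]:", pi @ costs, "lambda:", lam)
```

The dual variable grows while the expected cost exceeds the budget, which penalizes the costly action in the primal step; the iterates settle near a policy whose expected cost sits at the budget.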