BriefGPT.xyz
Aug, 2023
Robust Lagrangian and Adversarial Policy Gradient for Robust Constrained Markov Decision Processes
David M. Bossens
TL;DR
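This paper introduces two algorithms for robust constrained reinforcement learning: one based on a robust Lagrangian and one based on adversarial robust constrained policy gradient. By incorporating worst-case dynamics and learning incrementally, these algorithms perform comparably to or better than traditional methods on inventory management and safe navigation tasks.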
Abstract
The robust constrained Markov decision process (RCMDP) is a recent task-modelling framework for reinforcement learning that incorporates behavioural constraints and that provides robustness to errors in the trans