BriefGPT.xyz
Oct, 2022
Bridging Distributional and Risk-sensitive Reinforcement Learning with Provable Regret Bounds
Hao Liang, Zhi-Quan Luo
TL;DR
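This paper studies regret guarantees for risk-sensitive reinforcement learning (RSRL) via distributional reinforcement learning (DRL) methods. It proposes two new DRL algorithms and bridges DRL and RSRL with respect to sample complexity. It also improves on the existing lower bound, giving a tighter one.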
Abstract
We study the regret guarantee for risk-sensitive reinforcement learning (RSRL) via distributional reinforcement learning (DRL) methods. In particular, we consider finite episodic Markov decision processes whose o