BriefGPT.xyz
Feb, 2021
Improved Corruption Robust Algorithms for Episodic Reinforcement Learning
Yifang Chen, Simon S. Du, Kevin Jamieson
TL;DR
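The paper studies episodic reinforcement learning under unknown adversarial corruptions in both the rewards and the transition probabilities. It proposes new algorithms that achieve better regret bounds, built on a general algorithmic framework that combines a corruption-robust policy-elimination meta-algorithm with plug-in reward-free exploration sub-algorithms.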
Abstract
We study episodic reinforcement learning under unknown adversarial corruptions in both the rewards and the transition probabilities of the underlying system. We propose new algorithms which, compared to the exist