BriefGPT.xyz
Feb 2021
A Nearly Optimal Policy Optimization Algorithm for Learning Adversarial Linear Mixture MDPs
Nearly Optimal Regret for Learning Adversarial MDPs with Linear Function Approximation
Jiafan He, Dongruo Zhou, Quanquan Gu
TL;DR
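This paper studies the problem of learning Markov decision processes in reinforcement learning with an adversary. It proposes an optimistic policy optimization algorithm, POWERS, which achieves nearly minimax optimal regret, and proves both upper and lower bounds for it.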
Abstract
We study reinforcement learning for finite-horizon episodic Markov decision processes with adversarial reward and full-information feedback …