Oct, 2020
Model-Free Methods in Non-Stationary RL: Near-Optimal Regret and Applications to Multi-Agent RL and Inventory Control
Near-Optimal Regret Bounds for Model-Free RL in Non-Stationary Episodic MDPs
Weichao Mao, Kaiqing Zhang, Ruihao Zhu, David Simchi-Levi, Tamer Başar
TL;DR
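Proposes RestartQ-UCB, described as the first model-free algorithm for non-stationary reinforcement learning, and shows experimentally that it performs well in multi-agent reinforcement learning and in inventory control across related products.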
Abstract
We consider model-free reinforcement learning (RL) in non-stationary Markov decision processes (MDPs). Both the reward functions and the state transition distributions are allowed to vary over time, either gradua