BriefGPT.xyz
Jun, 2020
A Model-free Learning Algorithm for Infinite-horizon Average-reward MDPs with Near-optimal Regret
Mehdi Jafarnia-Jahromi, Chen-Yu Wei, Rahul Jain, Haipeng Luo
TL;DR
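Proposes a model-free learning algorithm based on EE-QL that combines a concentrating approximation with the weakly communicating MDP setting, achieving a learning speed similar to that of the best known model-based algorithms.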
Abstract
Recently, model-free reinforcement learning has attracted research attention due to its simplicity, memory and computation efficiency, and the flexibility to combine with function approximation. In this paper, we propose