BriefGPT.xyz
Jul, 2020
Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation
Chen-Yu Wei, Mehdi Jafarnia-Jahromi, Haipeng Luo, Rahul Jain
TL;DR
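The paper develops several algorithms for learning Markov decision processes in the infinite-horizon average-reward setting with linear function approximation. Using the optimism principle and assuming the MDP has a linear structure, the authors propose computationally efficient algorithms, give a detailed analysis, and improve on the best existing results.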
Abstract
We develop several new algorithms for learning Markov decision processes in an infinite-horizon average-reward setting with linear function approximation. Using the