BriefGPT.xyz
Jun, 2017
Reinforcement Learning under Model Mismatch
Aurko Roy, Huan Xu, Sebastian Pokutta
TL;DR
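The paper studies reinforcement learning when the true environment is unavailable. It extends the robust MDP framework to the model-free RL setting, proposes robust versions of the Q-learning, SARSA, and TD-learning algorithms, scales them to large MDPs through function approximation, proves their convergence, and gives a stochastic gradient descent algorithm with guaranteed convergence to a local minimum.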
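To make the idea of a robust Q-learning update concrete, here is a minimal tabular sketch. It is an illustration under stated assumptions, not the algorithm from the paper: the uncertainty about the transition model is assumed to be an L2 ball of radius `radius` around the sampled transition, and the names `support_l2_ball` and `robust_q_update` are hypothetical. Under that assumption, the worst-case perturbation lowers the expected next-state value by `radius * ||v||_2`, so the usual Q-learning target is reduced by that penalty.

```python
import math


def support_l2_ball(values, radius):
    """Return sup of u . values over all u with ||u||_2 <= radius.

    For an L2 ball this has the closed form radius * ||values||_2.
    The L2-ball uncertainty set is an assumption made for illustration.
    """
    return radius * math.sqrt(sum(v * v for v in values))


def robust_q_update(Q, s, a, reward, s_next, alpha, gamma, radius):
    """Apply one pessimistic tabular Q-learning update in place.

    Q is a list of per-state lists of action values (rewards are maximized).
    The target is the standard one-step target minus a worst-case penalty
    that accounts for the assumed model mismatch.
    """
    # Greedy state values under the current Q estimate.
    v = [max(row) for row in Q]
    # Standard target, reduced by the worst-case perturbation of the model.
    target = reward + gamma * (v[s_next] - support_l2_ball(v, radius))
    Q[s][a] += alpha * (target - Q[s][a])
    return Q[s][a]
```

Setting `radius` to 0 recovers the ordinary Q-learning update, which makes it easy to compare the pessimistic and nominal estimates on the same sequence of transitions.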
Abstract
We study reinforcement learning under model misspecification, where we do not have access to the true environment but only to a reasonably close approximation to it. We address this problem by extending the frame