Multi-step Greedy Reinforcement Learning Algorithms

Oct, 2019

Multi-step Greedy Policies in Model-Free Deep Reinforcement Learning

Manan Tomar, Yonathan Efroni, Mohammad Ghavamzadeh

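TL;DR: This paper explores the benefits of multi-step greedy policies in model-free reinforcement learning and proposes model-free RL algorithms based on $\kappa$-Policy Iteration and $\kappa$-Value Iteration. Experiments show that on some tasks these algorithms outperform conventional RL algorithms such as DQN and TRPO.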
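
To make the idea of a $\kappa$-greedy policy concrete, here is a minimal tabular sketch. It relies on the standard formulation in which the $\kappa$-greedy policy with respect to a value estimate $V$ is the optimal policy of a surrogate problem with discount $\gamma\kappa$ and shaped reward $r + \gamma(1-\kappa)V(s')$. The function name `kappa_greedy_policy`, the array layout, and the two-state toy MDP are assumptions made for illustration only; this is not the deep RL implementation evaluated in the paper.

```python
import numpy as np

def kappa_greedy_policy(P, R, V, gamma, kappa, n_iters=500):
    """Illustrative tabular kappa-greedy step (hypothetical helper).

    P: (A, S, S) transition probabilities, R: (S, A) rewards,
    V: (S,) current value estimate.
    """
    # Shaped reward of the surrogate problem: r + gamma * (1 - kappa) * E[V(s')]
    shaped = R + gamma * (1.0 - kappa) * np.einsum("ast,t->sa", P, V)
    # Solve the surrogate problem (discount gamma * kappa) by value iteration
    U = np.zeros(V.shape, dtype=float)
    for _ in range(n_iters):
        Q = shaped + gamma * kappa * np.einsum("ast,t->sa", P, U)
        U = Q.max(axis=1)
    return Q.argmax(axis=1), U

# Toy MDP (assumed): in state 0, action 0 stays (reward 1) and action 1
# moves to state 1 (reward 0); state 1 is absorbing with reward 2.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [0.0, 1.0]]])
R = np.array([[1.0, 0.0],
              [2.0, 2.0]])
V0 = np.zeros(2)

pi_one_step, U_one_step = kappa_greedy_policy(P, R, V0, gamma=0.9, kappa=0.0)
pi_full, U_full = kappa_greedy_policy(P, R, V0, gamma=0.9, kappa=1.0)
```

With $\kappa = 0$ the sketch reduces to the usual one-step greedy policy with respect to $V$, and with $\kappa = 1$ it solves the original discounted problem; intermediate values interpolate between the two.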

Abstract

Multi-step greedy policies have been extensively used in model-based reinforcement learning (RL) and in the case when a model of the environment is available (e.g., in the game of Go). In this work, we explore the benefits of multi-step greedy policies in →