Mar, 2019
Q-Learning for Continuous Actions with Cross-Entropy Guided Policies
Riley Simmons-Edler, Ben Eisner, Eric Mitchell, Sebastian Seung, Daniel Lee
TL;DR
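This paper proposes a new method, Cross-Entropy Guided Policies (CGP), which combines Q-learning with an iterative sampling policy based on the Cross-Entropy Method (CEM) in order to improve runtime speed and stability in continuous-valued action domains.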
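As a rough illustration of the sampling side of this idea, the sketch below uses the Cross-Entropy Method to pick an action that approximately maximizes a Q-function over a continuous action space. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `cem_select_action` and `toy_q`, the hyperparameters, and the quadratic toy Q-function are all hypothetical, and the training procedure of CGP itself is not shown.

```python
import numpy as np


def toy_q(state, actions):
    """Hypothetical stand-in for a learned Q-function.

    Peaks at action = 0.3 * state; returns one value per candidate action.
    """
    target = 0.3 * state
    return -np.sum((actions - target) ** 2, axis=1)


def cem_select_action(q_fn, state, action_dim, n_iters=6, n_samples=64,
                      n_elite=8, low=-1.0, high=1.0, seed=0):
    """Approximate argmax_a Q(state, a) with the Cross-Entropy Method.

    All hyperparameter defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    mean = np.zeros(action_dim)
    std = np.ones(action_dim)
    for _ in range(n_iters):
        # Sample candidate actions from the current Gaussian and clip to bounds.
        actions = rng.normal(mean, std, size=(n_samples, action_dim))
        actions = np.clip(actions, low, high)
        # Score every candidate with the Q-function.
        q_values = q_fn(state, actions)
        # Keep the top-scoring candidates and refit the Gaussian to them.
        elite = actions[np.argsort(q_values)[-n_elite:]]
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-6  # small floor avoids a zero-width Gaussian
    return mean


action = cem_select_action(toy_q, np.array([1.0, -1.0]), action_dim=2)
print(action)
```

Each call runs several rounds of sampling and Q-evaluation, which is the kind of per-step cost the runtime-speed claim in the summary is concerned with.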
Abstract
Off-policy reinforcement learning (RL) is an important class of methods for many problem domains, such as robotics, where the cost of collecting data is high and on-policy methods are consequently intractable. Standard methods for applying