BriefGPT.xyz
Dec, 2019
Combining Q-Learning and Search with Amortized Value Estimates
Jessica B. Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Tobias Pfaff, Theophane Weber...
TL;DR
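SAVE is a method that combines model-free Q-learning with model-based Monte-Carlo tree search. It uses guided search to refine state-action values, improving learning performance without additional computational cost. Agents using the method show better performance on physical reasoning tasks and Atari games.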
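To make the idea in the TL;DR concrete, here is a minimal sketch, not the authors' implementation. It assumes a hypothetical 5-state chain environment and swaps in a depth-limited lookahead in place of Monte-Carlo tree search. It uses tabular values and a simple "pull toward the search estimates" update as a stand-in for whatever amortization loss the paper actually uses. All names (`step`, `search`, `train`) are illustrative.

```python
import random

# Hypothetical toy problem: a 5-state chain. Action 1 moves right, action 0 moves left.
# Reaching the last state gives reward 1 and ends the episode.
N_STATES, N_ACTIONS, GAMMA = 5, 2, 0.9


def step(state, action):
    """Deterministic model, used both as the real environment and by the search."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def search(q, state, depth=2):
    """Depth-limited lookahead (a stand-in for MCTS); leaves are scored by the prior q."""
    values = []
    for a in range(N_ACTIONS):
        nxt, r, done = step(state, a)
        if done:
            values.append(r)
        elif depth == 1:
            values.append(r + GAMMA * max(q[nxt]))
        else:
            values.append(r + GAMMA * max(search(q, nxt, depth - 1)))
    return values


def train(episodes=200, alpha=0.5, eps=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done, t = 0, False, 0
        while not done and t < 50:
            # Search starts from the learned prior and returns refined estimates.
            q_search = search(q, s)
            if rng.random() < eps:
                a = rng.randrange(N_ACTIONS)
            else:
                a = max(range(N_ACTIONS), key=lambda b: q_search[b])
            s2, r, done = step(s, a)
            # Standard Q-learning update from real experience.
            td_target = r if done else r + GAMMA * max(q[s2])
            q[s][a] += alpha * (td_target - q[s][a])
            # Amortization (assumed form): pull the prior toward the search estimates.
            for b in range(N_ACTIONS):
                q[s][b] += alpha * (q_search[b] - q[s][b])
            s, t = s2, t + 1
    return q


if __name__ == "__main__":
    for state, row in enumerate(train()):
        print(state, [round(v, 3) for v in row])
```

In this sketch the search results for all actions update the prior, including actions that were never executed in the environment, so each real step carries more learning signal than a plain Q-learning update.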
Abstract
We introduce "Search with Amortized Value Estimates" (SAVE), an approach for combining model-free Q-learning with model-based Monte-Carlo tree se…