BriefGPT.xyz
Feb, 2020
Optimistic Exploration even with a Pessimistic Initialisation
Tabish Rashid, Bei Peng, Wendelin Böhmer, Shimon Whiteson
TL;DR
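Proposes a count-based augmentation of pessimistically initialised Q-values for deep reinforcement learning. The augmentation yields optimistic estimates for state-action pairs during both exploration and generalisation, and the resulting method outperforms non-optimistic deep Q-learning variants that rely on pseudocounts on hard exploration tasks.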
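As a rough illustration of the idea, the tabular sketch below adds a count-based bonus to pessimistically initialised Q-values and acts greedily on the result. The bonus form `c / (n + 1) ** m`, the hyperparameters `c` and `m`, and all function names are assumptions made for this example; the summary itself does not specify them.

```python
from collections import defaultdict


def optimistic_q(q, counts, state, action, c=1.0, m=2.0):
    """Pessimistic Q-value plus a bonus that shrinks as the visit count grows.

    The bonus form c / (n + 1) ** m is an assumption for illustration.
    """
    n = counts[(state, action)]
    return q[(state, action)] + c / (n + 1) ** m


def greedy_action(q, counts, state, actions, c=1.0, m=2.0):
    """Pick the action with the highest optimistic estimate."""
    return max(actions, key=lambda a: optimistic_q(q, counts, state, a, c, m))


# Pessimistic initialisation: every Q-value starts at 0, the lowest possible
# return when rewards are assumed non-negative.
q = defaultdict(float)
counts = defaultdict(int)

# Hypothetical history: "left" was tried five times, "right" never.
counts[("s0", "left")] = 5
q[("s0", "left")] = 0.02

chosen = greedy_action(q, counts, "s0", ["left", "right"])
```

Here the unvisited action receives the full bonus while the visited one receives only a small remainder, so the agent prefers the unexplored action even though every Q-value started at its pessimistic floor.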
Abstract
Optimistic initialisation is an effective strategy for efficient exploration in reinforcement learning (RL). In the tabular case, all provably efficient model-free algorithms rely on it. However, model-free deep